The document discusses the use of digital identifiers to identify bioresources. It provides background on digital identifiers and their importance for tracking use and impact. It discusses use cases for identifying different types of resources like datasets, databases, and projects. Key challenges include getting authors to use appropriate identifiers and a lack of solutions for some resource types like physical samples. Next steps include recommendations for identifier use and exploring identification schemes for clinical studies and trials.
The document introduces Sean Bechhofer and provides his contact information, including that he is from the University of Manchester, his email address, Twitter handle, and blog. It then lists several publications and projects related to reproducible and open research, including myExperiment and Research Objects, with the goal of facilitating exchange and reuse of digital knowledge. Key challenges discussed are how to move beyond linear paper publications to frameworks that better support reuse of digital assets like workflows and datasets.
Co-presented for the course INLS 720: Metadata Architectures and Applications at UNC SILS. Subsequently, we also presented at the February 2013 meeting of the UNC Scholarly Communications Working Group. This presentation covered copyright in the context of metadata re-use, plus two case studies (one examining Duke University Press and the other examining open bibliographic data).
This document summarizes Rob Grim's presentation on e-Science, research data, and the role of libraries. It discusses the Open Data Foundation's work in promoting metadata standards like DDI and SDMX. It also outlines the research data lifecycle and how metadata management can help libraries support research through services like data registration, archiving, discovery and access. Finally, it provides examples of how Tilburg University library supports research data through services aligned with data availability, discovery, access and delivery.
University of Bath Research Data Management training for researchers - Jez Cope
Slides from a workshop on Research Data Management for research staff and students at the University of Bath.
Part of the Research360 project (http://blogs.bath.ac.uk/research360).
Authors: Cathy Pink and Jez Cope, University of Bath
Getting started in digital preservation - Sarah Jones
Digital preservation requires active management of digital information over time to ensure ongoing accessibility. It involves addressing issues like file formats becoming obsolete, storage media degradation, and a lack of descriptive information. The document provides an overview of digital preservation principles and practical initial steps organizations can take to get started, such as focusing on file formats and metadata collection, and establishing basic processes for storage, backup, and access.
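The "basic processes" mentioned above usually start with fixity checking: record a checksum for every file at ingest, then re-verify on a schedule to catch silent corruption or loss. A minimal sketch in Python (file layout and function names are illustrative, not taken from any particular preservation tool):

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(folder):
    """Record a checksum for every file under `folder` at ingest time."""
    return {str(p): sha256_of(p) for p in Path(folder).rglob("*") if p.is_file()}

def verify_manifest(manifest):
    """Re-hash each file and report any that changed or disappeared."""
    problems = {}
    for path, expected in manifest.items():
        p = Path(path)
        if not p.is_file():
            problems[path] = "missing"
        elif sha256_of(p) != expected:
            problems[path] = "checksum mismatch"
    return problems
```

Running `verify_manifest` periodically turns preservation from a one-off deposit into the ongoing activity the document describes.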
Tutorial on Hybrid Data Infrastructures: D4Science as a case study - Blue BRIDGE
An e-Infrastructure is a distributed network of service nodes, residing on multiple sites and managed by one or more organizations, that allows scientists at distant places to collaborate. E-Infrastructures may offer a multiplicity of facilities as-a-service, supporting data sharing and usage at different levels of abstraction, and can have different implementations (Andronico et al. 2011). A major distinction is between (i) Data e-Infrastructures, i.e. digital infrastructures promoting data sharing and consumption within a community of practice (e.g. MyOcean; Blanc 2008), and (ii) Computational e-Infrastructures, which support the processes required by a community of practice using Grid and Cloud computing facilities (e.g. Candela et al. 2013). A more recent type of e-Infrastructure is the Hybrid Data Infrastructure (HDI) (Candela et al. 2010), i.e. a combined Data and Computational e-Infrastructure that adopts a delivery model in which computing, storage, data and software are all made available as-a-service. HDIs support, for example, data transfer, data harmonization and data processing workflows. Hybrid Data e-Infrastructures have already been used in several European and international projects (e.g. i-Marine 2011; EuBrazil OpenBio 2011), and their exploitation is growing fast, supporting new projects and initiatives such as Parthenos, Ariadne and Descramble.
A particular HDI, named D4Science (Candela et al. 2009), has been used by communities of practice in the fields of biodiversity conservation, geothermal energy monitoring, fisheries management, and cultural heritage. This e-Infrastructure hosts models and resources from several international organizations involved in these fields. Its capabilities help scientists to access and manage data, reuse data and models, obtain results in a short time, and share those results with colleagues.
The document provides an overview of the EDINA & Data Library service at the University of Edinburgh. It discusses that EDINA is a JISC-funded National Data Centre that provides online resources for education and research, while the Data Library assists university users in discovering, accessing, using and managing research datasets. The Data Library offers consultancy services and has developed projects like Edinburgh DataShare, an institutional repository of research datasets, and the Research Data MANTRA online course on research data management.
The document provides an introduction to digital preservation, including definitions of key terms like preservation, digital preservation, and digital curation. It outlines some of the challenges of digital preservation, such as storage media issues, hardware and software dependence, conceptual problems dealing with digital objects, and issues of scale with large amounts of digital data. It then describes some common digital preservation strategies like technology preservation, technology emulation, information migration, and digital archaeology. The document emphasizes that digital preservation requires a life-cycle management approach.
ARCLib project presentation from PASIG 2016 - dp-blog-cz
A digital preservation project by a group of Czech libraries, financed by an applied research grant from the Ministry of Culture of the Czech Republic. First information.
Digital data preservation involves planning and allocating resources to ensure digital information remains accessible and usable over time. It requires policies, strategies and preservation methods to ensure future access. The document outlines 13 ways to approach digital preservation, including as an ongoing activity that avoids crises through continuous effort rather than periodic bursts; as a cooperative effort across organizations to enhance funding; and as a selection process to determine what data is worth preserving given limits on storage. The conclusion notes that while the article outlines preservation approaches, it does not explain how to achieve preservation or select essential data to retain.
Meeting the Research Data Management Challenge - Rachel Bruce, Kevin Ashley, ... - Jisc
Universities and researchers need to be able to manage research data effectively, both to fulfil research funders' requirements and ultimately to contribute to research excellence. UK universities are comparatively well advanced in what is a global challenge, but nonetheless further advances are needed in university policy, technical services and support services. This session will share best practice in research data management and information about key tools that can help to develop university solutions; it will also inform participants about the latest Jisc initiatives to help build university research data services and shared services.
Semi-automated metadata extraction in the long term - PERICLES_FP7
This presentation was delivered by Emma Tonkin (King's College London) at the Digital Preservation Coalition (DPC) event entitled 'Practical Preservation and People: a briefing about metadata', which took place at the Public Records Office of Northern Ireland, Belfast on 3 December 2015.
LIBER is a network of over 425 European research libraries that aims to promote the interests of research libraries. It plays a key role in several EU-funded projects related to digitization, open access, and digital preservation. Some of LIBER's current projects include Europeana Libraries, which will provide over 5 million digitized objects to Europeana, and APARSEN, a digital preservation network. LIBER encourages French research libraries to get involved in its activities and partner on future projects.
Slides for presentation given at the first Digital Humanities Congress held in Sheffield from 6 – 8 September 2012 with the support of the Network of Expert Centres and Centernet.
URL http://www.shef.ac.uk/hri/dhc2012
Today libraries face new and growing challenges in enabling access to information. The growing amount of information, combined with new non-textual media types, demands constant adaptation of established workflows and standard definitions. Knowledge, as published through scientific literature, is the last step in a process that originates from primary scientific data: these data are analysed, synthesised and interpreted, and the outcome is published as a scientific article. Access to the original data as the foundation of knowledge has become an important issue throughout the world, and different projects have started to find solutions.
Nevertheless, science itself is international: scientists are involved in global unions and projects, they share their scientific information with colleagues all over the world, and they use national as well as foreign information providers.
When facing the challenge of increasing access to research data, one possible approach is global cooperation for data access via national representatives:
* a global cooperation, because scientists work globally, scientific data are created and accessed globally.
* with national representatives, because most scientists are embedded in their national funding structures and research organisations.
DataCite was officially launched on 1 December 2009 in London and has 12 information institutions and libraries from nine countries as members. By assigning DOI names to datasets, data becomes citable and can easily be linked to from scientific publications.
Data integration with text is an important aspect of scientific collaboration. DataCite takes global leadership in promoting the use of persistent identifiers for datasets, to satisfy the needs of scientists. Through its members, it establishes and promotes common methods, best practices, and guidance. The member organisations work independently with data centres and other holders of research datasets in their own domains. Based on the work of the German National Library of Science and Technology (TIB), the first DOI registration agency for data, DataCite has registered over 850,000 research objects with DOI names, thus starting to bridge the gap between data centres, publishers and libraries.
This presentation will introduce the work of DataCite and give examples of how scientific data can be included in library catalogues and linked to from scholarly publications.
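As an illustration of how DOI registration makes data citable, a small sketch that formats dataset metadata in the citation style DataCite recommends (Creator (PublicationYear): Title. Publisher. Identifier). All metadata values and the DOI below are hypothetical:

```python
def datacite_citation(creators, year, title, publisher, doi):
    """Format dataset metadata in the DataCite-recommended citation style:
    Creator (PublicationYear): Title. Publisher. Identifier."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"

# Hypothetical example record:
citation = datacite_citation(
    ["Miller, A.", "Schmidt, B."],
    2010,
    "Ocean Temperature Profiles",
    "PANGAEA",
    "10.1594/EXAMPLE",
)
```

A string like this is what a library catalogue or journal reference list would display, with the `https://doi.org/...` part serving as the stable, resolvable link back to the dataset.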
Here are the results of the dotmocracy voting:
- "Libraries are the best departments at universities to take on research data archiving." Received the most dots.
- "High cost research facilities should be obliged to share (and preserve) their data." Received the second most dots.
- "Each dataset should also include the data in its rawest form." Received the third most dots.
Presentation of current challenges in upgrading the infrastructure for access and preservation of social science research data, and the workflow in the Slovene social science data archive.
A North Carolina Connecting to Collections (C2C) workshop co-taught by Audra Eagle Yun (WFU), Nicholas Graham (UNC), and Lisa Gregory (State Archives of NC). This workshop took place on June 13, 2011 in Wilson, NC.
1) Linked data is a set of best practices for publishing structured data on the web so that both humans and machines can access and link related data across different sources. It realizes Tim Berners-Lee's vision of a Semantic Web.
2) The key principles of linked data are using URIs to identify things, providing HTTP URIs so that URIs can be looked up, and including links to other URIs to allow for discovery of related data on the web.
3) By following these principles, data sources on the web have been connected into a large Web of Data, with over 31 billion RDF triples organized into different domains such as media, geography, life sciences, and libraries. This enables new applications for data integration and reuse.
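The principles above can be sketched with plain (subject, predicate, object) triples. The URIs below are invented examples, and a real deployment would serve RDF over HTTP rather than query an in-memory list, but the link-following idea is the same:

```python
# A minimal sketch of the Linked Data principles using plain Python tuples
# as (subject, predicate, object) triples. All URIs are hypothetical examples.

TRIPLES = [
    ("http://example.org/book/1", "http://purl.org/dc/terms/title", "Linked Data Basics"),
    ("http://example.org/book/1", "http://purl.org/dc/terms/creator", "http://example.org/person/42"),
    ("http://example.org/person/42", "http://xmlns.com/foaf/0.1/name", "Ada Example"),
]

def describe(uri, triples):
    """Return every (predicate, object) pair for a subject URI --
    roughly what dereferencing an HTTP URI should yield."""
    return [(p, o) for s, p, o in triples if s == uri]

def follow_links(uri, triples):
    """Discover related resources by following object URIs (the fourth
    Linked Data principle: include links to other URIs)."""
    return [o for _, o in describe(uri, triples) if o.startswith("http://")]
```

Following the creator link from the book to the person record is the in-miniature version of how the Web of Data connects sources across domains.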
The document discusses requirements for data management plans from the National Science Foundation. It notes that as of January 2011, NSF will require a data management plan for all new grant proposals as well as existing grants. The plan must address what data will be collected and how it will be organized, preserved, shared, and accessed. It emphasizes the importance of effective data management for facilitating research by both the principal investigators and other researchers. The document provides guidance on developing a data management plan that meets NSF's criteria and effectively manages research data.
The document discusses Gudmundur Thorisson's work with ORCID and JISC MRD projects. ORCID is working to create a global registry of researcher identifiers to help disambiguate author names and attribute contributions. This will help link researchers to their work more accurately. The registry will be open, free for researchers to use, and follow open principles. JISC MRD projects could benefit from ORCID's efforts to better attribute researchers and incentivize data sharing.
TNC2012 Federated and scholarly identity - match made in heaven? - Gudmundur Thorisson
This document discusses federated identity and scholarly identity. It provides an overview of scholarly identity and challenges related to name ambiguity and fragmented online identities. It then describes the Open Researcher & Contributor ID (ORCID) initiative, which aims to provide unique identifiers for researchers and link them unambiguously to their works. ORCID currently has over 300 participating organizations and is working to support the creation of a clear record of scholarly contributions through unique identifiers. Examples of how ORCID could enable knowledge discovery by linking contributors to their works are also provided.
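One concrete detail of ORCID iDs worth illustrating: the final character is a check character computed with ISO 7064 MOD 11-2 over the first 15 digits, which lets software catch mistyped iDs before any registry lookup. A sketch of that published algorithm (the valid iD below is the sample iD used in ORCID's own documentation):

```python
def orcid_checksum(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first
    15 digits of an ORCID iD, per ORCID's published algorithm."""
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """Check the format and checksum of an iD like '0000-0002-1825-0097'."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_checksum(digits[:15]) == digits[15]
```

A registry of unambiguous contributor identifiers only works if the identifiers entered into systems are themselves valid, so this kind of client-side check is a small but practical part of the disambiguation story.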
We took a journey to visit AnnaMaria. The trip involved traveling by car for several hours to reach our destination. We arrived safely and were happy to see AnnaMaria.
This document proposes collaborating with the BioDBCore initiative to standardize the registration and description of biological databases. It identifies challenges in uniquely identifying databases due to unstable URLs. The proposal suggests adopting the MIRIAM registry's persistent identifiers to decouple identification from location. Benefits include globally identifying life science databases, improved discovery of relevant resources, and potential for BioDBCore to evolve into a database publishing platform. Open questions remain regarding technical details and integrating existing database lists.
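The decoupling of identification from location can be sketched as a tiny resolver in the style of the MIRIAM registry and its identifiers.org resolver: an identifier is a (namespace, accession) pair validated against a per-namespace pattern, and the resolver URL stays stable even if the underlying database changes hosts. The namespace patterns below are simplified illustrations, not the registry's authoritative entries:

```python
import re

# Illustrative, simplified namespace patterns -- a real registry entry
# defines the authoritative regex for each database's accessions.
NAMESPACES = {
    "uniprot": r"^[A-Z][0-9][A-Z0-9]{3}[0-9]$",  # e.g. P12345 (simplified)
    "pubmed":  r"^\d+$",
}

def resolve(namespace, accession):
    """Validate an accession against its namespace pattern and return a
    resolver URL that is independent of where the database is hosted."""
    pattern = NAMESPACES.get(namespace)
    if pattern is None or not re.match(pattern, accession):
        raise ValueError(f"unrecognised identifier {namespace}:{accession}")
    return f"https://identifiers.org/{namespace}/{accession}"
```

Because publications cite the resolver URL rather than a database's current address, the unstable-URL problem the proposal identifies is pushed down to a single registry record per database.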
GEN2PHEN GAM8 meeting Leiden - Update on ORCID and other ID developments - Gudmundur Thorisson
This document summarizes updates on identity initiatives including ORCID and contributions tracking tools for Drupal websites. ORCID is developing an API to allow unique identification of scholarly authors and tracking of author-publication links. An IRISC workshop discussed challenges around unambiguous author identification and opportunities for ORCID and identity federations to collaborate. The document also describes plans to develop a Drupal module to enhance tracking of content contributions and link local user accounts to ORCID profiles.
This document describes ODIN, a demonstration project linking ORCID identifiers and DataCite identifiers. The project aims to connect researchers and datasets via persistent identifiers. It is a two-year EC-funded project with seven partners. A proof-of-concept tool was developed that allows researchers to claim datasets in their ORCID profile by searching and linking from CrossRef and DataCite metadata. The tool demonstrates prospective linking of ORCID iDs in data workflows as well as retrospective claiming of published datasets.
BioMed Central is a large open access publisher that is committed to open data initiatives. They have implemented several solutions to promote open data practices, including data journals, an open data award, and enabling data citation. They also work to integrate data hosting and deposition, address data licensing issues, and provide guidance on best practices. Future goals include adding more value to text and data mining applications and building business models around open data.
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving... - Sarah Anna Stewart
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
INNOVATION AND RESEARCH (Digital Library Information Access) - Libcorpio
Innovation and research, Digital Library Information Access, LIS Education, Library and Information Science, LIS Studies, Information Management, Education and Learning, Library science, Information science, Digital Libraries, Research on Digital Libraries, DL, Innovation in libraries and publishing, Areas of Research for DL, Information Discovery, Collection Management and Preservation, Interoperability, Economic, Social and Legal Issues, Core Topics In Digital Libraries, DL Research Around The World
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
The document provides an overview of the Open Research Data Pilot, the data management plan, and OPENAIRE tools and services to support implementation of FAIR data management plans. It discusses the aims of the Open Research Data Pilot, which Horizon 2020 projects are required to participate, and the types of data that must be deposited. It also covers topics like creating a data management plan, selecting a repository, making data FAIR, and OPENAIRE support resources like briefing papers, webinars, and the Zenodo repository.
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
The document provides an overview of the Open Research Data Pilot, the data management plan, and OPENAIRE tools and services to support implementation of FAIR data management plans. It discusses the aims of the Open Research Data Pilot, which Horizon 2020 projects are required to participate, and the types of data that must be deposited. It also covers topics like creating a data management plan, selecting a repository, making data FAIR, and OPENAIRE support resources like briefing papers, webinars, and the Zenodo repository.
The document discusses creating a permanent online and offline WASH information repository to address common problems like broken links, removed documents, and servers going down. It proposes using a repository with persistent identifiers (DOIs) to prove added value through improved search efficiency, avoiding duplicating existing information, and user services. Examples of existing repositories are provided. Options discussed include submitting information to existing institutional repositories or creating a dedicated shared WASH repository. The benefits of using Digital Object Identifiers (DOIs) to provide actionable, interoperable links are also covered.
Data Repositories: Recommendation, Certification and Models for Cost RecoveryAnita de Waard
Talk at NITRD Workshop "Measuring the Impact of Digital Repositories" February 28 – March 1, 2017 https://www.nitrd.gov/nitrdgroups/index.php?title=DigitalRepositories
Persistent Identifiers (PiDs) for research – why we have them, why there are so many PiD systems, how they work looking at a few examples (Handles, DOIs, ORCIDs), how to choose one, can PiD systems fail and what’s happening in the international PiD community
Panel presentation given at: Policy and Technology for e-Science, ESOF (Euroscience Open Forum) Satellite Event, Institut d\'Estudis Catalans, Barcelona, Spain, 16-17 July 2008
February 18 2015 NISO Virtual Conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Learning to Curate Research Data
Jennifer Doty, Research Data Librarian, Emory Center for Digital Scholarship, Emory University, Robert W. Woodruff Library
Creating a sustainable business model for a digital repository: the Dryad exp...ASIS&T
Creating a sustainable business model for a digital repository: the Dryad experience
Peggy Schaeffer
Datadryad.org
Presentation at Research Data Access & Preservation Summit
22 March 2012
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are being increasingly summarised under the term Research Data Repositories (RDR). The project re3data.org – Registry of Research Data Repositories – began to index research data repositories in 2012 and offers researchers, funding organisations, libraries and publishers an overview of the heterogeneous research data repository landscape. In December 2014 re3data.org listed more than 1,030 research data repositories, which are described in detail using the re3data.org schema (http://dx.doi.org/10.2312/re3.003). Information icons help researchers to identify easily an adequate repository for the storage and reuse of their data. This talk describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDR. Further, it outlines the features of re3data. org and it shows current developments for integration into data management planning tools and other services.
By the end of 2015 re3data.org and Databib (Purdue University, USA) will merge their services, which will then be managed under the auspices of DataCite. The aim of this merger is to reduce duplication of effort and to serve the research community better with a single, sustainable registry of research data repositories. The talk will present this organisational development as a best practice example for the development of international research information services.
- Persistent identifiers (PIDs) play a key role in discoverability, accessibility, and reproducibility of research by providing long-lasting references to digital resources like publications, data, software, and people.
- There are many PID systems that vary in purpose, governance, metadata collected, and other factors such as Handles, DOIs, and ORCIDs. DOIs are most widely used for research data.
- When choosing a PID, factors to consider include purpose, scope, underlying technology, governance, and trustworthiness to ensure the PID remains long-lasting. It is important that PID systems and their social infrastructure are maintained to avoid failures.
General introduction to Open Data Policies H2020, influence of OD policies on...Nancy Pontika
This document provides an overview of open data policies in Horizon 2020 (H2020) research projects. It discusses how H2020 mandates open access to peer-reviewed publications and research data generated by projects. Projects participating in the H2020 Open Research Data Pilot are required to make their data publicly available by depositing it in an open research data repository. Exceptions can be made if openly sharing the data would jeopardize commercialization, privacy, or the project's main goals. The document also outlines licensing options, metadata standards, and resources like Zenodo that can help researchers comply with H2020 open data requirements.
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...OpenAIRE
OpenAIRE Interoperability Workshop (8 Feb. 2013).
DataCite – Bridging the gap and helping to find, access and reuse data – Herbert Gruttemeier, INIST-CNRS
Similar to BRIF workshop Toulouse 2012 Digital IDs subgroup (20)
ODIN 1st year Conference Oct 2013 Interoperability: connecting identifiersGudmundur Thorisson
This document summarizes a presentation about connecting identifiers like ORCID and DOIs to link researchers and their works. It describes prototypes created by the ODIN project, including a DataCite2ORCID tool that allows users to search DataCite metadata, find their works, and add them to their ORCID profile with a click. The presentation discusses challenges in linking heterogeneous metadata and next steps to capture contributor-work relationships and align with community standards.
The document discusses the Open Researcher & Contributor ID (ORCID) initiative. ORCID aims to solve the problem of ambiguous author attribution in scholarly works by assigning unique identifiers to individual researchers. It outlines how ambiguous names and the increasing number of authors per work have broken the current scholarly attribution system. ORCID launched in 2009 with support from research institutions, publishers, and organizations to create a central registry of researcher profiles linked to contributions. The document promotes the benefits of ORCID for reliable author identification and attribution across the scholarly community.
BRIF workshop Toulouse 2012 ORCID intro and status updateGudmundur Thorisson
This document discusses ORCID (Open Researcher and Contributor ID), an organization that aims to solve the problem of name ambiguity in scholarly research by assigning unique identifiers to individual researchers. ORCID has recently launched a live service where researchers can register for a free ORCID iD and begin managing their profile and research contributions. The document outlines several ways ORCID identifiers could be integrated by research institutions, publishers, and other organizations to streamline author attribution and research management processes.
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?Gudmundur Thorisson
This document summarizes the launch and status of the ORCID system for uniquely identifying academic authors. It notes that the ORCID service is now live but still has some bugs and missing features. It encourages researchers to register for an ORCID identifier and integrators like publishers and organizations to begin using the public and members APIs to integrate ORCID into their systems. Finally, it discusses challenges around encouraging broader adoption, including by smaller organizations, and efforts with the ORCID and DataCite Interoperability Network project.
This document discusses open access to scientific research data. It notes that scientific research is increasingly data-driven and large-scale, especially in fields like high-energy physics, astronomy, and biology. However, inadequate access to research data is a problem, limiting opportunities to reuse data and validate or build upon past findings. The document examines some incentive-based approaches and key developments related to improving data sharing. It provides examples of large-scale data generation projects and challenges around managing and analyzing big data. Overall, the document argues that unrestricted sharing of scientific data deposited in the public domain could accelerate research and advance knowledge.
VIVO conference Aug 2011: The VIVO platform and ORCID in the scholarly identi...Gudmundur Thorisson
A major challenge facing VIVO is the retrieval of published works associated with specific authors from participating institutions, and automated disambiguation & identification of authors and scholarly works. VIVO thus shares many of the same goals as the Open Researcher and Contributor ID not-for-profit organization (ORCID: http://www.orcid.org). ORCID is working to solve the long-standing name ambiguity problem in scholarly communication globally, not only for researchers affiliated with academic institutions, but for contributors to scholarly works of all kinds. The aim of this mini-grant collaborative project is to explore how VIVO and ORCID could interact in the scholarly identity ecosystem, by way of small-scale implementation work and technology evaluation&review. The presentation will provide a brief introduction to ORCID and a background to the project, summarize the technical development undertaken thus far and outline the work remaining, and discuss some possilities for future work beyond this specific short-term project.
ORCID participant meeting May 2011: The digital scholar, identity on the Web ...Gudmundur Thorisson
The document discusses Gudmundur Thorisson's involvement with ORCID and related projects. It describes ongoing and planned genetic research data publication projects that incorporate ORCID to help address challenges around name ambiguity and attribution. Specifically, it outlines projects using ORCID to provide publication credit and unique identifiers for data deposits in Cafe Variome and nanopublications in GWAS Central. It also discusses how ORCID could help aggregate a digital scholar's various online identities and contributions across publications, data, code, and other research objects.
Data Citation Principles Harvard May 2011: ORCID and data publication - Ident...Gudmundur Thorisson
The document discusses integrating ORCID researcher identifiers with data publication to provide incentives for data sharing. It describes two of the author's data publication projects: a disease genetics data project and a project called Cafe Variome that facilitates the exchange of genetic data between diagnostic laboratories and databases. The author argues that treating data as publications that are cited and attributed to their creators, such as through assigning DOIs and linking to ORCID IDs, can help address challenges around data sharing by incentivizing researchers.
sameAs London May 2011: The digital scholar, identity on the Web and ORCIDGudmundur Thorisson
The document discusses the challenges of identity fragmentation for digital scholars and how ORCID aims to address this issue. ORCID seeks to provide a single global registry of researcher identifiers that can be used to attribute contributions across publications, datasets, software, and other research outputs. This would help address problems like a lack of incentives for data sharing by allowing all contributions to be properly attributed and credited. The document outlines several potential use cases for how ORCID could aggregate different aspects of a researcher's identity and online presence.
The document discusses two initiatives - Cafe RouGE and ORCID - for improving data sharing and attribution for genetic research data. Cafe RouGE is a central clearinghouse that assigns DOIs to genetic variation data submitted by diagnostic laboratories to facilitate sharing and tracking data usage. ORCID seeks to address challenges in attributing work to contributors by providing a global registry of disambiguated IDs for researchers. The initiatives aim to improve data publication, citation and credit for data submitters.
G.A. Thorisson presents on the collaborative project between VIVO and ORCID to address challenges in author identification and attribution. The document discusses problems with name ambiguity and the need for unique researcher identifiers. ORCID aims to assign persistent identifiers to individual researchers to disambiguate names and track author contributions. The collaborative project between VIVO and ORCID involves evaluating how the two systems can interact technically by identifying overlaps in capabilities, reusing software components, and developing extensions to better integrate researcher profiles and publication data.
Identity in research data publication - meeting with SageCite people march2011Gudmundur Thorisson
The document discusses the problem of non-unique author names in scholarly literature. Approximately two-thirds of the 6 million authors in MEDLINE have names that are ambiguous. It introduces ORCID as a solution to provide unique identifiers for authors and contributors to automatically disambiguate names and accurately attribute publications. ORCID assigns persistent digital identifiers to individuals and links author names to research works, facilitating credit and recognition of contributions.
The document discusses incentivizing data sharing by treating data like publications. It proposes a system where researchers can publish datasets online, receive digital object identifiers (DOIs) for datasets, and have their ORCID researcher identifiers linked to the DOIs. This would allow researchers to be unambiguously attributed to the datasets they generate and provide metrics like the number of times their datasets are cited, incentivizing data sharing similar to how the current publication system works.
1. BRIF Digital identifiers subgroup
Gudmundur A. Thorisson <gt50@leicester.ac.uk> GEN2PHEN / University of Leicester
Pierre-Antoine Gourraud <pierreantoine.gourraud@ucsf.edu> UCSF
-- Overview --
‣Brief backgrounder on identification & digital identifiers
‣Use cases for bio-resource identification in BRIF
‣Digital resources: datasets, databases (Mummi)
‣Non-digital resources: projects, studies, cohorts [...] (Pierre)
‣Conclusions and next steps
This work is published under the Creative Commons Attribution license
(CC BY: http://creativecommons.org/licenses/by/3.0/) which means that
it can be freely copied, redistributed and adapted, as long as proper
attribution is given.
Monday, 22 October 12
BRIF workshop, Toulouse Oct 22 2012
4. BRIF and bio-resource identification
• The identification requirement: need to identify resources in
order to
– track use/reuse and impact
– credit those who contribute to them
• Example: biobanking projects frequently rely on...
– Project/study/cohort names
• Example: the GAZEL study in France >20 years http://www.gazel.inserm.fr
• Challenges:
– ad hoc agreements with research groups who reuse samples or data
– painstaking manual searching through literature for mentions of ‘GAZEL‘
– project names are often ambiguous in a global context
– Citations to journal publications
• Which paper to cite? Tricky to keep track of which citations are relevant to impact
• Also troublesome if there is no paper to cite (e.g. for a new study)
5. Digital identifiers - some background
• Definition: a digital identifier is a character string used to uniquely
identify i) a digital object in a computer system, or ii) a record in a
computer system which describes a non-digital object
• Persistence - once assigned, identifier MUST NOT change
• Uniqueness - global scope vs local scope
– Most ID schemes require tacit knowledge of the type of identifier to interpret
• Example: EC grant identifiers in acknowledgement statements
6. This work has received funding from the European Community's
Seventh Framework Programme (FP7/2007-2013) under grant
agreement number 200754 - the GEN2PHEN project.
7. This work has received funding
under grant
agreement number 200754
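The ambiguity shown in the redacted grant statement above is the scope problem in miniature: "grant agreement number 200754" only identifies anything once you know it is an EC FP7 grant number. Namespace-qualified (CURIE-style) compact identifiers, in the style used by identifiers.org, make that scope explicit. A minimal sketch — the prefix names follow identifiers.org conventions, but the accessions are purely illustrative:

```python
# Sketch: a locally scoped accession becomes globally unique only when
# qualified with a namespace prefix (CURIE style, as used by identifiers.org).
# The prefixes and accessions below are illustrative, not an authoritative registry.

def qualify(namespace: str, accession: str) -> str:
    """Build a compact, globally scoped identifier from a local accession."""
    return f"{namespace}:{accession}"

def resolve_url(curie: str) -> str:
    """Turn a compact identifier into a resolver URL (identifiers.org style)."""
    return f"https://identifiers.org/{curie}"

# The same bare accession can mean different things in different databases;
# the prefix carries the tacit context that a human reader would otherwise need:
print(qualify("pubmed", "12345"))              # pubmed:12345
print(resolve_url(qualify("pubmed", "12345"))) # https://identifiers.org/pubmed:12345
```

The point of the resolver URL is that the identifier becomes actionable as well as unambiguous: anyone can follow it without knowing which database minted the accession.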
8. Digital identifiers - some background
• Some problem domains require globally unique IDs
– Example: ISBNs to identify books, e.g. for copyright purposes
• Some problem domains require resolvable IDs
– Resolve = retrieve information about the thing being identified, including where
to access it (for a digital object, its location on the Internet)
– Digital Object Identifiers (DOIs) are the best-known system, but several others exist
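A DOI name has a simple surface structure — a registrant prefix always starting "10.", then a registrant-chosen suffix — and resolves through the doi.org proxy. A minimal sketch; the example used is the DOI Handbook's own DOI (10.1000/182), and no network request is made here:

```python
# Sketch: DOI name structure and resolution via the doi.org proxy.
# A DOI name is "<prefix>/<suffix>": the prefix (always starting "10.")
# identifies the registrant, and the suffix is chosen by the registrant.

def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix); reject obvious non-DOIs."""
    prefix, _, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix

def resolver_url(doi: str) -> str:
    """URL at which the DOI resolves (no request is actually made here)."""
    return f"https://doi.org/{doi}"

prefix, suffix = split_doi("10.1000/182")  # the DOI Handbook's own DOI
print(prefix, suffix)                      # 10.1000 182
print(resolver_url("10.1000/182"))         # https://doi.org/10.1000/182
```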
10. Identifier use cases in BRIF
• Three broad categories of “stuff” to identify
i) Digital resources
Resources that actually “live” in computers (born-digital or digitized content):
datasets and databases
ii) Physical resources
Resources corresponding to actual physical things: samples, groups of samples,
experimental instruments, etc.
iii) Project-level and other “meta” resources
Higher-level aggregates of things, projects, organizations, consortia etc.
NB in many cases identifiers already exist for these things, but they are
not exposed to the outside world in a usable form (i.e. made resolvable,
citable, globally-unique).
11. Datasets
• Definition: a data set (or dataset) is a collection of data, often presented in
tabular form but in the bio-sciences also frequently in a multitude of
domain-specific formats, such as FASTA for biological sequences
• Data publication and data citation is a hot topic - lots of
research and infrastructure-building activity in recent years
• Emerging best practices for data citation & attribution
• Identifiers for datasets - persistent data DOIs issued via DataCite
• Little new for BRIF to add here, except to issue recommendations
– KEY POINT: infrastructure for data preservation and access is a prerequisite for any
sort of persistent bio-dataset identification scheme. Many projects don’t have this!
12. Data DOI scenario (simplified)
1. Research group registers a dataset and metadata in a suitable domain
repository (or their own repository)
2. Repository archives the dataset and assigns a DOI name to it
3. Unique DOI name is used by article authors (and others) to indicate resource
reuse (ideally via formal data citation)
4. Journal article reference listings & full-text and other sources are mined to
identify references to dataset and/or downloads
5. Dataset-level metrics calculated from collected data
e.g. - total no. citations in scholarly articles
- no. secondary citations (citations to papers which cited the original dataset)
- no. downloads in the last 2 years
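The five-step scenario above can be sketched as a toy, in-memory repository: register a dataset, assign it a DOI-like name, record citations mined from articles, and compute dataset-level metrics. Everything here (the 10.9999 prefix, the suffix pattern, the field names) is invented for illustration, not a real registration service:

```python
# Toy sketch of the data-DOI scenario: registration, DOI assignment,
# citation tracking, and dataset-level metrics. The prefix 10.9999 and
# all names here are illustrative only.

class ToyRepository:
    def __init__(self, prefix: str = "10.9999"):
        self.prefix = prefix
        self.records = {}    # doi -> dataset metadata
        self.citations = {}  # doi -> list of citing article IDs
        self._counter = 0

    def register(self, metadata: dict) -> str:
        """Steps 1-2: archive the dataset's metadata and assign a DOI name."""
        self._counter += 1
        doi = f"{self.prefix}/dataset.{self._counter}"
        self.records[doi] = metadata
        self.citations[doi] = []
        return doi

    def cite(self, doi: str, article_id: str) -> None:
        """Steps 3-4: record a citation mined from an article's references."""
        self.citations[doi].append(article_id)

    def metrics(self, doi: str) -> dict:
        """Step 5: dataset-level metrics computed from collected citations."""
        return {"total_citations": len(self.citations[doi])}

repo = ToyRepository()
doi = repo.register({"title": "Example cohort dataset", "year": 2012})
repo.cite(doi, "pmid:11111111")
repo.cite(doi, "pmid:22222222")
print(doi, repo.metrics(doi))  # 10.9999/dataset.1 {'total_citations': 2}
```

A real pipeline would replace the `cite` calls with text mining over article reference lists and full text, as in step 4, but the bookkeeping shape is the same.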
13. ORCID and DataCite Interoperability Network
• Persistent identifiers for connecting people and
datasets
• 2y EC-funded project, 7 partners in Europe + USA
• Two main proof-of-concept pilots
– Social Science data - use and citation of British Birth Cohort
Studies
• historical data, decades old, steadily being curated by lots of
different people
• high rate of reuse, often cited in papers
– High-energy physics - attribution challenges
• dealing with large no. authors on HEP papers - ‘dilution’ of the term
authorship
• Linking HEP papers to supporting datasets
http://odin-project.eu/
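ORCID iDs, the "people" half of the ODIN linkage above, carry a built-in check digit: the final character is computed over the first 15 digits with ISO 7064 MOD 11-2, with 'X' standing for the value 10. A small validator sketch, using the example iD from ORCID's own documentation (0000-0002-1825-0097):

```python
# Sketch: validate an ORCID iD's check digit (ISO 7064 MOD 11-2).
# The 16th character of an ORCID iD is a checksum over the first 15
# digits; a checksum value of 10 is written as 'X'.

def orcid_check_digit(base_digits: str) -> str:
    """Compute the MOD 11-2 check digit for the 15 base digits."""
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Check a hyphenated ORCID iD like 0000-0002-1825-0097."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # True  (ORCID docs example iD)
print(is_valid_orcid("0000-0002-1825-0098"))  # False (corrupted check digit)
```

The checksum means a mistyped iD is usually detectable locally, before any registry lookup — useful when linking iDs to dataset DOIs in bulk.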
14. Databases
• Definition: an online database can be regarded as a collection of
data, but made accessible in such a way that facilitates using the data
to answer scientific questions, via structured querying and/or free-text
searching of the data over the Internet
• Broad range, from large-scale DNA and protein sequence
repositories to small locus-specific databases
– E.g. GenBank, UniProt, GWAS Central, Ehlers-Danlos Syndrome Variant Database
• Challenges in assessing impact & attributing curators
– Reliance on citations to the database paper, if there is one (sometimes many)
• Analyzing website traffic is another indicator - highly-accessed database =~ important
– Database URLs sometimes change
– Database name + URL often mentioned only in materials & methods, no citation
– Credit via authorship impossible if there is no database journal paper
15. BioDBCore - global catalogue of bio-db’s
• BioDBCore aims
– annotation - organize the bio-database
‘resourceome’
– discovery - e.g. which protein sequence
databases are available?
• Who’s behind it?
– International Society for Biocuration
– Resource catalogues: Bioinformatics Links,
BioSiteMaps, NAR db-issue etc
– Working group includes reps from NAR and
DATABASE journals, MIBBI, Model
organism db’s, others
• Catalogue will have persistent
identifiers for each db entry
http://www.biosharing.org/biodbcore
17. •[slot in Pierre]
18. From Patients to BioBanks and back…
• Persistent IDs for datasets & other digital resources
– Absolute need
• From BioresourceResearchIF to BioresourceXIF
– More than an IP address?
• Increased need of identification for the source of information in general
– Not only for research purposes…
– “Big data”
– Quantified self
• Blurring the border between: research data (non-CLIA), clinically approved data, consumer-centered data
20. Conclusions / next steps
• Complex landscape, lots of problems to tackle
• Key challenge will be to get authors to use the right identifiers
– education, awareness, best practices, journal guidelines etc.
– build support into tools that researchers use
• Potential outputs from BRIF subgroup, by end of GEN2PHEN
– Continue work on whitepaper on identifiers (partially drafted earlier in the year)
– Compile recommendations for authors & biobankers, for use cases where workable
solutions exist or are emerging (data DOIs, BioDBCore)
• Need some biobanker-expert help in ID subgroup!
– Esp. to look in-depth into study catalogues with established identifier schemes
• International Clinical Trials Registry Platform
• ClinicalTrials.gov
• P3G study catalogue
21. Acknowledgements
GEN2PHEN Consortium
http://www.gen2phen.org/about-gen2phen/partners
Prof Anthony J. Brookes Bioinformatics Group, Leicester
This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
Contact me!
<gt50@le.ac.uk> |<gthorisson@gmail.com>
http://www.linkedin.com/in/mummi
http://www.twitter.com/gthorisson
http://www.gthorisson.name
Published under the CC BY license (http://creativecommons.org/licenses/by/3.0/)