Poster presented at ISMB2011 in Vienna July 18
 


Contributor identification is a core challenge in data publication. As in scholarly communication more generally, non-unique person names and the current lack of a global identification infrastructure for producers of scholarly content make it difficult to establish the identity of authors and other contributors. This in turn makes it difficult to accurately attribute datasets published via online digital repositories to their creators, which is one of several key requirements for including these important outputs in the scholarly record.

In the GEN2PHEN project (http://www.gen2phen.org) we are developing a series of novel web-based systems and processes for online dissemination of genetic variation and other research data. The core aim is to ensure that data creators are recognized and rewarded for publishing data. This work builds on and integrates with recently launched international initiatives to i) extend and adapt the existing DOI infrastructure for identifying, locating and citing online datasets (DataCite: http://www.datacite.org), and ii) create a global registry of unique identifiers for authors and other contributors (ORCID: http://www.orcid.org).

The technical approach we are exploring in this pilot project utilizes this emerging global data citation and contributor identification framework, in order to allow published datasets to be discovered, cited in a scholarly context and unambiguously attributed. We argue that, along with other measures, such an incentive-based approach is key to motivating the sharing of data and other types of digital research outputs in the life sciences.

This document is published under the CC-BY license (http://creativecommons.org/licenses/by/3.0/). This means that you can copy, redistribute and adapt the content, as long as you attribute the original work.


Gudmundur A. Thorisson, Owen Lancaster and Anthony J. Brookes
Department of Genetics, University of Leicester, Leicester, UK

As in scholarly communication more generally, non-unique person names and the current lack of a global identification infrastructure for producers of scholarly content make it difficult to establish the identity of authors and other contributors. This in turn creates challenges in attributing credit for contributions to science, as well as in tracking use/reuse and assessing impact of research outputs.

We are developing a series of novel web-based systems and processes for online dissemination of genetic variation and other research data. The technical approach we are exploring utilizes emerging frameworks for data identification and citation (Box 1) and for contributor identification (Box 2), in order to allow published datasets to be discovered, cited in a scholarly context and unambiguously attributed. The core aim is to ensure that data creators are recognized and rewarded for publishing their data. We argue that, along with other measures, such an incentive-based approach is key to motivating the sharing of data and other types of digital research outputs in the life sciences.

Box 1: Identifying digital research outputs

Challenges
Current methods for monitoring data use/reuse and assessing impact rely on various referencing standards and conventions. Tracking reuse is difficult, time-consuming and inaccurate, not least due to difficulties in identifying the datasets in question.

Existing and emerging solutions
Assigning digital identifiers (IDs) to published works allows them to be reliably identified and cited. In order to fulfill the requirements of the scholarly record, IDs should be persistent, globally unique and citable. Together with unique IDs for contributors (Box 2), this forms the basis of unambiguous attribution.

Digital Object Identifiers (DOIs) are widely used for identifying and citing STM publications, via the not-for-profit CrossRef publishers association (http://www.crossref.org). DOIs for scientific datasets issued via DataCite (http://datacite.org) are increasingly used for scientific data published in online digital repositories.

Box 2: Identifying contributors

Challenges
Approx. 2/3 of ~6M authors in PubMed share a last name + first initial with at least one other author. This name ambiguity creates difficulties in identifying and attributing creators of published works, including datasets published via online digital repositories. Solving the contributor identification challenge is key to including these important outputs in the scholarly record.

Emerging solutions
With contributions from GEN2PHEN, the international ORCID initiative (http://www.orcid.org) is creating a global infrastructure to "support the creation of a permanent, clear and unambiguous record of scholarly communication". ORCID will enable identification of contributors via unique IDs and reliably linking them with their published works, including but not limited to:
- Peer-reviewed publications (CrossRef DOIs)
- Datasets (DataCite DOIs)
- Publications in the grey literature

The new infrastructure will help solve many current identification-related problems and create new opportunities, such as:

Discovery:
- Which other papers were published by co-authors of this paper?
- Which datasets were made available by this research project?

Evaluation:
- What is the scholarly record of this job applicant?
- How often was the paper we published cited in the last 2 years?
- What is the total number of citations and other references to papers, datasets and other outputs of the project we funded?

Pilot project: Cafe Variome - facilitating exchange of genetic variation data and attributing data creators

[Workflow diagram: Diagnostic laboratories (publish data) -> Central 'clearinghouse' -> End-users, e.g. LSDB curators (retrieve Atom feeds)]

- Submitting mutations from diagnostic labs using "Café Variome enabled" software via simple button click
- Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval

Data citation: G. A. Thorisson (ORCID:35-883-3523) and O. Lancaster (ORCID:35-992-3523). 4x variants in BRCA2 gene. Published online via Cafe Variome. 21 January (2011) doi:10.1255/cafevariome.BRCA2-2352354

Unique identifier for contributor in ORCID:
G. A. Thorisson, Univ. Leicester
gthorisson@gmail.com
ORCID:35-883-3523

Unique DOI name for dataset in DataCite, located at:
http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354

IRISC2011 - Identity in Research Infrastructure and Scientific Communication
The 2-day IRISC2011 international workshop will be held September 12-13 in Helsinki, Finland. This event will bring together key stakeholders and experts and help foster collaboration, coordination and awareness in this area, not only in biomedicine and bioinformatics but in all areas of scientific research.
Agenda and other info at http://irisc-workshop.org

For further information please contact gt50@le.ac.uk
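
The data citation shown for the Cafe Variome pilot combines contributor IDs with a dataset DOI. The following is a minimal Python sketch of how such a citation string could be assembled from a dataset record. The class names, field names and the `format_citation` helper are hypothetical and are not taken from the Cafe Variome software; the identifiers are copied verbatim from the poster's example and are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

# English month names are hard-coded so the output does not depend on locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


@dataclass(frozen=True)
class Contributor:
    """Hypothetical record for a data creator."""
    name: str   # display name as it should appear in the citation
    orcid: str  # contributor ID (example values copied from the poster)


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record for a published dataset."""
    title: str
    publisher: str
    published: date
    doi: str  # dataset DOI name (example value copied from the poster)
    contributors: Tuple[Contributor, ...]


def join_names(parts: List[str]) -> str:
    """Join names as 'A', 'A and B', or 'A, B and C'."""
    if len(parts) <= 1:
        return "".join(parts)
    return ", ".join(parts[:-1]) + " and " + parts[-1]


def format_citation(ds: Dataset) -> str:
    """Build a citation in the layout used by the poster's example."""
    creators = join_names(
        [f"{c.name} (ORCID:{c.orcid})" for c in ds.contributors]
    )
    when = f"{ds.published.day} {MONTHS[ds.published.month - 1]} ({ds.published.year})"
    return (f"{creators}. {ds.title}. "
            f"Published online via {ds.publisher}. {when} doi:{ds.doi}")


# Example record mirroring the citation shown on the poster.
example = Dataset(
    title="4x variants in BRCA2 gene",
    publisher="Cafe Variome",
    published=date(2011, 1, 21),
    doi="10.1255/cafevariome.BRCA2-2352354",
    contributors=(
        Contributor("G. A. Thorisson", "35-883-3523"),
        Contributor("O. Lancaster", "35-992-3523"),
    ),
)

if __name__ == "__main__":
    print(format_citation(example))
```

Keeping the contributor ID and the DOI as separate fields, rather than embedding them in free text, is what would let a repository or feed consumer attribute the dataset unambiguously even when two contributors share a name.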