Biodiversity Information Standards, TDWG
Sep. 26 – Oct. 1, 2010, Woods Hole, MA, USA

GLOBAL BIODIVERSITY INFORMATION FACILITY (WWW.GBIF.ORG)

Data Citation Mechanism and Services for primary biodiversity data

Dr Vishwas Chavan
Senior Programme Officer for DIGIT
vchavan@gbif.org

Building the Biodiversity Informatics Commons

Why should I publish data?
  • Recognition
  • Opportunities
  • Investment

Data Publishing Framework
  • Cultural change towards ‘free and open access’ to biodiversity data
  • Addresses social, technical, and policy concerns
  • Answers ‘What is in it for me?’ for ALL

[Figure: Data Publishing Framework diagram; labelled areas: Infrastructure and Technical; Policy and Political; Legal; Socio-Cultural; Economic]
Chavan and Ingwersen (2009), BMC Bioinformatics, 10 (Suppl. 14): S2

DPF: Core Technical Components
  • Data Citation Mechanisms
  • Persistent Identifiers
  • Data Usage Index
Chavan and Ingwersen (2009), BMC Bioinformatics, 10 (Suppl. 14): S2

Call for Data Citation
  • 1979: Dodd, S. A. (1979). Bibliographic references for numeric social science data
    files: Suggested guidelines. Journal of the American Society for Information
    Science, 30 (2): 77-82.
  • 1990: Dodd, S. A. (1990). Bibliographic references for computer files in the social
    sciences: A discussion paper. Chapel Hill, NC: Institute for Research in Social
    Science. Retrieved from http://people.virginia.edu/~pm9k/info/compRef.html
  • 2006: Schneider, J. (2006, Spring). Why we need a data citation standard: Lessons
    learned from compiling ICPSR’s Bibliography of Data-Related Literature. ICPSR
    Bulletin. Retrieved from
    http://www.icpsr.umich.edu/files/ICPSR/org/publications/bulletin/2006-Q1.pdf
  • 2006/2007: Altman, M. & King, G. (2007). A proposed standard for the scholarly
    citation of quantitative data. D-Lib Magazine, 13 (3/4).
  • 2008: Kelly, M. C. (2008). NISO thought leader meeting on research data.
    Retrieved from http://www.niso.org/topics/tl/NISOTLDataReportDraft.pdf
  • 2009: Green, T. (2009). We need publishing standards for datasets and data
    tables. OECD Publishing White Paper, OECD Publishing.
  • 2009: Brase, et al. (2009). Approach for a joint global registration agency for
    research data. Information Services & Use, 29 (1): 13-27. (i.e., DataCite)

Wish List for Data Citation
  • Best practice guide for data citation
  • Persistent identifiers for datasets
  • Credit to all players, from data producers to publishers, aggregators, etc.
  • All levels of granularity and combinations
  • With or without annotations
  • Link between traditional literature and data
  • Coordinated citation support for ALL
  • Research metrics for datasets

Impact of Data Citation
[Figure: diagram linking Data Citation, Data Use, Data Discovery, Data Preservation, and Data Publishing]

DataONE/DataCite Example
[Figure: workflow diagram]
  Actors:
    • Research scientist
    • DataONE Member Node data archive (e.g., Dryad)
    • DataONE Coordinating Node metadata catalog (e.g., UNM or UCSB)
    • DataCite Member (e.g., CDL)
    • EZID resolver and registration service
    • DOI resolver and TIB registration
    • (optional) CDL-hosted EZID id minting service
  Numbered arrows:
    1. data + metadata
    2. metadata + URL + id
    3. citation + URL + id
    4. save full citation
    5. URL plus id
    6. full citation
    7. full citation
  Unnumbered arrows: "get unique id string" (shown twice)

Citation model
  • When using data from Dryad, please cite the original article.
      - Sidlauskas, B. 2007. Testing for unequal rates of morphological diversification in the
        absence of a detailed phylogeny: a case study from characiform fishes. Evolution 61:
        299–316.
  • Additionally, please cite the Dryad data package. The citation should
    include the following elements:
      - Author(s)
      - The date on which the data was deposited
      - The name of the data file, if applicable
      - The title of the data package, which in Dryad is always "Data from: [Article name]"
      - The name "Dryad Digital Repository"
      - The data identifier
  • For example:
      - Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological
        diversification in the absence of a detailed phylogeny: a case study from characiform
        fishes. Dryad Digital Repository. doi:10.5061/dryad.20
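
The citation elements listed on this slide can be assembled mechanically. The following is a minimal sketch in Python, not Dryad's own tooling: the `DataPackage` class and `format_citation` function are hypothetical names introduced here for illustration. Placing the optional file name before the title follows the order in which the slide lists the elements; that placement is an assumption, since the slide's example has no file name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DataPackage:
    """Hypothetical container for the citation elements named on the slide."""
    authors: Sequence[str]          # e.g. ["Sidlauskas, B."]
    year_deposited: int             # date on which the data was deposited
    article_title: str              # title of the original article
    identifier: str                 # data identifier, e.g. "doi:10.5061/dryad.20"
    file_name: Optional[str] = None # name of the data file, if applicable
    repository: str = "Dryad Digital Repository"


def format_citation(pkg: DataPackage) -> str:
    """Join the elements into one citation string, separated by '. '."""
    parts = [f"{', '.join(pkg.authors)} {pkg.year_deposited}"]
    if pkg.file_name:
        # Assumption: the file name goes before the package title.
        parts.append(pkg.file_name)
    # In Dryad the package title is always "Data from: [Article name]".
    parts.append(f"Data from: {pkg.article_title}")
    parts.extend([pkg.repository, pkg.identifier])
    # Strip trailing periods so the separator is not doubled.
    return ". ".join(p.rstrip(".") for p in parts)


example = DataPackage(
    authors=["Sidlauskas, B."],
    year_deposited=2007,
    article_title=(
        "Testing for unequal rates of morphological diversification in the "
        "absence of a detailed phylogeny: a case study from characiform fishes"
    ),
    identifier="doi:10.5061/dryad.20",
)
print(format_citation(example))
```

With the values from the slide, the printed string matches the example citation given above.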
Data Citation Mechanism & Service
  • Deep data citation mechanism
      - Recognise ALL with their roles
      - Multilayer citation: producer, publisher, aggregator
      - Citations within citations
  • Data Citation Service
      - Resolve citation any time
      - Discover the underlying data
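
One way to picture "citations within citations" is a nested record in which each layer carries its role and identifier and points to the layers it draws on. The sketch below is an illustration under stated assumptions, not the mechanism proposed in the talk: `CitationLayer`, `walk`, `credits`, and `doi_url` are hypothetical names, and the organisation names and `example:` identifiers are made up. The only external fact relied on is that a DOI can be resolved through the `https://doi.org/` proxy.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class CitationLayer:
    """One layer of a multilayer citation (hypothetical structure)."""
    role: str        # "producer", "publisher", or "aggregator"
    name: str
    identifier: str  # persistent identifier for this layer
    sources: List["CitationLayer"] = field(default_factory=list)


def walk(layer: CitationLayer, depth: int = 0) -> Iterator[Tuple[int, CitationLayer]]:
    """Yield every layer, outermost first, with its nesting depth."""
    yield depth, layer
    for source in layer.sources:
        yield from walk(source, depth + 1)


def credits(layer: CitationLayer) -> List[Tuple[str, str]]:
    """List (role, name) for all contributors so each one is recognised."""
    return [(item.role, item.name) for _, item in walk(layer)]


def doi_url(identifier: str) -> str:
    """Turn 'doi:10.x/y' into a URL on the doi.org resolver."""
    if not identifier.lower().startswith("doi:"):
        raise ValueError(f"not a DOI: {identifier!r}")
    return "https://doi.org/" + identifier[4:]


# Illustrative data only: names and 'example:' identifiers are invented.
producer = CitationLayer("producer", "Example Collector", "example:prod-1")
publisher = CitationLayer("publisher", "Example Publisher",
                          "doi:10.5061/dryad.20", [producer])
aggregator = CitationLayer("aggregator", "Example Aggregator",
                           "example:agg-1", [publisher])

for depth, item in walk(aggregator):
    print("  " * depth + f"{item.role}: {item.name} ({item.identifier})")
print(doi_url(publisher.identifier))
```

Walking the nested record gives every contributor its credit, while the identifier on each layer is what a citation service would resolve to reach the underlying data.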
Data Citation: Challenges
  • Dealing with dynamic streaming data?
  • Resolving to a human- or machine-interpretable description of the object?
  • Need for a registry of name spaces?
  • Can metadata standards support multiple GUIDs?
  • Failure to enforce data citation as a mandatory step in the publishing cycle
Data Paper: Recognising Data Discovery
Journals shown: Current Biology, PhytoKeys, Indian J. Mar. Sci.
[Figure: workflow diagram; labels: Metadata Authors; GBIF Metadata Repository; GBRDS Registry; DOI; Persistent Identifiers; Distributed Metadata Catalogues; auto conversion to manuscript; Journal System (Submission, Peer Review, Revision, Acceptance, Publication)]

Data Publishing
  together with
  Scholarly Publishing!

Email: vchavan@gbif.org
