NISO DataCite EZID Presentation

Presentation at August 11 "Show Me the Data" NISO Webinar on

Speaker notes

  • Let me begin by introducing the overall landscape for data management. Range of content: an increasing number, size, and diversity of content. Range of producers and consumers: faculty, researchers, libraries, etc. Disruptive changes: technology, user expectations, institutional mission, resources. The new landscape demands a range of partners and solutions. You can look at this as a problem or as an unparalleled opportunity.
  • Digital curation is not just about technology; ->It is a set of policies and practices focused on maintaining and adding value to digital content for use now and into the indefinite future ->It can be applied to the humanities, social sciences, and sciences ->It encompasses preservation and access: --preservation ensures access *over time* --access depends upon preservation *up to a point in time*
  • The University of California Curation Center (UC3) is the provider of digital curation services centered at CDL. Its work includes: providing high-quality and cost-effective digital curation services; developing hosted and locally deployable services; creating and fostering partnerships that bring together the expertise and resources of the University of California; showcasing campus initiatives; and supporting a community of experts, researchers, and stakeholders.
  • Let’s zero in on the problem statement for datasets in particular. Here we see the first step in many research processes: the collection, analysis, synthesis, and interpretation of DATA.
  • AND NOW that data has become Information and is published...
  • And now that it is published, the knowledge is accessible...
  • ...and the publication is traceable...
  • But the DATA IS LOST!
  • IN OTHER WORDS, there is a gap between published research and underlying data: ->Published work is held by libraries ->Datasets are held by data archiving centers ->There is no effective way to link between datasets and articles ->There is no widely used method to cite or identify datasets ->There is no easy way to share or to get credit for data creation
  • ->Published work held by libraries ->Datasets held by data archiving centers ->No effective way to link between datasets and articles ->No widely used method to cite or identify datasets ->No easy way to share or to get credit for data creation
  • This chart summarizes the differences between the two worlds, if you will.
  • We are left with a choice, dramatized here. Can we really afford to lose some of the data that is at risk?
  • To address this challenge, DataCite was formed in 2009 by 10 libraries and research centers.
  • Publishers and Data Centers had to establish one-to-one relationships.
  • DataCite provides a connection, a hub. And now, let me turn things over to John Kunze, who will explain just how that happens.
  • TRANSITION TO JOHN
  • This slide animates, with a few options, a vision of how DataONE would work with DataCite. It all begins and ends with the research scientist or data producer shown here at the bottom. A scientist may nurture a dataset for months or years before it gets to an archive, during which time it will need an identifier; therefore [CLICK] their institution may encourage the option of obtaining a “preservation-ready” identifier very early in that period. An identifier string obtained from CDL’s EZID (easy-eye-dee) service will be opaque, unique, and well-suited for embedding inside a DOI, ARK, or other identifier when the time comes to deposit a copy of the data in an archive. There’s also a substantial convenience for depositors and users in not having to rename the dataset upon deposit. [CLICK] When it comes time to deposit, the scientist will upload the dataset and descriptive metadata to a DataONE Member Node, which corresponds to a DataCite regional data archive. The receiving data archive will have its own policy with regard to identifier assignment. Some archives, such as Dryad, will create and assign identifier strings reflecting institutional naming requirements. Others will exercise the option of using a preservation-ready id that arrives with the dataset, if any, or [CLICK] of obtaining and assigning an identifier from EZID. [CLICK] The DataONE Member Node then contacts a DataONE Coordinating Node to signal the presence of a new or updated dataset so that the DataONE metadata catalog can be updated. [CLICK] Then the Member Node will contact the DataCite member, CDL, to request registration of a dataset citation. CDL in turn will create two registrations.
[CLICK] The first is with its own EZID resolver service, which keeps a redundant record, supports ids from any identifier scheme, and provides a “shadow resolution” service for DOIs. Shadow resolution will be available to those who want to publish URLs containing DOIs with extended path parts in order to reference versioned dataset components, without the cost of registering a DOI for every component and with greater resolver functionality than the DOI/Handle infrastructure currently supports. The second registration [CLICK] that CDL creates is with the DOI infrastructure via the interface running at TIB (Technische Informationsbibliothek). Armed with what is by now a completely fleshed-out, ready-to-go standard DataCite citation, [CLICK] CDL returns that citation to the DataONE Member Node. [CLICK] Finally, the Member Node communicates the official citation back to the data producer, who can now update their bibliographies accordingly, notify colleagues, etc.
  • ->DataCite/EZID empowers researchers by enabling them to find, cite, and get credit for research datasets with confidence ->supports data centers by providing workflows and standards for data publication ->extends libraries as they extend their historic collection-building activities to datasets, allowing them to preserve their institution’s research investments ->enables publishers to enrich their publications with the full story
  • API: targeted rollout to an early adopter. V2.0 UI: rollout to our University of California partners first.
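
The notes on the DataCite example slide describe EZID "shadow resolution": a URL can carry a registered DOI plus an extended path part (say, a versioned component of a dataset) without registering a separate DOI for that component. The notes do not say how EZID matches such requests, so the sketch below assumes longest-prefix matching on whole path segments, with the unmatched remainder passed through to the registered target URL. The registry contents, `doi:10.9999/ds1`, and the example.org URL are hypothetical placeholders.

```python
"""Sketch of suffix-passthrough ("shadow") resolution.

Assumption: the registry maps a registered identifier to a target URL, and
any extra path after a registered identifier is appended to that target.
All identifiers and URLs below are hypothetical.
"""
from typing import Dict, Optional

# Hypothetical registrations: one DOI per dataset, not one per component.
REGISTRY: Dict[str, str] = {
    "doi:10.9999/ds1": "https://archive.example.org/datasets/ds1",
}


def resolve(requested: str, registry: Dict[str, str] = REGISTRY) -> Optional[str]:
    """Return a target URL for `requested`, or None if nothing matches.

    Tries the full string first, then strips trailing path segments until a
    registered identifier is found (longest-prefix match). The stripped
    remainder is passed through to the target URL unchanged.
    """
    candidate = requested
    suffix = ""
    while True:
        if candidate in registry:
            return registry[candidate] + suffix
        cut = candidate.rfind("/")
        if cut <= 0:
            return None
        # Move the last path segment (with its leading "/") into the suffix.
        suffix = candidate[cut:] + suffix
        candidate = candidate[:cut]
```

With this registry, `resolve("doi:10.9999/ds1/v2/table1.csv")` yields `https://archive.example.org/datasets/ds1/v2/table1.csv`. Matching whole segments rather than raw string prefixes means `doi:10.9999/ds10` does not accidentally resolve through `doi:10.9999/ds1`.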

Transcript

  • 1. Persistent Citation & Identification for Datasets: DataCite and EZID. John Kunze, Associate Director, UC3; Joan Starr, Manager, Strategic & Project Planning, CDL
  • 2. Problem statement: the rocky landscape
    • “My grant requires a data sustainability plan”
    • “I know I should be doing something more to protect my stuff, but I don’t know what”
    • “I don’t want to preserve my stuff, just store it forever”
    University of California Curation Center, California Digital Library
  • 3. Digital curation provides the answer.
  • 4. UC3 at CDL
    • Web Archiving Service
    • Chronopolis
    • Media Vault Program
    • Curation microservices
  • 5. Problem: the research trajectory (diagram: Data are collected, analysed, synthesised, interpreted; Publication)
  • 6. Problem: the research trajectory (adds: becomes Information; is published)
  • 7. Problem: the research trajectory (adds: becomes Knowledge; Publication … is accessible)
  • 8. Problem: the research trajectory (adds: … is traceable)
  • 9. Problem: the research trajectory (adds: Data … is lost!)
  • 10. In other words: a gap...
    • between published research and underlying data
  • 11. A gap...
    • between published research and underlying data
    • As a result, datasets are
      • Difficult to discover
      • Difficult to access
      • Difficult to archive
      • Second-class citizens in the scholarly record
  • 12. Second-class citizens in the scholarly record
    • Research data: Data is difficult to manage after project funding ceases. / Journal article: Libraries keep it safe.
    • Research data: Who has it? / Journal article: Many libraries and archives have it.
    • Research data: How do I get it? / Journal article: Many libraries and archives have it and will share it.
    • Research data: What is its impact? / Journal article: I can monitor its impact.
    • Research data: Where is it? / Journal article: I know how to find it.
  • 13. A choice
    • If the scientific record is at risk
      • Results can’t be reproduced
      • Science fails, global catastrophe ensues
    • Better data publishing, sharing, and archiving
    • OR
    • Planetary destruction?
    Roberto Rizzato
  • 14. DataCite members
    • Technische Informationsbibliothek (TIB), Germany
    • Australian National Data Service (ANDS)
    • The British Library
    • California Digital Library, USA
    • Canada Institute for Scientific and Technical Information (CISTI)
    • L’Institut de l’Information Scientifique et Technique (INIST), France
    • Library of the ETH Zürich
    • Library of TU Delft, The Netherlands
    • Purdue University, USA
    • Technical Information Center of Denmark
  • 15. Before DataCite... Publishers Data centres
  • 16. Before DataCite… Publishers Data centres
  • 17. With DataCite… Publishers Data centres
  • 18. DataCite structure (diagram)
    • Diagram labels: International DOI Foundation; DataCite; Member Institution (several); Managing Agent (TIB); Associate Stakeholder, e.g., Library; Member; Data Centre (Data Center, Library, Publisher); Data Researcher or Producer
    • Relationship labels: Carries; Works with
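  • 19. DataCite example (diagram; step labels as recovered from the slide)
    • Participants: research scientist; DataONE Member Node data archive (e.g., Dryad); DataONE Coordinating Node metadata catalog (e.g., UNM or UCSB); CDL; CDL-hosted EZID id minting service; EZID resolver and registration service; DOI resolver and TIB registration
    • Research scientist to Member Node: data + metadata
    • Research scientist or Member Node to EZID id minting service: get unique id string
    • 2. Member Node to Coordinating Node: metadata + URL + id
    • 3. Member Node to CDL: citation + URL + id
    • 4. CDL to EZID resolver: save full citation (opt)
    • 5. CDL to DOI resolver and TIB registration: URL plus id
    • 6. CDL to Member Node: full citation
    • 7. Member Node to research scientist: full citation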
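
The numbered exchange on this slide, together with the identifier-policy options in the speaker notes, can be sketched as plain function calls. Everything here is hypothetical: `Network` stands in for the Coordinating Node catalog, the EZID resolver copy, and the DOI registration at TIB as simple dictionaries; the policy labels `"own"` and `"reuse"` are invented; and the citation format is assumed. The scientist's upload (the unnumbered first arrow) is the call to `deposit` itself.

```python
"""Hypothetical walk-through of the "DataCite example" slide (steps 2-7).

The slide names the participants and what they pass to each other; every
class, field, policy name and identifier below is an illustrative assumption.
"""
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Network:
    catalog: Dict[str, dict] = field(default_factory=dict)  # Coordinating Node
    ezid: Dict[str, str] = field(default_factory=dict)      # EZID resolver copy
    doi: Dict[str, str] = field(default_factory=dict)       # DOI resolver via TIB
    log: List[str] = field(default_factory=list)


def choose_id(policy: str, incoming: Optional[str],
              own_id: Callable[[], str], mint: Callable[[], str]) -> str:
    """Archive-side identifier policy, as described in the notes."""
    if policy == "own":    # archive applies its own naming rules (e.g. Dryad)
        return own_id()
    if incoming:           # reuse a preservation-ready id that arrived with the data
        return incoming
    return mint()          # otherwise obtain one from EZID


def deposit(net: Network, metadata: Dict[str, str], url: str,
            incoming: Optional[str], policy: str,
            own_id: Callable[[], str], mint: Callable[[], str],
            save_copy: bool = True) -> str:
    """Steps 2-7; returns the full citation sent back to the scientist."""
    ident = choose_id(policy, incoming, own_id, mint)
    net.catalog[ident] = dict(metadata, url=url)
    net.log.append("2. Member Node -> Coordinating Node: metadata + URL + id")
    net.log.append("3. Member Node -> CDL: citation + URL + id")
    # CDL completes the citation (format assumed, not from the slide).
    full = (f"{metadata['creator']} ({metadata['year']}): "
            f"{metadata['title']}, {metadata['publisher']}, {ident}")
    if save_copy:
        net.ezid[ident] = full
        net.log.append("4. CDL -> EZID resolver: save full citation")
    net.doi[ident] = url
    net.log.append("5. CDL -> DOI resolver/TIB: URL + id")
    net.log.append("6. CDL -> Member Node: full citation")
    net.log.append("7. Member Node -> research scientist: full citation")
    return full
```

Passing `save_copy=False` skips step 4, mirroring the "(opt)" label on the slide; the DOI registration in step 5 always happens.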
  • 20. EZID
    • One stop shop for DataCite DOIs & more
    • California Digital Library is a trusted service provider
    • EZID creates ids, stores metadata and resolver target URLs.
    • EZID supports DataCite DOIs and lower-cost ids (ARKs, URLs)
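
EZID, per this slide, creates ids and stores metadata and resolver target URLs, and the notes describe its strings as opaque and unique. One way such a service could work is sketched below, assuming random strings drawn from the Noid "betanumeric" alphabet plus a check character computed with the Noid Check Digit Algorithm (NCDA). Neither the slides nor the notes say EZID uses exactly these choices; the `IdService` class, its methods, and the placeholder authority number `99999` are hypothetical.

```python
import secrets
from typing import Dict, Optional

# Noid "betanumeric" repertoire: digits plus 19 consonants (no 'l'), 29 chars.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def ncda_check_char(s: str) -> str:
    """Noid Check Digit Algorithm (NCDA).

    Each character's ordinal in ALPHABET is multiplied by its 1-based
    position; characters outside ALPHABET (such as '/') count as 0.
    The sum modulo 29 selects the check character.
    """
    total = 0
    for pos, ch in enumerate(s, start=1):
        total += pos * max(ALPHABET.find(ch), 0)
    return ALPHABET[total % len(ALPHABET)]


def verify(ark: str) -> bool:
    """True if the last character of an 'ark:/' id is its NCDA check char."""
    if not ark.startswith("ark:/") or len(ark) < len("ark:/") + 2:
        return False
    core = ark[len("ark:/"):]
    return ncda_check_char(core[:-1]) == core[-1]


class IdService:
    """Hypothetical mint-and-register service; not EZID's actual API."""

    def __init__(self, naan: str, blade_length: int = 7) -> None:
        self.naan = naan                    # placeholder authority number
        self.blade_length = blade_length
        self.records: Dict[str, dict] = {}  # id -> {"target", "metadata"}

    def mint(self) -> str:
        """Create a new opaque id, reserve it, and return it."""
        while True:
            blade = "".join(secrets.choice(ALPHABET) for _ in range(self.blade_length))
            core = f"{self.naan}/{blade}"
            ark = f"ark:/{core}{ncda_check_char(core)}"
            if ark not in self.records:     # unique within this service
                self.records[ark] = {"target": None, "metadata": {}}
                return ark

    def update(self, ark: str, target: str, metadata: Dict[str, str]) -> None:
        """Store the resolver target URL and descriptive metadata."""
        if ark not in self.records:
            raise KeyError(ark)
        self.records[ark] = {"target": target, "metadata": dict(metadata)}

    def resolve(self, ark: str) -> Optional[str]:
        """Return the registered target URL, or None if unknown."""
        record = self.records.get(ark)
        return record["target"] if record else None
```

For instance, `ncda_check_char("13030/xf93gt2")` returns `"q"`. Because the modulus 29 is prime, a mistyped character or a swapped pair of characters in a short identifier changes the weighted sum, so `verify` can flag most transcription errors before a lookup is attempted.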
  • 21. How it could look: eScholarship and DataCite
    • Supplementary Data
    • Reichl, R., Waldinger, R., et al. (2006)
    • Table A: Survey of Attitudes and…
    • Table B: Latinos in LA Basin…
  • 22. Linking data to article
    • Dataset
      • G. Yancheva, N. R. Nowaczyk et al. (2007)
      • Rock magnetism and X-ray fluorescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA
      • doi:10.1594/PANGAEA.587840
    • Article
      • G. Yancheva, N. R. Nowaczyk et al. (2007)
      • Influence of the intertropical convergence zone on the East Asian monsoon
      • Nature 445, 74-77
      • doi:10.1038/nature05431
    • The article cites the dataset; the dataset cites back to the article.
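
Slide 22 pairs a dataset citation (creators, year, title, publisher, DOI) with the article that cites it, and the link runs in both directions. A minimal sketch of both ideas follows. The `Work` record, its field names, the relation labels, and the citation punctuation are illustrative assumptions, not the DataCite Metadata Standard (which, per the milestones slide, was only entering community review). The `https://doi.org/` prefix is the public DOI proxy resolver.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Work:
    """A citable work; field names are illustrative, not a metadata standard."""
    creators: str
    year: int
    title: str
    source: str   # publisher/archive for data, journal details for articles
    doi: str      # bare DOI, e.g. "10.1594/PANGAEA.587840"
    links: List[Tuple[str, str]] = field(default_factory=list)  # (relation, DOI)

    def citation(self) -> str:
        """Creators (year): title, source, doi:... (punctuation assumed)."""
        return f"{self.creators} ({self.year}): {self.title}, {self.source}, doi:{self.doi}"

    def url(self) -> str:
        """Actionable link through the doi.org proxy."""
        return f"https://doi.org/{self.doi}"


def link(article: Work, dataset: Work) -> None:
    """Record the article-to-data citation in both directions, once each."""
    forward = ("cites", dataset.doi)
    backward = ("is cited by", article.doi)
    if forward not in article.links:
        article.links.append(forward)
    if backward not in dataset.links:
        dataset.links.append(backward)
```

Calling `link` twice leaves a single pair of links, so a data center and a publisher can both record the relationship without creating duplicates.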
  • 23. How it could look
    • Links to data
      • ark:/a50600/rb2468097
      • doi:10.5060/rb2468097
      • http://n2t.net/a5060/rb2468097
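
Slide 23 shows one opaque string, rb2468097, appearing in an ARK, a DOI, and an n2t.net link. This matches the point in the notes that an EZID string can be embedded in different identifier schemes without renaming the dataset. The sketch below assumes placeholder prefixes (`99999` for the ARK, `10.9999` for the DOI) rather than the slide's values, and assumes the `http://n2t.net/ark:/NAAN/string` and `https://doi.org/PREFIX/string` resolver URL patterns; `identifier_forms` and `blade_of` are hypothetical helpers.

```python
from typing import Dict


def identifier_forms(blade: str, naan: str, doi_prefix: str) -> Dict[str, str]:
    """Embed one opaque string in an ARK, a DOI, and their resolver URLs."""
    return {
        "ark": f"ark:/{naan}/{blade}",
        "doi": f"doi:{doi_prefix}/{blade}",
        # Resolver URL patterns are assumptions about common usage.
        "ark_url": f"http://n2t.net/ark:/{naan}/{blade}",
        "doi_url": f"https://doi.org/{doi_prefix}/{blade}",
    }


def blade_of(identifier: str) -> str:
    """Recover the opaque string from 'ark:/NAAN/blade' or 'doi:PREFIX/blade'."""
    for scheme in ("ark:/", "doi:"):
        if identifier.startswith(scheme):
            return identifier[len(scheme):].split("/", 1)[1]
    raise ValueError(f"unrecognized scheme: {identifier}")
```

Because the opaque string survives intact in every form, `blade_of` returns the same value for the ARK and the DOI, which is what lets a dataset keep its name when it moves from a researcher's early identifier to an archive's DOI.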
  • 24.  
  • 25.  
  • 26. Bridging the data gap
    • DataCite/EZID empowers researchers
    • DataCite/EZID supports data centers
    • DataCite/EZID extends libraries
    • DataCite/EZID enables publishers
    http://datacite.org DATACITE
  • 27. EZID
    • Now in production: V1.0, API
    • Limited release.
        • First customer is DataONE member, Dryad
  • 28. Upcoming milestones
    • EZID
      • September 2010: V2.0 UI
      • Expanded release base to include UC partners +
    • DataCite Metadata Standard
      • Community review to begin August 2010
  • 29. Thank you!
    • [email_address]
    • [email_address]