Dataset Metadata, Tools and Approaches for Access and Preservation
 

Joan Starr's presentation at ALA Midwinter 2012 to ALCTS Intellectual Access to Preservation Metadata Interest Group

  • Thank you for this opportunity to speak with you today about Dataset Metadata. Let me give special thanks to Meghan for asking me to speak.
    Image credits:
    • By MDB 28: http://www.flickr.com/photos/mdb28/3787828482/
    • By davecurlee: http://www.flickr.com/photos/davecurlee/4689603488/
    • By sabarishr: http://www.flickr.com/photos/sabarishr/5422105775/
    • By rkrichardson: http://www.flickr.com/photos/45126397@N06/4506403367/
    • By awsheffield: http://www.flickr.com/photos/awsheffield/5932294950/
    • By Scutter: http://www.flickr.com/photos/scutter/109698478/
    • By Amy the Nurse: http://www.flickr.com/photos/amyashcraft/4522601466/
    • By Anita & Greg: http://www.flickr.com/photos/anita__greg/2849453715/
  • My library:
    Serving the 10 UC campuses:
    • 226,000 students
    • 134,000 faculty and staff
    Working collaboratively with:
    • libraries
    • data centers
    • museums, archives
    • faculty and researchers
    CDL has historically provided strategic, integrated technical and program services in a broad portfolio, including:
    • Groundbreaking licensing agreements
    • Union bibliographic services
    • Data curation & preservation tools
    • Open access publishing services
    CDL: http://www.cdlib.org/
  • My group: The UC Curation Center is a creative partnership between the CDL, the ten UC campuses, and peer institutions in the community.
    • A community of shared concern and practice
    • Provide solutions, services, and resources for digital assets
    • Pool & distribute diverse experience, expertise, & resources
  • Access: the researchers’ requirements, as stated by ESIP (Earth Science Information Partners, http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines), are:
    • To provide fair credit to those responsible: exposure
    • To aid scientific reproducibility: re-use
    • To ensure scientific transparency and reasonable accountability: verification
    • To aid in tracking the impact of the work: citation tracking
    Preservation: easy to maintain.
    The funders’ requirements are for data management, and the library’s charge is to preserve our institutions’ scholarly assets.
  • How are we going to meet these needs? If we go back to what the domains are doing…
    From ESIP, Earth Science Information Partners (same link):
    • Author(s): the people or organizations responsible for the intellectual work to develop the data set. The data creators.
    • Release Date: when the particular version of the data set was first made available for use (and potential citation) by others.
    • Title: the formal title of the data set.
    • Version: the precise version of the data used. Careful version tracking is critical to accurate citation.
    • Archive and/or Distributor: the organization distributing or caring for the data, ideally over the long term.
    • Locator/Identifier: this could be a URL, but ideally it should be a persistent service, such as a DOI, Handle or ARK, that resolves to the current location of the data in question.
    • Access Date and Time: because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.
    From ICPSR, Inter-University Consortium for Political and Social Research (http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/citations.jsp):
    • Title
    • Author
    • Date
    • Version
    • Persistent identifier (such as the Digital Object Identifier, Uniform Resource Name URN, or Handle System)
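    The ESIP elements listed above can be assembled into a citation string. The following is only a minimal sketch: the class name, field names, punctuation, and all sample values (including the identifier, which is a made-up placeholder) are assumptions for illustration, not a format prescribed by ESIP or ICPSR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical container for the seven ESIP citation elements."""
    authors: List[str]
    release_date: str
    title: str
    version: str
    archive: str
    identifier: str
    access_date: str

    def format(self) -> str:
        # Join the elements in the order they appear in the ESIP list.
        # The punctuation is an illustrative choice, not a standard.
        names = "; ".join(self.authors)
        return (
            f"{names} ({self.release_date}). {self.title}. "
            f"Version {self.version}. {self.archive}. "
            f"{self.identifier}. Accessed {self.access_date}."
        )


# All values below are invented placeholders.
example = DatasetCitation(
    authors=["Example, A.", "Sample, B."],
    release_date="2011",
    title="Hypothetical Dataset",
    version="1.0",
    archive="Example Archive",
    identifier="doi:10.5072/FK2EXAMPLE",
    access_date="2012-01-21",
)
citation_text = example.format()
print(citation_text)
```

    Keeping the identifier and access date as separate fields reflects the point made above: the identifier says where the data live, and the access date says which state of a changeable dataset was actually used.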
  • What’s in common: the persistent identifier.
  • DataCite was formed in 2009 by 10 libraries and research centers with a mission: “Helping you find, access, and reuse data.” The number has now grown to 15. In addition there are 3 associate members, including the Korea Institute of Science and Technology Information, so there is a presence in Asia. California Digital Library was one of the founding members.
    DataCite’s primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
  • DOIs are one kind of persistent identifier. But what is an identifier? An identifier is an alphanumeric string assigned to an object, and if that assignment is managed with some metadata and the object is made available over time, the identifier becomes a VERY reliable way of keeping track of that object.
  • Let’s take a look at one. So you can see that with just the identifier and a simple set of metadata, you get:
    • Location for VERIFICATION
    • EXPOSURE & CITATION TRACKING
    (This is not an actual DOI, nor an actual study.)
  • And here’s that same DOI some time later. THE STRING NEVER CHANGES. This means it can be cited, tracked and associated with all kinds of metadata. More on that in a minute.
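    The idea that the string never changes while the location does can be sketched with a toy in-memory resolver. This is an illustration only: `ToyResolver`, its methods, the identifier, and both URLs are hypothetical, and the sketch does not represent how the DOI system or EZID is implemented.

```python
class ToyResolver:
    """Hypothetical in-memory stand-in for a persistent identifier resolver."""

    def __init__(self):
        self._records = {}

    def register(self, identifier, target_url, metadata=None):
        # An identifier is assigned once; re-registering is an error.
        if identifier in self._records:
            raise ValueError(f"{identifier} is already registered")
        self._records[identifier] = {
            "target": target_url,
            "metadata": dict(metadata or {}),
        }

    def update_target(self, identifier, new_target_url):
        # Only the location changes; the identifier string stays the same.
        self._records[identifier]["target"] = new_target_url

    def resolve(self, identifier):
        return self._records[identifier]["target"]


resolver = ToyResolver()
DOI = "doi:10.5072/FK2EXAMPLE"  # made-up placeholder, not a real DOI

resolver.register(
    DOI,
    "http://old-server.example.org/data/study1",
    {"title": "Hypothetical Study"},
)
first_location = resolver.resolve(DOI)

# Some time later the data moves; the citation does not need to change.
resolver.update_target(DOI, "http://new-archive.example.org/datasets/study1")
second_location = resolver.resolve(DOI)

print(DOI, "->", first_location)
print(DOI, "->", second_location)
```

    A citation that records only `DOI` keeps working across the move, because the managed record, not the cited string, carries the current location.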
  • EZID is CDL’s application for offering DataCite DOIs as well as other identifiers.
  • If you go to the Home Page, you can use the UI to test EZID. CLICK for HELP TAB.
  • On the Help screen, you have the choice of creating a test ARK or DOI. [CLICK] Click the Create button.
    ARKs and DOIs
    ARKs:
    • Flexible
    • Case-sensitive
    • Special features support granularity
    • Can be deleted
    • Inexpensive
    DOIs:
    • Established brand in publishing
    • Indexed by major A&I citation databases
    • DataCite policies apply
    • Cannot be deleted
    • More costly
    DOIs should be assigned to objects that are under good long-term management, and where the intention is to make the object persistently available. DOIs must be registered exclusively with metadata that is available to public view.
    Can DOIs and ARKs work together? Yes. For example, researchers may choose to use ARKs for unpublished materials associated with an object that has been registered with a DOI. These two identifier schemes can work well together, and EZID offers them both, along with policy support consistent across both schemes.
  • EZID creates the identifier and sends you to the MANAGE tab where you have the opportunity to enter a target URL and other metadata.
    UI support:
    • Dublin Kernel
    • Dublin Core
    • DataCite Kernel
    API support:
    • All of the above
    • Full DataCite Schema
  • When you hover over a field, it opens up for editing as you can see here. This is where you would go if you wanted to maintain the metadata or the target URL.
  • Now let’s take a look at the full DataCite Metadata set. MDS = Metadata Store.
    Remember, we said that any solution needed to:
    • ALLOW the submitter to accurately describe the object so that anyone accessing knows what they are getting.
    • ALLOW the submitter to give credit where credit is due.
    • PROVIDE support for *data management*: format, version, rights.
  • The 5 Required properties = basic citation elements.
    • Identifier = DOI now; in future may open up.
    • Creator is repeatable; Name can have a nameIdentifier and schema, as in an ORCID iD.
    • Title is repeatable and has an optional type attribute for Alternative Title, Subtitle, and TranslatedTitle.
    • Publisher: “In the case of datasets, "publish" is understood to mean making the data available to the community of researchers.”
    IDENTIFIER = VERIFICATION. ALLOW the submitter to give credit where credit is due: EXPOSURE & CITATION TRACKING.
    If the Year field isn’t quite what you want, use the repeatable DATE field in the optional set.
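    A submitter-side check for the five required properties (Identifier, Creator, Title, Publisher, PublicationYear) might look like the sketch below. The dictionary keys and the `missing_required` helper are hypothetical names chosen for this example; the actual DataCite schema is XML, and this sketch does not validate against it.

```python
# Hypothetical key names standing in for the five required properties.
REQUIRED_PROPERTIES = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
)


def missing_required(record):
    """Return the required property names that are absent or empty."""
    return [name for name in REQUIRED_PROPERTIES if not record.get(name)]


# Invented sample records; none of these values describe a real dataset.
complete_record = {
    "identifier": "doi:10.5072/FK2EXAMPLE",
    "creators": ["Example, A."],          # Creator is repeatable
    "titles": ["Hypothetical Dataset"],   # Title is repeatable
    "publisher": "Example Archive",
    "publicationYear": "2012",
}
incomplete_record = {
    "identifier": "doi:10.5072/FK2EXAMPLE",
    "creators": [],
    "titles": ["Hypothetical Dataset"],
}

complete_gaps = missing_required(complete_record)
incomplete_gaps = missing_required(incomplete_record)
print(complete_gaps)
print(incomplete_gaps)
```

    Treating an empty list the same as a missing key matches the intent of a required, repeatable property: at least one creator and one title must be present.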
  • Optional elements. Includes support for data management: FORMAT, VERSION, RIGHTS. In addition, some of these offer expansion of the required set: Contributor expands Creator, and Date expands PublicationYear. But the distinctive strength comes from Number 12. [CLICK]
  • Optional elementsThe Family Jewels = RelatedIdentifer, relationTypeIsCitedBy & Cites IsSupplementTo  & IsSupplementedByIsContinuedBy  & Continues IsNewVersionOf  & IsPreviousVersionOf  IsPartOf  & HasPart  IsDocumentedBy & Documents isCompiledBy & CompilesIsVariantFormOf  & IsOriginalFormOfCOMING IN 2.3: IsIdenticalTo
  • “Data Management Planning” is a popular phrase these days. As metadata and preservation librarians, I think you’ll find many of the concepts to be very familiar, if wearing new clothes.
    Let me tell you a little story about the life of a dataset.
    • You start out in a laptop (or a tablet) travelling around, or under a desk.
    • Maybe then you get emailed across the country or around the world.
    • Years can go by as you get updated and altered.
    • Eventually, maybe you have a day in the sun: your researcher decides to write up the results and cite you.
    • Then, perhaps, it’s back to a server in the dark. Or, you move from server to server. Will you be forgotten?
  • That’s why we at California Digital Library have taken a life cycle approach. CDL has developed an array of tools and services ranging from the first stage of developing a data management plan, through to formal publication. We encourage researchers to assign an ID early in the process:
    • to provide a credible data management plan for funders;
    • to make the later stages easier; and
    • to manage situations where changes might occur during the course of the research: a researcher changes institutions or a research team changes the location of their data, for example.
  • What difference does this make?
    • Keep track of datasets early in the life cycle when you’re not sure where you’re keeping things.
    • Get common & stable references for distributed research teams.
    • Citations in published papers keep working even if the data moves.
    • Part of data organization plans mandated by funders.
    Photo credits:
    • In field: by Dave Rogers, http://www.flickr.com/photos/dave-rogers/2815036285/
    • At black board: © All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5405812887
    • With stars: © All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5406308654
    • Around table: by David Mellis, http://www.flickr.com/photos/mellis/7675610/
  • Dublin Core application profile: available for the DataCite Metadata Schema; we’ll keep it up to date and in sync. From the DCMI: “A DCAP is designed to promote interoperability within the constraints of the Dublin Core model and to encourage harmonization of usage and convergence on "emerging semantics" around its edges.”
    Content Service: exposes our metadata stored in the DataCite Metadata Store (MDS) using multiple formats. Alpha version: the service can be accessed at http://data.datacite.org
    EZID:
    • UI redesign
    • Activity reporting
    • Browse & search
    • Enhanced persistence support
    • Automated link checking in support of our new Tombstone pages (a web page returned for a resource no longer found at its target location of record; the tombstone may provide “last known” metadata, including the original owner)
    Exposure for metadata, with evidence that citations will increase (Heather Piwowar’s work):
    • Thomson-Reuters (Web of Knowledge)
    • Elsevier (Scopus)
    • OAI? RSS?
    • GoogleScholar
  • Library as a service center: Consulting, EZID, DMP, DCXL, IR.
    Information: pointing people to standards, tools.
    Helping make connections.
  • The next steps for you as individuals are to get more information and try things for yourselves.
