ESIP Federation: Community-Driven, Collaborative Governance - Carol Beaton Meyer - RDAP12


ESIP Federation: Community-Driven, Collaborative Governance
Carol Beaton Meyer

Presentation at Research Data Access & Preservation Summit
22 March 2012

Published in: Technology, Business
  • Funders: principally NASA & NOAA, some EPA, a little from NSF through EarthCube. Started out with 24 members.
    ESIP provides:
      • Forum for open, science data-centric community collaboration
      • Voluntary participation
      • Community of Practice: practitioners share expertise & technologies
      • Trusted authority, built by the community
      • Infrastructure for community collaboration
          • Collaborative workspace on web (Drupal, wiki, listservs)
          • Communications (coordinated, ad hoc, open)
      • Governance
          • Formal (constitution, bylaws, strategic plan)
          • Informal (cluster-based governance for consensus building)
    Ethos: Results
    Network effect: productive connections made through ESIP that likely would not have been made otherwise
  • ESIP provides community coordination to support interoperability at the data, systems, human, and organizational levels. ESIP works through informal and formal structures, depending on what’s needed at a given moment.
  • Community-driven activity of the ESIP Federation’s Data Preservation and Stewardship Cluster. Different parts of the community were using different identifiers for their data, and the community wanted to know which identifiers worked best for different data types.
    Criteria:
      • Technical Value: Scalability, Security, Standards, Interoperable, Compatible with Naming Conventions, Require a registry?, Dependence on a naming authority (longevity of naming authority institution), Longevity of technology used
      • User Value: Will publishers allow it in citation?, Does identifier have any additional trust value?, Does the identifier have meaning? (Should identifiers be transparent or opaque?)
      • Archival Value: How maintainable is the identification scheme when data migrates from one archive to another?, Cost associated with identifier?, Does the identification scheme handle data that is not on the web? What about physical objects?
    Looked at:
      • Using DOIs for Entire Datasets (Results:
      • DOIs for Components Within Datasets (Results:
  • Purposes of data citation:
      • To aid scientific reproducibility through direct, unambiguous reference to the precise data used in a particular study. (This is the paramount purpose and also the hardest to achieve.)
      • To provide fair credit for data creators or authors, data stewards, and other critical people in the data production and curation process.
      • To ensure scientific transparency and reasonable accountability for authors and stewards.
      • To aid in tracking the impact of a data set and the associated data center through reference in scientific literature.
      • To help data authors verify how their data are being used.
      • To help future data users identify how others have used the data.
    Elements:
      • Author(s): the people or organizations responsible for the intellectual work to develop the data set. The data creators.
      • Release Date: when the particular version of the data set was first made available for use (and potential citation) by others.
      • Title: the formal title of the data set.
      • Version: the precise version of the data used. Careful version tracking is critical to accurate citation.
      • Archive and/or Distributor: the organization distributing or caring for the data, ideally over the long term.
      • Locator/Identifier: this could be a URL, but ideally it should be a persistent service, such as a DOI, Handle or ARK, that resolves to the current location of the data in question.
      • Access Date and Time: because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.
  • Submission of new proposals; forum to review proposals; author revision based on feedback; voting on change proposals; ratification or rejection by editors. Note: to maintain an open community process, all steps are posted to the mailing list and/or wiki.
  • Funded by NOAA
  • Funded by NOAA. Responsive to the perception that there was little training for scientists who have to do data management.
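
The seven citation elements listed in the data citation notes above can be sketched as a small record plus a formatter. This is only an illustration: the notes name the elements but not how to render them, so the ordering and punctuation below are assumptions, and the example values (author names, title, archive, identifier) are hypothetical placeholders rather than a real data set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """The core elements named in the citation notes."""
    authors: str        # people or organizations who created the data
    release_date: str   # when this version was first made available
    title: str          # formal title of the data set
    version: str        # precise version of the data used
    archive: str        # archive and/or distributor
    locator: str        # URL or, ideally, a persistent identifier (DOI, Handle, ARK)
    accessed: Optional[str] = None  # access date and time, for dynamic data


def format_citation(c: DataCitation) -> str:
    """Render the elements on one line (ordering and punctuation are assumptions)."""
    text = (
        f"{c.authors}. {c.release_date}. {c.title}. "
        f"Version {c.version}. {c.archive}. {c.locator}."
    )
    if c.accessed:
        text += f" Accessed {c.accessed}."
    return text


# Hypothetical placeholder values, not a real data set or identifier.
example = DataCitation(
    authors="Doe, J. and R. Roe",
    release_date="2011",
    title="Example Data Set",
    version="1.2",
    archive="Example Data Center",
    locator="doi:10.0000/example",
    accessed="2012-03-22T14:00Z",
)

print(format_citation(example))
```

Keeping the access date optional reflects the note that it matters mainly for on-line data that can change between releases.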

    1. ESIP Federation: Community-Driven, Collaborative Governance
       Carol Beaton Meyer
       Research Data Access and Preservation Summit 2012
       March 22, 2012
    2. About ESIP
         • Formed in 1998 by NASA
         • ~140 Members (2012)
             • Representing multi-agency, sector, science domains
         • Forum for Practitioners to Exchange Knowledge, Share Technologies and Collaborate on Research
         • ESIP Ethos
             • Community-Driven (self-forming groups)
             • Open
             • Collaborative
             • Participatory
             • Innovative
    3. What Drives ESIP Community?
         • Desire for Interoperability
         • Best Practices (from consensus)
         • Leverage Expertise and Resources of Others
    4. Community-Driven Data Management Case Study 1: Data Identifiers
         • Used Testbed to Evaluate Various Identifier Schemes
             • DOI, LSID, OID, PURL, ARK, UUID, XRI, Handles, URI/URN/URL
             • Evaluated each against set criteria
         • Governance:
             • Community-defined problem
             • Developed review criteria
             • Each identifier reviewed against established criteria
             • Community analysis of results
         • Published Results in Journal of Earth Science Informatics, Volume 4, Number 3, 139-160, DOI: 10.1007/s12145-011-0083-6
         • Implementation/Additional Testing Through California Digital Library
    5. Community-Driven Data Management Case Study 2: Data Citation Guidelines
         • Citation Guidelines for Data Providers
         • Governance
             • Community-need identified
             • Used existing best practices (IPY Citation work) as baseline
             • Iterative process within Data Preservation & Stewardship Cluster
             • Broader ESIP community review (input sought & provided)
             • Guidelines (best practices) adopted by ESIP Assembly, January 2012
         • Core Elements
             • Author(s)
             • Release Date
             • Title
             • Version
             • Archive and/or Distributor
             • Locator/Identifier
             • Access Date and Time
         • Full Guidelines:
    6. Community-Driven Data Management Case Study 3: Discovery Conventions
         • Data centers using different discovery services
             • OpenSearch, DataCasting, ServiceCasting Services
             • Issues: interoperability, differing standards, distributed orgs
         • Goal: develop usable and simple solutions that leverage existing standards, conventions and technologies, that have a high likelihood of voluntary adoption
         • Governance - adopted by community (bottom up approach)
             • Submit
             • Review
             • Revisions
             • Vote
             • Ratify/Reject
             • Recommendations for Adoption
    7. ESIP Community Resources
         • ESIP Governance - Wiki Workspace – ESIP Commons – coming Spring 2012
         • Next ESIP Meetings
             • July 17-20, 2012 in Madison, Wisconsin
             • January 9-11, 2013 in Washington, DC
         • Join the community discussions
    8. Contact
       Carol Meyer
       919.870.7140
    9. Extra Slides
    10. Community-Driven Data Management Case Study: Data Management Short Course
         • Two-year Volunteer Effort
             • Phase 1 – aimed at scientists
             • Phase 2 – aimed at data managers
             • Drew expertise from ESIP community
         • Course Outline
             • The case for data stewardship
             • Data management plans
             • Local data management
             • Preservation strategies
             • Responsible data use
         • Governance
             • Community-identified need/opportunity
             • Trial workshop at 2010 AGU
             • Defined scope of content
             • Volunteers drafted short modules
             • Peer review and editorial review
    11. Community-Driven Data Management Case Study: Data Stewardship Principles
         • Preservation and Stewardship Cluster Considered:
             • Existing member data policies (NASA, NOAA, etc.)
             • Other organizations’ policies (CODATA, GEO, etc.)
             • Data Creators, Data Intermediaries and Data Users
         • Consensus Document
             • f
             • Home institution policies supersede
             • Room for commercialization
             • Adopted in January 2012
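
The proposal process described for Case Study 3 (submit, review, revise, vote, ratify or reject, with every step posted publicly) can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the slides do not say whether a proposal can loop back to review, so the strictly linear path, the `Stage` and `Proposal` names, and the in-memory `public_log` standing in for the mailing list or wiki are all hypothetical.

```python
from enum import Enum


class Stage(Enum):
    SUBMITTED = "submitted"
    IN_REVIEW = "in review"
    REVISED = "revised"
    VOTING = "voting"
    RATIFIED = "ratified"
    REJECTED = "rejected"


# Allowed transitions. Assumption: a single linear pass, with the
# ratify/reject decision coming only after the vote.
NEXT = {
    Stage.SUBMITTED: {Stage.IN_REVIEW},
    Stage.IN_REVIEW: {Stage.REVISED},
    Stage.REVISED: {Stage.VOTING},
    Stage.VOTING: {Stage.RATIFIED, Stage.REJECTED},
    Stage.RATIFIED: set(),
    Stage.REJECTED: set(),
}


class Proposal:
    def __init__(self, title: str) -> None:
        self.title = title
        self.stage = Stage.SUBMITTED
        # Stand-in for posting each step to the mailing list and/or wiki.
        self.public_log = [f"{title}: {self.stage.value}"]

    def advance(self, new_stage: Stage) -> None:
        """Move to new_stage if the process allows it, and record the step."""
        if new_stage not in NEXT[self.stage]:
            raise ValueError(
                f"cannot go from {self.stage.value} to {new_stage.value}"
            )
        self.stage = new_stage
        self.public_log.append(f"{self.title}: {new_stage.value}")


# Hypothetical proposal title, used only to exercise the workflow.
proposal = Proposal("Example discovery convention")
for stage in (Stage.IN_REVIEW, Stage.REVISED, Stage.VOTING, Stage.RATIFIED):
    proposal.advance(stage)

for entry in proposal.public_log:
    print(entry)
```

Recording every transition in one log mirrors the note that all steps are posted openly, so the history of a proposal stays visible to the whole community.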