Datashare cni spring2013



A presentation given at the Coalition for Networked Information (CNI) describing efforts by three partner organizations (UCSF CTSI, UCSF Library, and the California Digital Library) to support the sharing of research data by UCSF investigators.

Published in: Education, Technology
  • Mission: enable individual researchers to share their research data sets with the global community. Michael Weiner, M.D., is a researcher at UCSF. In his work as Principal Investigator of the Alzheimer’s Disease Neuroimaging Initiative (ADNI), he concluded that widespread data sharing can be achieved now, with great scientific and economic benefits. All ADNI raw data is shared immediately, without embargo, with scientists around the world. The project has been very successful: more than 300 publications have resulted from use of the ADNI data resource, which demonstrates the feasibility and benefits of sharing data. Clinical and Translational Science Institute: working together to develop a resource that meets the needs of the researcher while leveraging the
  • Cell Press, Nature Publishing Group, PNAS: over 100 UCSF papers were published between 2009 and 2011 in journals from these 3 publishers, all of which have data sharing requirements. Some researchers have national repositories for their data (e.g. GenBank), while others don’t. The campus has focused on developing infrastructure for storing and analyzing data, but not for sharing it generally. Additionally, the current focus is on clinical data, especially anonymized data from the electronic health record, rather than on basic or social sciences data.
  • CTSI: its mission is to accelerate the research enterprise, and it saw data sharing as one way to accomplish this mission. Library: interest in data sharing, as well as an extension of its support of open access. UC3: provide tools to the UC community to promote digital scholarship.
  • Screenshot of eScholarship, running XTF
  • Screenshot of Datashare, running XTF
  • Datashare website; end user selects title
  • Full information on dataset; end user selects download
  • Data Use Agreement (DUA) for end user.
  • Fulfills requirements, existing and emerging
  • Increases visibility of work
  • The new Thomson Reuters Data Citation Index provides a mechanism to discover data for re-use in the same familiar fashion as discovering publications
  • Long-term preservation and easy access to your own data. The Merritt repository is an active archival environment with format migration and integrity checks: a smart filing cabinet for digital assets.
  • Centralizing resources improves efficiency by streamlining and standardizing the process, and saves money in the aggregate. We are currently gathering data to support this.
  • Metadata, data/metadata separation, file size, DUA, discoverability, interoperability, README
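The Merritt note above mentions integrity checks. As a rough illustration of what a fixity check involves in general (not Merritt's actual implementation or API), the Python sketch below records a SHA-256 digest for each file at ingest and later recomputes the digests to flag missing or altered files. The function names `build_manifest` and `verify_manifest` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest, reading in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root, keyed by relative path (done once, at ingest)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Recompute digests and return the relative paths that are missing or changed."""
    problems = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel_path)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "scan_001.csv").write_text("subject,volume\n1,2.5\n")
        manifest = build_manifest(root)
        print(verify_manifest(root, manifest))  # []
        (root / "scan_001.csv").write_text("subject,volume\n1,9.9\n")  # simulate silent corruption
        print(verify_manifest(root, manifest))  # ['scan_001.csv']
```

A scheduled job running `verify_manifest` against stored manifests is the general pattern; how often to run it and how to repair a damaged copy are policy decisions outside this sketch.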

    1. DataShare: Collaboration Yields Promising Tool
       Julia Kochi, UCSF Library; Angela Rizk-Jackson, UCSF CTSI; Perry Willett, California Digital Library
       CNI 2013 Meeting, San Antonio, TX
    2. The Background (Julia Kochi, UCSF Library)
    3. What is DataShare?
       • An open data repository for the UCSF researcher
       • A concept initially envisioned by Michael Weiner, M.D.
       • A collaboration between UCSF CTSI, UCSF Library, and the California Digital Library
    4. 4. The ProblemIncreasing requirements to share data • NIH grants >$500k • Publisher requirementsUnequal availability of national repositoriesCampus prioritiesFASTR, White House Directive
    5. 5. The PartnersUCSF CTSI • Knowledge of the researcher, access to the dataUCSF Library • Metadata expertise, programming resourcesUC3 • Preservations tools, services and expertise
    6. Technical Infrastructure (Perry Willett, California Digital Library)
    7. 7. DataShare ComponentsMerritt: CDLEZID: CDLXTF: CDL, UCSF LibraryIngest tool: UCSF Library
    8. Merritt Repository Service
       • Built on “micro-services” principles
       • Content and format agnostic
       • Has a UI and RESTful APIs to submit and retrieve content and to check status
       • Can serve as either a “dark” or a “bright” archive
       • Public access, data use agreements, and asynchronous downloads were added as part of the DataShare project
    9. 9. EZIDService for creation and management of long- term identifiersCurrently supports ARKs and DOIs; other types in planning stagesRegisters DOIs with DataCiteHas a UI and APIs with good documentation
    10. 10. XTFeXtensible Text FrameworkDeveloped and maintained by CDLRuns several CDL services: • eScholarship • Online Archive of California • CalisphereFaceted browsing, full-text search, other desirable features
    11. 11. Ingest toolSubmitting content to a digital repository is hard and costlyAn attempt to simplify several aspects: • Digital object creation • Metadata creation • Object submission
    12. Interactions for submission (diagram: Ingest Tool, Merritt, EZID, DataCite, XTF)
       • Ingest Tool: creates metadata, assembles dataset, requests DOI from EZID and receives DOI, packages object, submits to Merritt
       • EZID: registers DOI and metadata with DataCite
       • Merritt: submits metadata to EZID
       • XTF: requests ATOM feed for collection from Merritt, gets ATOM feed, retrieves metadata, indexes metadata
    13. Process for End Users
       • Search, browse
       • Request dataset download
       • Fill out Data Use Agreement
       • Receive dataset
    14. 14. Lessons learnedPartnerships • Many hands make light work • Real users uncover hidden assumptionsScale • Object size • Number of files • Upload and download
    15. If you build it, will they come? (Angela Rizk-Jackson, UCSF CTSI)
    16. What will it take? (Sketch by Juliana Olivera Silva via Flickr)
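    17. Providing Incentives: Requirements

       | Category   | Organization                                    | Data access requirement                    | # UCSF studies                        |
       |------------|-------------------------------------------------|--------------------------------------------|---------------------------------------|
       | Funding    | NIH                                             | Grants >$500K (2003 on), specific programs | 318 (active projects), 693 (inactive) |
       | Funding    | NSF                                             | All funded projects (2005 on)              | 19                                    |
       | Funding    | Foundations (e.g. Moore, Gates, Hewlett)        | All funded projects                        | 3, 31, 19                             |
       | Publishing | Nature Publishing Group (Nature, Science, etc.) | All published studies (2009-2011)          | 58                                    |
       | Publishing | Cell Press (Cell, Neuron, etc.)                 | All published studies (2009-2011)          | 48                                    |
       | Publishing | PNAS                                            | All published studies (2005-2011)          | 26                                    |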
    18. Providing Incentives: Visibility
       • Enhances collaborative opportunities
       • 69% increase in citation rate for publications associated with shared data (Piwowar, 2007)
    19. Providing Incentives: Credit
    20. Providing Incentives: Preservation & Access
    21. Providing Incentives: Institutional
       • Support researcher needs
       • Improved archiving efficiency
       • Cost savings
       (UCLA Royce Hall photo courtesy of Adam Fagen via Flickr)
    22. Eliminating Barriers
       1. Time / Effort
          - Minimal requirements
          - Specific tools (e.g. ingest)
          - Integrate into existing workflow
       2. Control
          - Data Use Agreement
          - Centralized service
       3. Cultural Paradigm
          - Outreach
          - Demonstrate value
    23. Other Collaborators
    24. 24. Lessons LearnedDon’t underestimate technical matters • Separating data & metadataStandards are not standard • Metadata schema (Dublin Core  DataCite) • InterpretationPolicy issues are ever-present • Data Ownership & Data Use Agreements • Privacy & Consent (Human subjects)Keep in mind the entire lifecycle: ALL users • Discoverability & interoperability • README File
    25. 25. Next StepsOutreachSystem enhancements • Design overhaul • Ingest mechanism • DUA menuPolicy navigationProof-of-concept
    26. 26. Discussion TopicsWhat incentives have you found useful to encourage adoption of this type of resource?Are you using data use agreements? Uniform or individualized?Where do you see institutional data repositories fitting in the larger ecosystem?
    27. 27. More infoDatashare:CDL: • Merritt: • EZID: • XTF:UCSF Library:UCSF CTSI: NCATS – NIH Grant # UL1 TR000004