Datashare cni spring2013


A presentation given at the Coalition for Networked Information meeting describing efforts to support sharing of research data at UCSF.

Published in: Education, Technology

  • Mission: enable individual researchers to share their research data sets with the global community. Michael Weiner, M.D.: a researcher at UCSF. In his work as the Principal Investigator of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) he concluded that widespread data sharing can be achieved now, with great scientific and economic benefits. All ADNI raw data is immediately shared, without embargo, with all scientists in the world. The project is very successful: more than 300 publications have resulted from use of the ADNI data resource. This success demonstrates the feasibility and benefits of sharing data. CTSI: Clinical and Translational Sciences Institute. Working together to develop a resource that meets the needs of the researcher while leveraging the
  • Cell Press, Nature Publishing Group, PNAS. Over 100 papers were published between 2009 and 2011 in journals from 3 publishers that have data sharing requirements. Some researchers have national repositories for their data (e.g. GenBank) while others don’t. The campus has focused on developing infrastructure for storing and analyzing data but not for sharing it generally. Additionally, the current focus is on clinical data, especially anonymized data from the electronic health record, and not basic or social sciences data.
  • CTSI: Its mission is to accelerate the research enterprise, and it saw the sharing of data as one way to accomplish this mission. Library: Interest in, as well as an extension of, its support of open access. UC3: Provide the tools to the UC community to promote digital scholarship.
  • Screenshot of eScholarship, running XTF
  • Screenshot of Datashare, running XTF
  • Datashare website; end user selects title
  • Full information on dataset; end user selects download
  • Data Use Agreement (DUA) for end user
  • Fulfills requirements, existing and emerging
  • Increases visibility of work
  • The new Thomson Reuters (TR) Data Citation Index provides a mechanism to discover data for re-use in the same familiar fashion as discovering publications.
  • Long-term preservation, easy access to your own data. The Merritt repository is an active archival environment with format migration and integrity checks: a smart filing cabinet for digital assets.
  • Centralizing resources improves efficiency by streamlining and standardizing the process, and saves money in the aggregate. Currently gathering data to support this.
  • Metadata, data/metadata separation, file size, DUA, discoverability, interoperability, README

    1. DataShare: Collaboration Yields Promising Tool
       • Julia Kochi, UCSF Library
       • Angela Rizk-Jackson, UCSF CTSI
       • Perry Willett, California Digital Library
       • CNI 2013 Meeting, San Antonio, TX
    2. The Background
       • Julia Kochi, UCSF Library
    3. What is DataShare?
       • An open data repository for the UCSF researcher
       • A concept initially envisioned by Michael Weiner, M.D.
       • A collaboration between UCSF CTSI, UCSF Library, and the California Digital Library
    4. The Problem
       • Increasing requirements to share data
         - NIH grants >$500k
         - Publisher requirements
       • Unequal availability of national repositories
       • Campus priorities
       • FASTR, White House Directive
    5. The Partners
       • UCSF CTSI
         - Knowledge of the researcher, access to the data
       • UCSF Library
         - Metadata expertise, programming resources
       • UC3
         - Preservation tools, services and expertise
    6. Technical Infrastructure
       • Perry Willett, California Digital Library
    7. DataShare Components
       • Merritt: CDL
       • EZID: CDL
       • XTF: CDL, UCSF Library
       • Ingest tool: UCSF Library
    8. Merritt Repository Service
       • Built on “micro-services” principles
       • Content and format agnostic
       • Has a UI and RESTful APIs to submit and retrieve content, and check statuses
       • Can serve as either “dark” or “bright” archive
       • Added public access, data use agreements, asynchronous downloads as part of Datashare project
    9. EZID
       • Service for creation and management of long-term identifiers
       • Currently supports ARKs and DOIs; other types in planning stages
       • Registers DOIs with DataCite
       • Has a UI and APIs with good documentation
    10. XTF
       • eXtensible Text Framework
       • Developed and maintained by CDL
       • Runs several CDL services:
         - eScholarship
         - Online Archive of California
         - Calisphere
       • Faceted browsing, full-text search, other desirable features
    11. Ingest tool
       • Submitting content to a digital repository is hard and costly
       • An attempt to simplify several aspects:
         - Digital object creation
         - Metadata creation
         - Object submission
    12. Interactions for submission (diagram)
       • Components: Ingest Tool, Merritt, EZID, DataCite, XTF
       • Labeled steps: Creates Metadata; Assembles Dataset; Submits to Merritt; Requests DOI; Submits Metadata to EZID; Registers DOI and Metadata; Requests ATOM feed for collection; Retrieves Metadata; Indexes metadata; Receives DOI; Packages object; Gets ATOM feed
    13. Process for End Users
       • Search, browse
       • Request dataset download
       • Fill out Data Use Agreement
       • Receive dataset
    14. Lessons learned
       • Partnerships
         - Many hands make light work
         - Real users uncover hidden assumptions
       • Scale
         - Object size
         - Number of files
         - Upload and download
    15. If you build it, will they come?
       • Angela Rizk-Jackson, UCSF CTSI
    16. What will it take?
       • Sketch by Juliana Olivera Silva via Flickr
    17. Providing Incentives: Requirements
       • Columns: Organization | Data Access Requirement | # UCSF Studies
       • Funding
         - NIH | Grants >$500K (2003 on), specific programs | 318 (active projects), 693 (inactive)
         - NSF | All funded projects (2005 on) | 19
         - Foundations (e.g. Moore, Gates, Hewlett) | All funded projects | 3, 31, 19
       • Publishing
         - Nature Publishing Group (Nature, Science, etc.) | All published studies (2009-2011) | 58
         - Cell Press (Cell, Neuron, etc.) | All published studies (2009-2011) | 48
         - PNAS | All published studies (2005-2011) | 26
    18. Providing Incentives: Visibility
       • Enhances collaborative opportunities
       • 69% increase in citation rate for publications associated with shared data (Piwowar, 2007)
    19. Providing Incentives: Credit
    20. Providing Incentives: Preservation & Access
    21. Providing Incentives: Institutional
       • Support researcher needs
       • Improved archiving efficiency
       • Cost savings
       • UCLA Royce Hall photo courtesy of Adam Fagen via Flickr
    22. Eliminating Barriers
       1. Time / Effort
         - Minimal requirements
         - Specific tools (e.g. ingest)
         - Integrate into existing workflow
       2. Control
         - Data Use Agreement
         - Centralized service
       3. Cultural Paradigm
         - Outreach
         - Demonstrate value
    23. Other Collaborators
    24. Lessons Learned
       • Don’t underestimate technical matters
         - Separating data & metadata
       • Standards are not standard
         - Metadata schema (Dublin Core → DataCite)
         - Interpretation
       • Policy issues are ever-present
         - Data Ownership & Data Use Agreements
         - Privacy & Consent (Human subjects)
       • Keep in mind the entire lifecycle: ALL users
         - Discoverability & interoperability
         - README File
    25. Next Steps
       • Outreach
       • System enhancements
         - Design overhaul
         - Ingest mechanism
         - DUA menu
       • Policy navigation
       • Proof-of-concept
    26. Discussion Topics
       • What incentives have you found useful to encourage adoption of this type of resource?
       • Are you using data use agreements? Uniform or individualized?
       • Where do you see institutional data repositories fitting in the larger ecosystem?
    27. More info
       • Datashare:
       • CDL:
         - Merritt:
         - EZID:
         - XTF:
       • UCSF Library:
       • UCSF CTSI: – NIH Grant # UL1 TR000004
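
The submission interactions listed on slide 12 can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the DataShare implementation: the classes `Ezid`, `Merritt`, and `SearchIndex` (standing in for XTF), the `ingest` function, the made-up DOI format, and the assignment of each step to a component are all hypothetical. The real services are reached through their own UIs and APIs, which are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Ezid:
    """Hypothetical stand-in for the identifier service."""
    registered: dict = field(default_factory=dict)

    def mint_doi(self, metadata: dict) -> str:
        # Made-up identifier format, for illustration only.
        doi = f"doi:10.5072/FK2{len(self.registered) + 1:04d}"
        # "Registers DOI and Metadata"
        self.registered[doi] = dict(metadata)
        return doi


@dataclass
class Merritt:
    """Hypothetical stand-in for the repository."""
    ezid: Ezid
    objects: dict = field(default_factory=dict)

    def submit(self, metadata: dict, files: list) -> str:
        # "Requests DOI" / "Submits Metadata to EZID" / "Receives DOI"
        doi = self.ezid.mint_doi(metadata)
        # "Packages object": keep files and metadata as separate parts.
        self.objects[doi] = {
            "metadata": {**metadata, "identifier": doi},
            "files": list(files),
        }
        return doi

    def feed(self) -> list:
        # Stand-in for the ATOM feed of a collection: one entry per object.
        return [
            {"id": doi, "metadata": obj["metadata"]}
            for doi, obj in self.objects.items()
        ]


@dataclass
class SearchIndex:
    """Hypothetical stand-in for the XTF discovery layer."""
    entries: dict = field(default_factory=dict)

    def harvest(self, merritt: Merritt) -> None:
        # "Requests ATOM feed" / "Retrieves Metadata" / "Indexes metadata"
        for entry in merritt.feed():
            self.entries[entry["id"]] = entry["metadata"]

    def search(self, term: str) -> list:
        term = term.lower()
        return sorted(
            doi
            for doi, md in self.entries.items()
            if term in md.get("title", "").lower()
        )


def ingest(merritt: Merritt, title: str, creator: str, files: list) -> str:
    # "Creates Metadata" / "Assembles Dataset" / "Submits to Merritt"
    metadata = {"title": title, "creator": creator}
    return merritt.submit(metadata, files)


ezid = Ezid()
merritt = Merritt(ezid)
index = SearchIndex()

doi = ingest(merritt, "Example imaging dataset", "A. Researcher",
             ["scan01.dat", "README.txt"])
index.harvest(merritt)
print(index.search("imaging"))
```

Keeping the metadata record apart from the file list mirrors the data/metadata separation named under Lessons Learned: the index only ever reads metadata, so large files never pass through the discovery layer.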