Datashare cni spring2013

A presentation given at the Coalition for Networked Information meeting on efforts to support the sharing of research data at UCSF.

Published in: Education, Technology

  • Mission: enable individual researchers to share their research data sets with the global community. A researcher at UCSF. In his work as the Principal Investigator of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) he concluded that widespread data sharing can be achieved now, with great scientific and economic benefits. All ADNI raw data is immediately shared, without embargo, with all scientists in the world. The project is very successful: more than 300 publications have resulted from use of the ADNI data resource. This success demonstrates the feasibility and benefits of sharing data. Clinical and Translational Sciences Institute. Working together to develop a resource that meets the needs of the researcher while leveraging the
  • Cell Press, Nature Publishing Group, PNAS. Over 100 papers published between 2009-11 in journals from 3 publishers that have data sharing requirements. Some researchers have national repositories for their data (e.g. GenBank) while others don’t. Campus focused on developing infrastructure for storing and analyzing data but not sharing it generally. Additionally, the current focus is on clinical data, especially anonymized data from the electronic health record, and not basic or social sciences data.
  • CTSI: Mission is to accelerate the research enterprise; it saw the sharing of data as one way to accomplish this mission. Library: Interest in data sharing as an extension of its support of open access. UC3: Provide the tools to the UC community to promote digital scholarship.
  • Screenshot of eScholarship, running XTF
  • Screenshot of Datashare, running XTF
  • Datashare website; end user selects title
  • Full information on dataset; end user selects download
  • Data Use Agreement (DUA) for end user.
  • Fulfills requirements, existing and emerging
  • Increases visibility of work
  • The new TR Data Citation Index provides a mechanism to discover data for re-use in the same familiar fashion as discovering publications
  • Long-term preservation, easy access to your own data. The Merritt repository is an active archival environment with format migration and integrity checks: a smart filing cabinet for digital assets
  • Centralizing resources improves efficiency by streamlining/standardizing the process and saves money in the aggregate. Currently gathering data to support this
  • Metadata, data/metadata separation, file size, DUA, discoverability, interoperability, README
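
The integrity checks mentioned in the Merritt note above can be illustrated with a simple fixity check: record a checksum when a file is deposited, then recompute it later and compare. The sketch below is a generic illustration and makes no claim about how Merritt itself implements this; the function names `sha256_of` and `verify_fixity` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded at deposit time."""
    return sha256_of(path) == recorded_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dataset = Path(tmp) / "dataset.csv"  # hypothetical deposited file
        dataset.write_text("subject,score\n1,0.5\n")
        recorded = sha256_of(dataset)  # digest stored at deposit
        print("unchanged file passes:", verify_fixity(dataset, recorded))
        dataset.write_text("subject,score\n1,0.6\n")  # simulate corruption
        print("altered file passes:", verify_fixity(dataset, recorded))
```

Reading in chunks keeps memory use flat for large objects, which matters given the file-size concerns raised elsewhere in these notes.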
  • Transcript

    • 1. DataShare: Collaboration Yields Promising Tool
        Julia Kochi, UCSF Library
        Angela Rizk-Jackson, UCSF CTSI
        Perry Willett, California Digital Library
        CNI 2013 Meeting, San Antonio, TX
    • 2. The Background
        Julia Kochi, UCSF Library
    • 3. What is DataShare?
        An open data repository for the UCSF researcher
        A concept initially envisioned by Michael Weiner, M.D.
        A collaboration between UCSF CTSI, UCSF Library, and the California Digital Library
    • 4. The ProblemIncreasing requirements to share data• NIH grants >$500k• Publisher requirementsUnequal availability of national repositoriesCampus prioritiesFASTR, White House Directive
    • 5. The PartnersUCSF CTSI• Knowledge of the researcher, access to the dataUCSF Library• Metadata expertise, programming resourcesUC3• Preservations tools, services and expertise
    • 6. Technical Infrastructure
        Perry Willett, California Digital Library
    • 7. DataShare Components
        Merritt: CDL
        EZID: CDL
        XTF: CDL, UCSF Library
        Ingest tool: UCSF Library
    • 8. Merritt Repository ServiceBuilt on “micro-services” principlesContent and format agnosticHas a UI and RESTful APIs to submit andretrieve content, and check statusesCan serve as either “dark” or “bright” archiveAdded public access, data useagreements, asynchronous downloads as partof Datashare project
    • 9. EZIDService for creation and management of long-term identifiersCurrently supports ARKs and DOIs; other typesin planning stagesRegisters DOIs with DataCiteHas a UI and APIs with good documentation
    • 10. XTFeXtensible Text FrameworkDeveloped and maintained by CDLRuns several CDL services:• eScholarship• Online Archive of California• CalisphereFaceted browsing, full-text search, otherdesirable features
    • 11. Ingest toolSubmitting content to a digital repository ishard and costlyAn attempt to simplify several aspects:• Digital object creation• Metadata creation• Object submission
    • 12. Interactions for submission (diagram)
        Components: Ingest tool, Merritt, EZID, DataCite, XTF
        Ingest tool: creates metadata; assembles dataset; submits to Merritt
        Other labeled interactions: requests DOI; submits metadata to EZID; registers DOI and metadata; receives DOI; packages object; requests ATOM feed for collection; gets ATOM feed; retrieves metadata; indexes metadata
    • 13. Process for End Users
        Search, browse
        Request dataset download
        Fill out Data Use Agreement
        Receive dataset
    • 14. Lessons learned
        Partnerships
          • Many hands make light work
          • Real users uncover hidden assumptions
        Scale
          • Object size
          • Number of files
          • Upload and download
    • 15. If you build it, will they come?
        Angela Rizk-Jackson, UCSF CTSI
    • 16. What will it take?
        Sketch by Juliana Olivera Silva via Flickr
    • 17. Providing Incentives: Requirements
        Organization | Data Access Requirement | # UCSF Studies
        Funding
          NIH | Grants >$500K (2003 on), Specific programs | 318 (active projects), 693 (inactive)
          NSF | All funded projects (2005 on) | 19
          Foundations (e.g. Moore, Gates, Hewlett) | All funded projects | 3, 31, 19
        Publishing
          Nature Publishing Group (Nature, Science, etc.) | All published studies (2009-2011) | 58
          Cell Press (Cell, Neuron, etc.) | All published studies (2009-2011) | 48
          PNAS | All published studies (2005-2011) | 26
    • 18. Providing Incentives: Visibility
        Enhances collaborative opportunities
        69% increase in citation rate for publications associated with shared data (Piwowar, 2007)
    • 19. Providing Incentives: Credit
    • 20. Providing Incentives: Preservation & Access
    • 21. Providing Incentives: Institutional
        UCLA Royce Hall photo courtesy of Adam Fagen via Flickr
          • Support researcher needs
          • Improved archiving efficiency
          • Cost savings
    • 22. Eliminating Barriers
        1. Time / Effort
          - Minimal requirements
          - Specific tools (e.g. ingest)
          - Integrate into existing workflow
        2. Control
          - Data Use Agreement
          - Centralized service
        3. Cultural Paradigm
          - Outreach
          - Demonstrate value
    • 23. Other Collaborators
    • 24. Lessons LearnedDon’t underestimate technical matters• Separating data & metadataStandards are not standard• Metadata schema (Dublin Core  DataCite)• InterpretationPolicy issues are ever-present• Data Ownership & Data Use Agreements• Privacy & Consent (Human subjects)Keep in mind the entire lifecycle: ALL users• Discoverability & interoperability• README File
    • 25. Next StepsOutreachSystem enhancements• Design overhaul• Ingest mechanism• DUA menuPolicy navigationProof-of-concept
    • 26. Discussion TopicsWhat incentives have you found useful toencourage adoption of this type of resource?Are you using data use agreements? Uniformor individualized?Where do you see institutional datarepositories fitting in the larger ecosystem?
    • 27. More infoDatashare:CDL:• Merritt:• EZID:• XTF:UCSF Library:UCSF CTSI: – NIH Grant # UL1 TR000004