Mission: enable individual researchers to share their research data sets with the global community.
A researcher at UCSF. In his work as the Principal Investigator of the Alzheimer’s Disease Neuroimaging Initiative (ADNI), he concluded that widespread data sharing can be achieved now, with great scientific and economic benefits. All ADNI raw data is immediately shared, without embargo, with all scientists in the world. The project is very successful: more than 300 publications have resulted from use of the ADNI data resource. This success demonstrates the feasibility and benefits of sharing data.
Clinical and Translational Sciences Institute
Working together to develop a resource that meets the needs of the researcher while leveraging the
Cell Press, Nature Publishing Group, PNAS
Over 100 papers were published between 2009 and 2011 in journals from 3 publishers that have data sharing requirements.
Some researchers have national repositories for their data (e.g. GenBank) while others don’t.
The campus has focused on developing infrastructure for storing and analyzing data, but not for sharing it generally. Additionally, the current focus is on clinical data, especially anonymized data from the electronic health record, and not on basic or social sciences data.
CTSI: Its mission is to accelerate the research enterprise, and it saw the sharing of data as one way to accomplish this mission.
Library: Interest in, as well as an extension of, its support of open access.
UC3: Provide the tools to the UC community to promote digital scholarship.
Screenshot of eScholarship, running XTF
Screenshot of Datashare, running XTF
Datashare website; enduser selects title
Full information on dataset; enduser selects download
Data Use Agreement (DUA) for enduser.
Fulfills requirements, existing and emerging
Increases visibility of work
The new TR Data Citation Index provides a mechanism to discover data for re-use in the same familiar fashion as discovering publications
Long-term preservation, easy access to your own data
The Merritt repository is an active archival environment with format migration and integrity checks: a smart filing cabinet for digital assets
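As a rough illustration of what an integrity check involves, the sketch below records a checksum when a file is deposited and recomputes it later to detect silent changes. It is not Merritt's implementation: the choice of SHA-256, the function names, and the sample file are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at deposit."""
    return sha256_of(path) == recorded_digest


def demo() -> tuple:
    """Deposit a tiny file, then check it before and after a simulated change."""
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "dataset.csv"  # hypothetical deposited file
        data_file.write_bytes(b"subject,score\n1,42\n")
        recorded = sha256_of(data_file)           # stored at deposit time
        before = fixity_ok(data_file, recorded)   # file is unchanged
        data_file.write_bytes(b"subject,score\n1,43\n")  # simulate corruption
        after = fixity_ok(data_file, recorded)    # digest no longer matches
    return before, after
```

Calling `demo()` returns one flag for the untouched file and one for the altered copy, which is the signal an archive would use to trigger a repair from another replica.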
Centralizing resources improves efficiency by streamlining and standardizing the process, and saves money in the aggregate
Currently gathering data to support this
What is DataShare?
An open data repository for the UCSF researcher
A concept initially envisioned by Michael Weiner, M.D.
A collaboration between UCSF CTSI, UCSF Library, and the California Digital Library
The Problem
Increasing requirements to share data
• NIH grants >$500k
• Publisher requirements
Unequal availability of national repositories
Campus priorities
FASTR, White House Directive
The Partners
UCSF CTSI
• Knowledge of the researcher, access to the data
UCSF Library
• Metadata expertise, programming resources
UC3
• Preservation tools, services and expertise
Technical Infrastructure
Perry Willett
California Digital Library
Merritt Repository Service
Built on “micro-services” principles
Content and format agnostic
Has a UI and RESTful APIs to submit and retrieve content, and check statuses
Can serve as either “dark” or “bright” archive
Added public access, data use agreements, asynchronous downloads as part of Datashare project
EZID
Service for creation and management of long-term identifiers
Currently supports ARKs and DOIs; other types in planning stages
Registers DOIs with DataCite
Has a UI and APIs with good documentation
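To give a feel for what a client of such an identifier API prepares, here is a minimal sketch that serializes descriptive metadata as "name: value" lines with percent-encoding, in the ANVL style that EZID's API documentation describes for request bodies. It makes no network call. The element names (`_target`, `datacite.*`) are given as recalled from that documentation and should be checked against it; the sample record is entirely made up.

```python
import re


def anvl_escape(text: str) -> str:
    """Percent-encode characters that would break a one-line 'name: value' pair."""
    return re.sub(r"[%:\r\n]", lambda m: "%%%02X" % ord(m.group(0)), text)


def to_anvl(metadata: dict) -> str:
    """Serialize a metadata dict as 'name: value' lines, one element per line."""
    return "\n".join(
        f"{anvl_escape(name)}: {anvl_escape(value)}"
        for name, value in metadata.items()
    )


# Hypothetical record; every value here is illustrative only.
SAMPLE_RECORD = {
    "_target": "https://example.org/datasets/1",
    "datacite.creator": "Example, A.",
    "datacite.title": "Sample imaging dataset: wave 1",
    "datacite.publisher": "Example University",
    "datacite.publicationyear": "2013",
}

if __name__ == "__main__":
    print(to_anvl(SAMPLE_RECORD))
```

Escaping both names and values keeps a colon or line break inside a title from being misread as a field separator.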
XTF
eXtensible Text Framework
Developed and maintained by CDL
Runs several CDL services:
• eScholarship
• Online Archive of California
• Calisphere
Faceted browsing, full-text search, other desirable features
Ingest tool
Submitting content to a digital repository is hard and costly
An attempt to simplify several aspects:
• Digital object creation
• Metadata creation
• Object submission
Interactions for submission
Ingest Tool
• Creates metadata
• Assembles dataset
• Submits to Merritt
Merritt
• Requests DOI
• Submits metadata to EZID
• Receives DOI
• Packages object
EZID
• Registers DOI and metadata (with DataCite)
XTF
• Requests ATOM feed for collection
• Gets ATOM feed
• Retrieves metadata
• Indexes metadata
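The hand-offs on this slide can be sketched with in-memory stand-ins for each service. Everything below is hypothetical: the class names, method names, identifier pattern, and the dictionary "feed" are illustration only and do not reflect the real Merritt, EZID, or XTF interfaces.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeEzid:
    """Stand-in for the identifier service: mints IDs and keeps their metadata."""
    registered: dict = field(default_factory=dict)
    _serial: count = field(default_factory=lambda: count(1))

    def mint_doi(self, metadata: dict) -> str:
        doi = f"doi:10.9999/example-{next(self._serial):04d}"  # made-up identifier
        self.registered[doi] = metadata  # a real service would register with DataCite
        return doi


@dataclass
class FakeMerritt:
    """Stand-in for the repository: requests a DOI, then packages the object."""
    ezid: FakeEzid
    objects: dict = field(default_factory=dict)

    def submit(self, metadata: dict, files: list) -> str:
        doi = self.ezid.mint_doi(metadata)  # request DOI, hand over metadata
        self.objects[doi] = {"metadata": metadata, "files": list(files)}
        return doi

    def feed(self) -> list:
        """Feed-like listing of the collection (an Atom feed on the slide)."""
        return [{"id": doi, **obj["metadata"]} for doi, obj in self.objects.items()]


@dataclass
class FakeIndex:
    """Stand-in for the search side: pulls the feed and indexes title words."""
    terms: dict = field(default_factory=dict)

    def harvest(self, repository: FakeMerritt) -> None:
        for entry in repository.feed():
            for word in entry["title"].lower().split():
                self.terms.setdefault(word, set()).add(entry["id"])

    def search(self, word: str) -> set:
        return self.terms.get(word.lower(), set())


def run_submission() -> tuple:
    """Walk one dataset through ingest, identifier minting, and indexing."""
    ezid = FakeEzid()
    merritt = FakeMerritt(ezid)
    index = FakeIndex()
    # Ingest tool: create metadata, assemble the dataset, submit it.
    metadata = {"title": "Sample imaging dataset", "creator": "Example, A."}
    doi = merritt.submit(metadata, ["scan01.nii", "README.txt"])
    index.harvest(merritt)
    return doi, index.search("imaging")
```

The point of the sketch is the ordering: the identifier exists before the object is packaged, and the search index only learns about the dataset by pulling from the repository.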
Process for Endusers
Search, browse
Request dataset download
Fill out Data Use Agreement
Receive dataset
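The key rule in this process is that a download is only released after the Data Use Agreement has been completed. A minimal sketch of that gate follows; the class, method names, and ticket format are hypothetical and not taken from the DataShare code.

```python
from dataclasses import dataclass, field


class AgreementRequired(Exception):
    """Raised when a download is requested before the DUA is accepted."""


@dataclass
class DownloadDesk:
    """Tracks which (user, dataset) pairs have an accepted data use agreement."""
    accepted: set = field(default_factory=set)

    def accept_dua(self, user: str, dataset_id: str) -> None:
        """Record that this user filled out the agreement for this dataset."""
        self.accepted.add((user, dataset_id))

    def request_download(self, user: str, dataset_id: str) -> str:
        """Return a download ticket, or refuse if no agreement is on file."""
        if (user, dataset_id) not in self.accepted:
            raise AgreementRequired(f"{user} must accept the DUA for {dataset_id}")
        # A real service might prepare the files asynchronously and send a link later.
        return f"download-ticket:{dataset_id}:{user}"
```

Keying acceptance on the user and dataset together means agreeing to one dataset's terms does not unlock any other dataset.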
Lessons learned
Partnerships
• Many hands make light work
• Real users uncover hidden assumptions
Scale
• Object size
• Number of files
• Upload and download
If you build it, will they come?
Angela Rizk-Jackson
UCSF CTSI
What will it take?
Sketch by Juliana Olivera Silva via Flickr
Providing Incentives: Requirements
Organization                                    | Data Access Requirement                       | # UCSF Studies
Funding
NIH                                             | Grants >$500K (2003 on), specific programs    | 318 (active projects), 693 (inactive)
NSF                                             | All funded projects (2005 on)                 | 19
Foundations (e.g. Moore, Gates, Hewlett)        | All funded projects                           | 3, 31, 19
Publishing
Nature Publishing Group (Nature, Science, etc.) | All published studies (2009-2011)             | 58
Cell Press (Cell, Neuron, etc.)                 | All published studies (2009-2011)             | 48
PNAS                                            | All published studies (2005-2011)             | 26
Providing Incentives: Visibility
Enhances collaborative opportunities
69% increase in citation rate for publications associated with shared data (Piwowar, 2007)
Providing Incentives: Institutional
UCLA Royce Hall photo courtesy of Adam Fagen via Flickr
• Support researcher needs
• Improved archiving efficiency
• Cost savings
Eliminating Barriers
1. Time / Effort
- Minimal requirements
- Specific tools (e.g. ingest)
- Integrate into existing workflow
2. Control
- Data Use Agreement
- Centralized service
3. Cultural Paradigm
- Outreach
- Demonstrate value
Lessons Learned
Don’t underestimate technical matters
• Separating data & metadata
Standards are not standard
• Metadata schema (Dublin Core DataCite)
• Interpretation
Policy issues are ever-present
• Data Ownership & Data Use Agreements
• Privacy & Consent (Human subjects)
Keep in mind the entire lifecycle: ALL users
• Discoverability & interoperability
• README File
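The "standards are not standard" point can be made concrete with a small crosswalk between the two schemas named above. The sketch below is an assumption-laden illustration, not the project's mapping: the element pairs are one plausible interpretation, and the required list reflects the five properties DataCite's metadata schema has long treated as mandatory (newer schema versions also require a resource type).

```python
# Hypothetical crosswalk: Dublin Core element -> DataCite property name.
DC_TO_DATACITE = {
    "creator": "creator",
    "title": "title",
    "publisher": "publisher",
    "date": "publicationYear",
    "identifier": "identifier",
    "description": "description",
    "subject": "subject",
}

# Properties assumed mandatory for this sketch.
REQUIRED_DATACITE = ("identifier", "creator", "title", "publisher", "publicationYear")


def crosswalk(dc_record: dict) -> dict:
    """Map a flat Dublin Core record onto DataCite property names."""
    out = {}
    for element, value in dc_record.items():
        target = DC_TO_DATACITE.get(element)
        if target is None:
            continue  # elements without a mapping are dropped in this sketch
        if target == "publicationYear":
            value = str(value)[:4]  # assumes the date starts with a four-digit year
        out[target] = value
    return out


def missing_required(datacite_record: dict) -> list:
    """List the assumed-mandatory properties that are absent or empty."""
    return [name for name in REQUIRED_DATACITE if not datacite_record.get(name)]
```

Even in this toy version the interpretation problem shows up: a free-form Dublin Core date has to be reduced to a year, and a record that is valid on one side can be incomplete on the other.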
Next Steps
Outreach
System enhancements
• Design overhaul
• Ingest mechanism
• DUA menu
Policy navigation
Proof-of-concept
Discussion Topics
What incentives have you found useful to encourage adoption of this type of resource?
Are you using data use agreements? Uniform or individualized?
Where do you see institutional data repositories fitting in the larger ecosystem?