NDSA Digital Preservation 2013
Integrating Repositories for
Research Data Sharing
Stephen Abrams
California Digital Librar...

Ndsa 2013-abrams-integrating-repositories-for-data-sharing


The thorough integration of information technology and resources into scientific workflows has nurtured a new paradigm of data-intensive science. However, far too much research activity still takes place in silos, to the detriment of open scientific inquiry and advancement. Data-intensive science would be facilitated by more universal adoption of good data management practices ensuring the ongoing viability and usability of all legitimate research outputs, including data, and the encouragement of data publication and sharing for reuse. The centerpiece of such data sharing is the digital repository, acting as the foundation for external value-added services supporting and promoting effective data acquisition, publication, discovery, and dissemination. Since a general-purpose curation repository will not be able to offer the same level of specialized user experience provided by disciplinary tools and portals, a layered model built on a stable repository core is an appropriate division of labor, taking best advantage of the relative strengths of the concerned systems.
The Merritt repository, operated by the University of California Curation Center (UC3) at the California Digital Library (CDL), functions as a curation core for several data sharing initiatives, including the eScholarship open access publishing platform, the DataONE network, and the Open Context archaeological portal. This presentation will highlight two recent examples of external integration for purposes of research data sharing: DataShare, an open portal for biomedical data at UC, San Francisco; and Research Hub, an Alfresco-based content management system at UC, Berkeley. Both significantly extend Merritt’s coverage of the full research data lifecycle and workflows: upstream, with augmented capabilities for data description, packaging, and deposit; and downstream, with enhanced domain-specific discovery. These efforts showcase the catalyzing effect that coupled integration of curation repositories and well-known public disciplinary search environments can have on research data sharing and scientific advancement.

  • Copyright © 2013 by The Regents of the University of California. This work is made available under the terms of the Creative Commons Attribution-ShareAlike 3.0 license.
  • Image credits: Matija Grguric, Scale up! – Curved elements!, http://www.flickr.com/photos/32195273@N05/5107685264; Tom Rafferty, Remote control!, http://www.flickr.com/photos/67945918@N00/4319529821; Barry Egan, File rio 2006, http://www.flickr.com/photos/vixon/116447718
  • Image credits: Kevin Smith, Converting 8.24 bit samples in CoreAudio on iOS, http://www.kevatron.co.uk/converting-8-24-bit-samples-in-coreaudio-on-ios/; Paul B. Hartzog, Server room, http://www.flickr.com/photos/paulbhartzog/680749585

    1. NDSA Digital Preservation 2013: Integrating Repositories for Research Data Sharing
        Stephen Abrams, California Digital Library
        Angela Rizk-Jackson and Julia Kochi, University of California, San Francisco
        Noah Wittman, University of California, Berkeley
    2. Why is data curation important?
        • Accelerating scientific progress
        • Enabling appropriate scrutiny and verification of results
        • Promoting integrity and debate
        • Facilitating new collaborations
        • Avoiding needless duplication of effort
        • Increasingly, complying with institutional policies, publication requirements, and funder mandates
        Cf. Whyte and Tedds (2011), “Making the case for research data management,” DCC briefing paper, www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
    3. Merritt
        • Curation repository available to the UC community and external partners
        • Preservation and access
        • Content agnostic, model free
        • Highly decentralized micro-services architecture
        Cf. Abrams, Cruse, Kunze, and Minor (2011), “Curation micro-services: A pipeline metaphor for repositories,” Journal of Digital Information 12(2), journals.tdl.org/jodi/article/view/1605
        • 26 curatorial units
        • 271 collections
        • 325,000 objects
        • 450,000 versions
        • 4,500,000 files
        • 13 TB
        www.cdlib.org/uc3/merritt
        merritt.cdlib.org
    4. Merritt [architecture diagram; component labels: Storage node, Storage broker, Inventory, ONEShare UNM storage node, UI/API, LDAP, RDBMS, Fixity, User agent, Message queue, Load balancer, Ingest, EZID, No-SQL, DataCite, DataONE member node, DataONE coordinating node, IDF, Web of Knowledge, Primo, SAN, SDSC cloud]
    5. (Some) issues to address
        • Scale
            • Individual objects ranging from 0 to 47,000 files
            • Individual files ranging from 0 to 14 GB
        • Maintaining control
            • Concern over potential loss of control over dissemination and use of data
        • User experience
            • Switch from organizational to individual interaction
        Image credits: www.flickr.com/photos/vixon/116447718, www.flickr.com/photos/traftery/4319529821, www.flickr.com/photos/32195273@N05/5107685264
    6. (Some) issues to address
        • Scale
            • Individual objects ranging from 0 to 47,000 files
            • Individual files ranging from 0 to 14 GB
        • Maintaining control
            • Concern over potential loss of control over dissemination and use of data
        • User experience
            • Switch from organizational to individual interaction
        Augment repository function by composition (when possible) and addition (when necessary)
        • Loosely-coupled integration with external community supported systems and services
    7. Scale
        • Avoiding client timeout
            • ≤ 2 GB: File-based → stream-based AIP-to-DIP processing
            • > 2 GB: Asynchronous delivery
                • Email notification with personalized, time-limited URL
        • Streamlined storage provisioning
            • SDSC cloud, cloud.sdsc.edu
        Image credits: www.kevatron.co.uk/converting-8-24-bit-samples-in-coreaudio-on-ios, www.flickr.com/photos/paulbhartzog/680749585
    8. Control
        • Data use agreements (DUAs)
            • Explicit assertion of license requirements and terms of use
            • Curatorial and consumer notification of acceptance
        Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3): 252-69, doi:10.1016/j.jbi.2006.09.001
        Example notification:
            From: no-reply-merritt@ucop.edu
            Subject: Merritt DUA acceptance
            Name: Stephen Abrams
            Affiliation: California Digital Library
            Collection: UCSF DataShare
            Object: Frontotemporal Lobar Degeneration (FTLD)
            Date: 2013-05-31 09:50:34 PDT
            Terms of use: As part of this agreement, Consumer submits to the following statements:
            (1) I will receive access to de-identified data and will not attempt to establish the identity of any of the study subjects.
            (2) I will share these data only with my immediate co-workers, and I will not transfer these data to other research groups. I understand that these data are available to other research groups through the process by which I obtain them.
            (3) I will require anyone in my group who utilizes these data, or anyone with whom I share these data to comply with this data use agreement ...
    9. User experience
        • Due to its open eligibility policy, Merritt will always provide a more generic UX than special-purpose or disciplinary systems
        • Shifting user roles, shifting expectations
            • Institutional → individual researcher
            • Behavioral expectations set by the commercial/mobile web
    10. User experience
        • Due to its open eligibility policy, Merritt will always provide a more generic UX than special-purpose or disciplinary systems
        • Shifting user roles, shifting expectations
            • Institutional → individual researcher
            • Behavioral expectations set by the commercial web
        • Integration with extant services that better provide the desired UX
            • DataShare
            • Research Hub
    11. DataShare
        • “The goal of the DataShare project is to catalyze widespread sharing of scientific research data” datashare.ucsf.edu
        • UCSF Clinical and Translational Science Institute, ctsi.ucsf.edu
        • UCSF Library, www.library.ucsf.edu
        • UCSF Center for Imaging of Neurodegenerative Disease, www.radiology.ucsf.edu/cind
        • Architecture
            • DataShare submission client (Ruby/Rails)
            • Merritt curation repository
            • DataShare discovery portal (XTF/Java)
    12. DataShare
        • Prepare
        • Describe
        • Upload
        • Curate
        • Discover
        • Share
    13. DataShare
        • Prepare
            • Best practice advice
        • Describe
        • Upload
        • Curate
        • Discover
        • Share
    14. DataShare
        • Prepare
        • Describe
            • Schema-directed metadata editor
            • DataCite schema, schema.datacite.org
        • Upload
        • Curate
        • Discover
        • Share
    15. DataShare
        • Prepare
        • Describe
        • Upload
            • File browse or drag-n-drop
        • Curate
        • Discover
        • Share
    16. DataShare
        • Prepare
        • Describe
        • Upload
        • Curate
            • Manage datasets
        • Discover
        • Share
    17. DataShare
        • Prepare
        • Describe
        • Upload
        • Curate
        • Discover
            • Faceted search and browse
        • Share
    18. DataShare
        • Prepare
        • Describe
        • Upload
        • Curate
        • Discover
        • Share
            • DataONE
            • DataCite
            • (soon) Primo Web of Knowledge
            • SEO
    19. Merritt + DataShare [architecture diagram: the Merritt components from slide 4, plus DataShare upload, Collection Atom feed, XTF (xtf.cdlib.org), DataShare portal, Lucene]
    20. Research Hub
        • “Research Hub provides powerful tools for content management and collaboration” hub.berkeley.edu
        • Alfresco CMS, www.alfresco.com
        • 770 projects, 3,900 users
        • Personal file management
        • Project collaboration
        • Departmental resource pooling
        • Research data management
        • Desktop sync, mobile app, Adobe Creative Suite
        • UC Berkeley Information Services and Technology, ist.berkeley.edu
    21. Research Hub
        • Prepare
            • Acquire and arrange
        • Describe
        • Upload
        • Curate
        • Discover
        • Share
    22. Research Hub
        • Prepare
        • Describe
            • Schema-directed metadata editors
        • Upload
        • Curate
        • Discover
        • Share
    23. Research Hub
        • Prepare
        • Describe
        • Upload
            • Direct action
        • Curate
        • Discover
        • Share
    24. Research Hub
        • Prepare
        • Describe
        • Upload
            • Policy-based workflow rules
        • Curate
        • Discover
        • Share
    25. Research Hub
        • Prepare
        • Describe
        • Upload
            • Drag-and-drop
        • Curate
        • Discover
        • Share
    26. Research Hub
        • Prepare
        • Describe
        • Upload
        • Curate
            • Manage datasets
        • Discover
        • Share
    27. Research Hub
        • Prepare
        • Describe
        • Upload
        • Curate
        • Discover
            • Search / browse
        • Share
    28. Research Hub
        • Prepare
        • Describe
        • Upload
        • Curate
        • Discover
        • Share
            • Curatorial invitation
    29. Merritt + DataShare + Research Hub [architecture diagram: the Merritt and DataShare components from slides 4 and 19, plus Research Hub]
    30. Future integrations
        • UCTrust/InCommon federation, incommon.org
        • Open Context archaeological portal, opencontext.org
        • Nuxeo, www.nuxeo.com
            • UC system-wide DAMS
        • Islandora, islandora.ca
            • Fedora → Merritt
        • DPN, www.dpn.org
    31. Sharing research through repositories
        • Conform to institutional policy, publication requirements, and funder mandates
        • Pro-active curation of valuable research outputs
        • Stable citation and access
        • High visibility publication and discovery
        • Use metrics
    32. Sharing research through repositories
        • Conform to institutional policy, publication requirements, and funder mandates
        • Pro-active curation of valuable research outputs
        • Stable citation and access
        • High visibility publication and discovery
        • Use metrics
        • Repository layering as an appropriate division of labor
            • Exploiting existing capabilities already in local use
    33. For more information
        • Merritt, www.cdlib.org/uc3/merritt, uc3@ucop.edu
            Stephen Abrams, David Loy, Patricia Cruse, Mark Reyes, Shirin Faenza, Joan Starr, Scott Fisher, Carly Strasser, Erik Hetzner, Marisa Strong, Joshua Hubbard, Bhavitavya Vedula, Greg Janée, Kenneth Weiss, John Kunze, Perry Willet, Rosalie Lack
        • DataShare, datashare.ucsf.edu
            Geoffrey Boushey, Julia Kochi, Anirvan Chatterjee, Angela Rizk-Jackson, Maninder Kahlon, Michael Weiner
        • Research Hub, hub.berkeley.edu
            Ian Crew, Michael McCarthy (Tribloom), Noah Wittman, Patrick McGrath
        www.slideshare.net/UC3/ndsa-2013abramsintegratingrepositoriesfordatasharing
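
Slide 7 describes a size threshold for delivery: objects up to 2 GB are streamed directly, while larger ones are delivered asynchronously via an emailed, personalized, time-limited URL. The slides do not say how Merritt implements this, so the following is only a minimal sketch of one way such a rule could work. Everything in it is an assumption made for illustration: reading "2 GB" as 2 GiB, the HMAC signing scheme, the `SECRET_KEY` value, the query parameter names, and the example URL.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the slide's "2 GB" is read here as 2 GiB.
THRESHOLD_BYTES = 2 * 1024**3
# Hypothetical signing key; a real service would load this from configuration.
SECRET_KEY = b"example-secret"


def delivery_mode(size_bytes):
    """Return 'stream' at or below the threshold, 'async' above it."""
    return "stream" if size_bytes <= THRESHOLD_BYTES else "async"


def _sign(object_id, user, expires):
    """HMAC-SHA256 over the fields that make the URL personal and time-limited."""
    payload = f"{object_id}|{user}|{expires}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def signed_url(base_url, object_id, user, ttl_seconds, now=None):
    """Build a URL tied to one user and one object that expires after ttl_seconds."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({
        "object": object_id,
        "user": user,
        "expires": expires,
        "sig": _sign(object_id, user, expires),
    })
    return f"{base_url}?{query}"


def is_valid(url, now=None):
    """Accept the URL only if the signature matches and it has not expired."""
    current = int(time.time() if now is None else now)
    params = parse_qs(urlparse(url).query)
    try:
        object_id = params["object"][0]
        user = params["user"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _sign(object_id, user, expires)
    return hmac.compare_digest(sig, expected) and current <= expires
```

In this sketch a request handler would call `delivery_mode` on the object size; for the `async` case it would package the object in the background and email the result of `signed_url` to the requester. Signing the user and expiry time together means the link cannot be reused by changing either field.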