Reese: When One IR Isn’t Enough



How Research Data Will Disintegrate the IR, Terry Reese, OSU; Institutional Repository Case Studies

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information


  1. How research data will disintegrate the IR
       Terry Reese, Gray Family Chair for Innovative Library Services, Oregon State University
  2. Traditional roles of the IR
       • Act as the central hub for institutional academic output
           • Traditionally reports and publications: content that is easy to define
       • In many cases eclectic
           • Integrating faculty, student, and sometimes gray literature
       • Provides long-term preservation services
           • At least at the byte level for content
  3. What about research data?
       • What about it?
           • It’s bigger
           • It’s domain specific
           • It’s difficult to describe (by librarians)
           • It’s difficult to know if it should be preserved “forever”
       • And it can be fit into our current systems
           • Data is data, even if it’s big data
  4. Integrating Research Data Into DSpace
       • By and large, institutions can use DSpace to store research data the same way they store publications, with some creativity
           • Generally, additional metadata should be attached as part of the upload process
           • For multi-part research data elements, a manifest describing those elements can simplify reuse
       • The size of most research data does pose a problem for DSpace’s default configuration, which uses a unified file store
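The manifest mentioned on slide 4 could take many forms; the slides do not specify one. As a minimal sketch, assuming a JSON file that lists each file's relative path, size, and SHA-256 checksum (the field names and the `build_manifest` helper are hypothetical, not part of DSpace):

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path, title: str) -> dict:
    """Describe every file under dataset_dir (hypothetical manifest layout)."""
    files = []
    for path in sorted(p for p in dataset_dir.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(dataset_dir).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return {"title": title, "file_count": len(files), "files": files}


def write_manifest(dataset_dir: Path, title: str, out_file: Path) -> None:
    """Serialize the manifest as JSON so it can be uploaded with the data."""
    manifest = build_manifest(dataset_dir, title)
    out_file.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

A depositor could upload the resulting JSON file alongside the data files, so that someone reusing the dataset can see what it contains and verify each part.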
  5. DSpace-specific configuration
       • By default, DSpace expects its asset store to be located on the same server as the application
       • To deal with data-rich repositories, DSpace can use multiple asset stores
       • When dealing with research data, we’ve found it better to create virtual asset stores, creating symbolic links from our SAN server, which can be dynamically resized, to an asset store directory physically present on the server
           • This allows us to handle backups more efficiently and run DSpace on a minimal server configuration with lots of RAM and little disk space
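The virtual asset store on slide 5 can be sketched as a small setup script. This is an illustration under assumptions, not OSU's actual procedure: it assumes the link lives at the asset store path DSpace is configured to use and points at a directory on the mounted SAN volume, and both paths and the `link_asset_store` helper are hypothetical. In DSpace releases of that era, additional asset stores were declared in `dspace.cfg` with keys such as `assetstore.dir.1` and selected for new uploads with `assetstore.incoming`; check the documentation for your version.

```python
import os
from pathlib import Path


def link_asset_store(asset_store_dir: Path, san_dir: Path) -> Path:
    """Make asset_store_dir a symbolic link to san_dir.

    asset_store_dir is the path DSpace is configured to use; san_dir is a
    directory on the (resizable) SAN volume. Both are assumptions here.
    """
    if not san_dir.is_dir():
        raise FileNotFoundError(f"SAN directory not mounted: {san_dir}")
    if asset_store_dir.is_symlink():
        if asset_store_dir.resolve() == san_dir.resolve():
            return asset_store_dir  # already linked, nothing to do
        raise FileExistsError(f"{asset_store_dir} links elsewhere")
    if asset_store_dir.exists():
        # Refuse to replace a real directory that may already hold bitstreams.
        raise FileExistsError(f"{asset_store_dir} exists and is not a link")
    asset_store_dir.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(san_dir, asset_store_dir, target_is_directory=True)
    return asset_store_dir
```

Because the application only ever sees the local path, the SAN volume behind it can be resized or backed up without touching the DSpace server itself.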
  6. And how research data changes things
       • Unlike publications, research data is often less easily defined in terms of our current workflows
           • What is the final product, and how is that represented?
       • Current tools like DSpace lack adequate version control to support the iterative approach of some data processing
       • Unlike finished publications, research data generally must be submitted by the researcher
           • Rather than deposited by a librarian who finds the material by scanning CVs
  7. New trends we see at OSU
       • OSU is a very heterogeneous environment, with each college maintaining its own IT unit because there is no strong central IT
       • This has led to the development of many private departmental IR efforts that departments use to manage research data
       • The result is an ecosystem where different units on campus have widely different needs as they relate to data, many of which cannot be met by our current IR efforts
  8. What researchers have been telling us
       • Many departments do not wish to have the library store and preserve their data, but would like that data to be discoverable within their own systems
       • Many departments would like the library to help by providing metadata training and assignment
       • Many departments would like the library to facilitate the development of preservation objects and unique identifiers for content (which can be used outside of the campus)
  9. The expanding definition of an IR
       • A need for a disintegrated IR effort that lets the central IR system interact with and discover resources in different systems without needing to preserve the data
       • A disassembling of repository services
           • Disassociating things like ID minting and object packaging (like creation of preservation “bags”) from the IR
       • Like distributed source code management tools, introduce the concept of both centralized and distributed versioning to repository objects
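To make the idea of minting and bagging outside the IR concrete, here is a minimal sketch of a standalone packaging step. It assumes a simplified BagIt-style layout (a `data/` payload directory, `bagit.txt`, and a SHA-256 payload manifest) and uses a random UUID as a stand-in identifier; a production service would more likely issue Handles or ARKs and use an established BagIt library. The function names are hypothetical.

```python
import hashlib
import shutil
import uuid
from pathlib import Path


def mint_identifier() -> str:
    """Stand-in minter: a real service might issue Handles or ARKs instead."""
    return uuid.uuid4().urn


def make_bag(source_dir: Path, bag_dir: Path) -> str:
    """Copy source_dir into a minimal BagIt-style bag; return its identifier."""
    identifier = mint_identifier()
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # payload lives under data/

    manifest_lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        # read_bytes() keeps the sketch short; hash in chunks for big files.
        checksum = hashlib.sha256(path.read_bytes()).hexdigest()
        relative = path.relative_to(bag_dir).as_posix()
        manifest_lines.append(f"{checksum}  {relative}")

    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    (bag_dir / "bag-info.txt").write_text(
        f"External-Identifier: {identifier}\n", encoding="utf-8"
    )
    return identifier
```

Because nothing here depends on the repository software, a department could run such a step on its own systems and hand the resulting bag to the library later.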
  10. What we are trying at OSU
       • Like many, developing micro-services for minting and bagging content, to support potential acquisitions of data by the library
       • Working with external departments to implement OAI-PMH endpoints that can be harvested into DSpace, providing dynamic links to materials stored outside of the repository
       • Working with departments to create a hand-over process for materials that will eventually be turned over to the library for long-term preservation
       • Working with departments to develop a withdrawal strategy for materials that are eventually sunsetted
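The OAI-PMH harvesting described on slide 10 can be illustrated with a short sketch. It builds a standard `ListRecords` request for the `oai_dc` metadata format and pulls out each record's OAI identifier, titles, and any URL-valued `dc:identifier`, which is what a central IR would need in order to link out to data held elsewhere. The helper names and any base URL are hypothetical; fetching the response and following resumption tokens are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for a departmental OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list[dict]:
    """Extract identifiers, titles, and outbound links from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "titles": [t.text for t in record.iterfind(".//dc:title", NS)],
            # Keep only identifiers that look like URLs, for linking out.
            "links": [
                i.text
                for i in record.iterfind(".//dc:identifier", NS)
                if i.text and i.text.startswith("http")
            ],
        })
    return records
```

The harvested links point back to the department's own system, so the central repository gains discovery without taking on storage or preservation of the data.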