Cold, Dark, and Lonely: An Archive Moves Online

Slide deck for my presentation at the All About Repositories webinar on 10/14/2009.

Published in: Technology

  1. Slide 1: Cold, Dark, and Lonely: An Archive Moves Online
       Bryan Beecher, IT Director, ICPSR
  2. Slide 2: What’s ICPSR?
       • Inter-university Consortium for Political and Social Research
       • Clients
           • Higher education
           • US Government
           • Our “hot, flat, and crowded” world
       • In business since 1962
  3. Slide 3: What do we do?
       • Acquire, curate, and deliver social science data to researchers, students, policy-makers, etc.
           • JSTOR of data
       • Cover many different fields
           • Political Science, Economics, Sociology, Demography, Criminal Justice, and many more
  4. Slide 4: What content do we curate?
       • Primarily survey data
           • Also aggregate government data (such as Census data)
       • Tabular
           • Rows = respondents
           • Columns = variables
       • SAS, SPSS, Stata, even Excel
  5. Slide 5: ICPSR and OAIS
       • Clients deposit data (ingest)
       • ICPSR normalizes content into plain text data (ASCII, Unicode) and “setups” for statistical packages, and adds metadata (ingest + data management)
       • Preserves content (archival storage)
       • Makes it available to others (access)
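The normalization step on this slide can be pictured with a minimal sketch, assuming a tiny in-memory survey table in which rows are respondents and columns are variables. The variable names, the fixed-width layout, and the `column_map` structure are illustrative assumptions, not ICPSR's actual pipeline; a real "setup" would be SAS, SPSS, or Stata syntax rather than a Python dictionary.

```python
# Hypothetical survey table: each row is a respondent, each column a variable.
ROWS = [
    {"CASEID": 1, "AGE": 34, "PARTY": "D"},
    {"CASEID": 2, "AGE": 58, "PARTY": "R"},
    {"CASEID": 3, "AGE": 41, "PARTY": "I"},
]
WIDTHS = {"CASEID": 4, "AGE": 3, "PARTY": 1}  # assumed column widths


def to_fixed_width(rows, widths):
    """Return (text, column_map): plain-text data plus the column positions
    that a statistics-package setup file would need in order to read it."""
    column_map = {}
    start = 1  # 1-based column positions
    for name, width in widths.items():
        column_map[name] = (start, start + width - 1)
        start += width
    lines = [
        "".join(str(row[name]).rjust(width) for name, width in widths.items())
        for row in rows
    ]
    return "\n".join(lines) + "\n", column_map


def read_fixed_width(text, column_map):
    """Read the plain-text data back using only the column map."""
    return [
        {name: line[a - 1:b].strip() for name, (a, b) in column_map.items()}
        for line in text.splitlines()
    ]


text, column_map = to_fixed_width(ROWS, WIDTHS)
records = read_fixed_width(text, column_map)
```

The point of the sketch is the split itself: the data file carries no structure of its own, so the column map (standing in for the setup) has to travel with it for the content to stay usable.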
  6. Slide 6: Access
       • Mechanisms have evolved over time
       • Tapes + USPS
       • FTP
       • Gopher
       • Web
  7. Slide 7: Archival Storage
       • Historically kept two copies on tape
           • Off-line, local (Ann Arbor, MI)
       • Worked, but
           • Expensive
           • Cannot browse
           • Are the bits OK?
       • “The Warehouse”…
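The "Are the bits OK?" question is much easier to answer once content sits on disk. A minimal sketch of a fixity check follows, assuming a manifest of SHA-256 digests recorded at ingest. The deck does not say how ICPSR checks its holdings, so the hash choice, the manifest shape, and the file names are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's relative path to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the sorted relative paths that are missing or have changed."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return sorted(problems)


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    (archive / "study-0001.txt").write_bytes(b"respondent data\n")
    (archive / "study-0002.txt").write_bytes(b"more data\n")
    manifest = build_manifest(archive)
    clean = verify(archive, manifest)  # nothing has changed yet
    (archive / "study-0002.txt").write_bytes(b"more dat\x00\n")  # simulated damage
    damaged = verify(archive, manifest)
```

With tapes in a warehouse, running a check like this means retrieving and reading every tape; with online storage it can run on a schedule.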
  8. Slide 9: But then in 2006…
       • Created Chief Preservation Officer role
           • Nancy McGovern
       • Assigned Archival Storage engineering and operations to the IT shop
           • Bryan Beecher
  9. Slide 10: 2006 - 2008
       • Digital Preservation Management program begins
       • Warehouse cleared, closed
       • Tapes read, checked, destroyed
       • 6 TB of content over 600k unique files
       • Lots of files
           • Not so “cold” and “dark” any more…
  10. Slide 12: Fedora, Part 1
       • Lots of files, not so much metadata
       • Always know the aggregate object (“study” number)
       • Use simple Fedora Content Model (the data “keepsake”) to store the content
       • Small step from “files” to “objects”
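The "small step from files to objects" can be sketched as grouping files under the one thing that is always known, the study number. This is an illustration only: the file-naming convention (study number before the first hyphen), the `StudyObject` class, and the `demo` namespace are hypothetical and are not taken from the deck.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StudyObject:
    """One aggregate object per study; its files become its datastreams."""
    pid: str
    datastreams: list = field(default_factory=list)


def group_by_study(filenames, namespace="demo"):
    # Assumed convention: the study number is the text before the first hyphen.
    grouped = defaultdict(list)
    for name in filenames:
        study_number, _, _ = name.partition("-")
        grouped[study_number].append(name)
    return [
        StudyObject(pid=f"{namespace}:{number}", datastreams=sorted(files))
        for number, files in sorted(grouped.items())
    ]


FILES = [
    "00012-data.txt",
    "00007-data.txt",
    "00007-codebook.pdf",
    "00012-setup.sps",
]
objects = group_by_study(FILES)
```

Nothing here requires per-file metadata, which matches the slide's constraint: the only reliable key is the study number, so the object model starts there.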
  11. Slide 13: Fedora, Part 2
       • Would really like “smarter” objects
           • Strongly typed
           • Well defined relationships
           • Rich services
       • Definitely possible, particularly for more modern content (post 2002)
       • If only we had the time and money…
  12. Slide 14: NSF EAGER grant
       • EArly-concept Grants for Exploratory Research
           • Eighteen months for 1.5 people
       • Deliverables
           • CMA for social science data and docs
           • Packaging tools to create FOXML
           • Nifty SDeps and SDefs
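To give a rough idea of what a FOXML packaging tool produces, here is a simplified sketch that builds a skeletal FOXML-style document for one study. The element and attribute names follow my understanding of FOXML 1.1 and should be validated against the official schema before use; the PID, label, and URLs are hypothetical, and this is not the tooling the grant delivered.

```python
import xml.etree.ElementTree as ET

FOXML_NS = "info:fedora/fedora-system:def/foxml#"
MODEL_NS = "info:fedora/fedora-system:def/model#"
ET.register_namespace("foxml", FOXML_NS)


def q(tag):
    """Qualify a tag name with the FOXML namespace."""
    return f"{{{FOXML_NS}}}{tag}"


def build_foxml(pid, label, files):
    """files: list of (datastream_id, mime_type, url) tuples."""
    root = ET.Element(q("digitalObject"), {"VERSION": "1.1", "PID": pid})
    props = ET.SubElement(root, q("objectProperties"))
    ET.SubElement(props, q("property"), {"NAME": MODEL_NS + "state", "VALUE": "Active"})
    ET.SubElement(props, q("property"), {"NAME": MODEL_NS + "label", "VALUE": label})
    for ds_id, mime, url in files:
        # One datastream per file, with a single version pointing at its content.
        ds = ET.SubElement(
            root,
            q("datastream"),
            {"ID": ds_id, "STATE": "A", "CONTROL_GROUP": "M", "VERSIONABLE": "true"},
        )
        version = ET.SubElement(
            ds,
            q("datastreamVersion"),
            {"ID": f"{ds_id}.0", "MIMETYPE": mime, "LABEL": ds_id},
        )
        ET.SubElement(version, q("contentLocation"), {"TYPE": "URL", "REF": url})
    return ET.tostring(root, encoding="unicode")


foxml = build_foxml(
    "demo:00007",
    "Hypothetical study 7",
    [
        ("DATA", "text/plain", "http://example.org/00007/data.txt"),
        ("CODEBOOK", "application/pdf", "http://example.org/00007/codebook.pdf"),
    ],
)
```

A real packager would also need descriptive metadata and content-model relationships, which this sketch leaves out.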
  13. Slide 15: Thank you!
       Bryan Beecher
       techaticpsr.blogspot.com
