Curating Humanities Data: Law, technology and reality

Invited talk, April 16, 2014, IIT Digital Humanities Series: State of the Practice http://www.iit.edu/news/iittoday/?p=30525

Transcript

  • 1. Curating Humanities Data: Law, technology and reality
    Claire Stewart, Northwestern University
  • 2. Preservation (image: Schulze Collection, Northwestern University Library)
  • 3. Curation (image: Digital Curation Centre (DCC) Curation Lifecycle Model)
  • 4. Curation goals
    • Implement services and processes to gather, organize and preserve information about the circumstances of creation and interpretation
    • Facilitate re-use
      – Of underlying objects?
      – Of interpretations, analyses, etc.?
  • 5. What are data?
  • 6. Data
    • DCC: “Data, any information in binary digital form, is at the centre of the Curation Lifecycle.”
    • OMB: “Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings”
  • 7. Sciences/Humanities
    • BICEP2 (South Pole telescope)
    • Performativity, Place, Space
    Sources: Burgess and Hamming, 2011; BICEP2 Collaboration, 2014
  • 8. Why curation? What are the risks?
  • 9. Questions to ask and answer
    • Do we care what this object is?
    • Do we care where this object came from?
    • Do we care how it was rendered into this form?
    • Do we care what interventions have taken place?
    • Do we care who performed those interventions?
    • How do we identify and evaluate the original scholarly contributions?
  • 10. Salo’s (via SPOT) threat model / Threats to print
    • Homelessness
    • Water
    • Flora and fauna
    • Physical damage
    • Loss or destruction
    Salo, Dorothea. 2013. “Risk Management and Auditing,” presented at the DH Curation Institute, October 16, College Park, MD. http://files.dsalo.info/S13RiskMgmtAuditing.pdf.
  • 11. Salo’s threat model / Threats to digital
    • Physical media failure
    • Bitrot
    • File format obsolescence
    • Forgetting what you have
    • Forgetting what the stuff you have means
    • Rights and DRM (digital rights management)
    • Lack of organizational commitment
    • Ignorance (assumptions)
    • Apathy
    (image: Apollo data tape)
  • 12. Northwestern Books
  • 13. Northwestern Books
  • 14. Northwestern Books
  • 15. Repository as a service
    • Description and characterization: descriptive, provenance and technical metadata
    • Selection, conversion, digitization
    • Deposit and versioning
    • Interoperability, APIs for ingest, discovery
    • Access control, copyright support and other legal/regulatory compliance
    • Persistence
      – Stable, permanent links (URLs, DOIs, etc.)
      – Health of digital objects
      – Replication and dark archiving
      – Migration or emulation, virtualization
  • 16. Northwestern Books
    • A very library-centered project
    • We keep: all versions of the digitized pages; checksums; object relationship information; extracted information about the process; inferred information about who, where, when
    • No easy export, integration with analytical tools, etc.
    • No information about use, integrations, annotations
  • 17. Who are the players? “Understanding the relationship between critical or interpretive activities which are also curatorial, and more traditional curatorial activities, which bear more relation to tasks traditionally carried out by libraries and archives, will be important in the context of the humanities.” (Flanders and Muñoz 2011)
  • 18. Practical challenges
    • Data are often inextricable from their apparatus (software for preservation, querying, etc.), making selective curation/preservation difficult
    • Data versioning, e.g. Google Books OCR
    • Links between data (in a particular version?) and other research outputs: the monograph, the journal article, etc.
  • 19. HathiTrust
    • The Library
    • The Research Center (HTRC)
  • 20. The Workset
  • 21. The analysis
  • 22. Who gets credit?
  • 23. Text markup and analysis: Folger Shakespeare Library TEI-encoded texts
    • Full text, encoded down to the word
    • Responsibility statement indicates encoder
    • No facsimile texts
    • Not attached to analytical apparatus
    • Creative Commons noncommercial license (more shortly)
  • 24. Systems and approaches
    • A: The data and the environment are one
    • B: As A, but some elements are also available for extraction and re-use elsewhere
    • C: All the elements are extractable and available for re-use in other environments and settings
  • 25. Chicago Homicide
  • 26. Legal & policy issues
    • Unhelpful application of copyright is common
    • Embargoes and first-publication concerns
    • License and attribution ‘stacking’ problems with digital data
    • Expressive vs. non-expressive debates and litigation
    • Creative Commons: carving out a scholarly space (what does ‘non-commercial’ mean, anyway?)
  • 27. Copyright basics
    • Only original expressions are eligible
    • Copyright is limited in duration (it expires)
    • Copyright only applies to certain activities: making copies, distributing copies, etc.
    • There are broad exceptions (fair use, library reproduction, etc.)
  • 28. The Blake Archive
  • 29. Chicago Homicide
  • 30. Litigation
  • 31. Legal & policy issues
    • Unhelpful application of copyright is common
    • Embargoes and first-publication concerns
    • License and attribution ‘stacking’ problems with digital data
    • Expressive vs. non-expressive debates and litigation
    • Creative Commons: carving out a scholarly space (what does ‘non-commercial’ mean, anyway?)
  • 32. Final thoughts
    • Not everything can be made explicit
    • Not everything should be retained
    • We cannot afford to do everything
    • Re-use is elusive
    • We don’t know how to address all of the legal issues, but trying something is a good start
  • 33. Additional image credits
    • What are data? Work found at https://www.flickr.com/photos/rh2ox/9990024683 / CC BY-SA 2.0 (https://creativecommons.org/licenses/by-sa/2.0/)
    • Why? What are the risks? Work found at https://www.flickr.com/photos/swanksalot/2704017177/ / CC BY-SA 2.0 (https://creativecommons.org/licenses/by-sa/2.0/)
    • Sciences/Humanities: T/Q/U Maps. 2014. http://bicepkeck.org/B2_2014_i_figs/tqu_maps.pdf. Fig. 8, Piccini, Angela. 2009. “Locating Grid Technologies: Performativity, Place, Space: Challenging the Institutionalized Spaces of E-Science.” DHQ: Digital Humanities Quarterly 3 (4).
    • Salo’s threat model / threats to print: Cornell University Library Department of Preservation & Collection Maintenance. “Brittle Books.” Accessed April 16, 2014. http://wwwdev.library.cornell.edu/preservation/operations/brittlebooks.html.
    • Salo’s threat model / threats to digital: NASA Goddard Space Flight Center. Apollo Data Tape, July 10, 2009. http://www.flickr.com/photos/gsfc/3720663276/.
  • 34. Bibliography
    • BICEP2 Collaboration, P. A. R. Ade, R. W. Aikin, D. Barkats, S. J. Benton, C. A. Bischoff, J. J. Bock, et al. 2014. “BICEP2 I: Detection Of B-Mode Polarization at Degree Angular Scales.” arXiv:1403.3985 [astro-ph, physics:gr-qc, physics:hep-ph, physics:hep-th], March. http://arxiv.org/abs/1403.3985.
    • Burgess, Helen J., and Jeanne Hamming. 2011. “New Media in the Academy: Labor and the Production of Knowledge in Scholarly Multimedia.” DHQ: Digital Humanities Quarterly 5 (3). http://www.digitalhumanities.org/dhq/vol/5/3/000102/000102.html.
    • Cheverie, Joan. 2012. “The HathiTrust Decision: A Win for Fair Use and the Use of Technology.” Cheverij’s Blog, EDUCAUSE.edu, October 12. http://www.educause.edu/blogs/cheverij/hathitrust-decision-win-fair-use-and-use-technology.
    • Digital Curation Centre. 2014. “DCC Curation Lifecycle Model.” Accessed April 16. http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf.
    • Flanders, Julia, and Trevor Muñoz. 2011. “An Introduction to Humanities Data Curation.” DH Curation Guide, September 22. http://guide.dhcuration.org/intro/.
    • Piccini, Angela. 2009. “Locating Grid Technologies: Performativity, Place, Space: Challenging the Institutionalized Spaces of E-Science.” DHQ: Digital Humanities Quarterly 3 (4). http://www.digitalhumanities.org/dhq/vol/3/4/000076/000076.html.
    • Salo, Dorothea. 2013. “Risk Management and Auditing,” presented at the DH Curation Institute, October 16, College Park, MD. http://files.dsalo.info/S13RiskMgmtAuditing.pdf.
    • United States Office of Management and Budget. 2013. Uniform Administrative Requirements, Cost Principles, and Audit Requirements for Federal Awards. https://federalregister.gov/a/2013-30465.
  • 35. Additional links to cited projects/organizations
    • Northwestern Digital Humanities Summer Faculty Workshop
    • Northwestern University Digital Humanities Laboratory
    • Schulze-Greenleaf Library (Schulze Collection)
    • Digital Curation Centre
    • Northwestern Books
    • HathiTrust and HathiTrust Research Center
    • Eighteenth-Century Book Tracker
    • Folger Digital Texts
    • Homicide in Chicago 1870-1930 (Chicago Homicide)
    • The William Blake Archive
  • 36. This presentation © 2014, Claire Stewart. All of the referenced sites, images and text remain the copyright of their respective authors. This work is licensed under the Creative Commons CC-BY 4.0 International License.
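Slides 11 and 16 name bitrot as a threat to digital materials and list checksums among the things Northwestern Books keeps, but the talk does not say how those checksums are used. The following is a minimal, hypothetical sketch of a periodic fixity audit in Python, assuming SHA-256 digests stored in a simple path-to-digest manifest. The function names (`sha256_of`, `build_manifest`, `audit`) and the manifest layout are illustrative assumptions, not Northwestern's implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by POSIX-style relative path.

    Hypothetical layout: a flat {relative_path: sha256_hex} mapping.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently on disk against a previously stored manifest.

    "missing":    recorded in the manifest but no longer present (loss)
    "changed":    present, but the checksum differs (possible bitrot or an
                  undocumented intervention)
    "unrecorded": present on disk but never recorded (forgetting what you have)
    """
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            path
            for path in manifest.keys() & current.keys()
            if manifest[path] != current[path]
        ),
        "unrecorded": sorted(set(current) - set(manifest)),
    }
```

In this sketch the manifest would be built once at ingest, stored (ideally replicated) alongside the digitized pages, and `audit` re-run on a schedule. A clean result is three empty lists. Any entry under "changed" means the bytes no longer match what was recorded, so the object should be restored from a replica and the event logged as provenance.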