1. Maximizing Description to Enhance Access to Born-Digital Archival Collections
Seeley G. Mudd Manuscript Library
Princeton University Library
Rossy Mendez, Public Services Project Archivist
Jarrett M. Drake, Digital Archivist
CURATEcamp, Brooklyn Historical Society
April 23, 2015
2.
3. “How we describe the collections in our care influences the ability of people to discover, access, use and interpret them” (Trends in Practice: Archival Arrangement and Description, pg. 17)
[Slides 4–10: finding aid examples illustrating the <extent>, <scopecontent>, and <unittitle> elements]
11. Multi-level Description of Digital Records
Reality
For born-digital records, the Archive’s existing descriptive
workflows failed to provide sufficient context and precision for
<did> elements, including <unitdate>, <unittitle>, & <extent>.
Challenge
For multi-level records, how does one create these
elements programmatically?
14. Revised Workflow
Question
What are the key metadata points we should extract from
born-digital records and later represent in EAD?
Answer
1. Names of each folder → <unittitle>
2. Modified dates of the oldest and newest files → <unitdate>
3. Number of folders and number of files → <extent>
15. Current Description Workflow
Complete digital records processing documentation for Mudd Library can be found at:
http://rbsc.princeton.edu/policies/guidance-recommended-file-formats
Workflow file formats: .txt → .csv → .xls → .xml
16. Current Description Workflow: Extract
Shell script to extract <unittitle>, <unitdate>, and <extent> values (run with -maxdepth 1 to describe top-level folders only)
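The script itself appears on the slide only as a screenshot, so here is a minimal sketch of the approach described in the speaker notes below; the $root argument, the two-pass find calls, and the tab-separated output format are assumptions for illustration, not the actual Mudd script.

    #!/usr/bin/env bash
    # Minimal sketch (not the actual Mudd script): for each top-level folder
    # in an accession, emit the folder name, the modified dates of the oldest
    # and newest files, and the counts of folders and files, tab-separated.
    root="$1"  # hypothetical: path to the accession directory

    find "$root" -mindepth 1 -maxdepth 1 -type d | sort | while read -r dir; do
        name=$(basename "$dir")
        # ISO-formatted modified dates of the oldest and newest files
        oldest=$(find "$dir" -type f -printf '%TY-%Tm-%Td\n' | sort | head -n 1)
        newest=$(find "$dir" -type f -printf '%TY-%Tm-%Td\n' | sort | tail -n 1)
        # counts of folders and files beneath this top-level folder
        folders=$(find "$dir" -mindepth 1 -type d | wc -l)
        files=$(find "$dir" -type f | wc -l)
        printf '%s\t%s\t%s\t%s\t%s\n' "$name" "$oldest" "$newest" "$folders" "$files"
    done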
17. Current Description Workflow: Transform
Output of shell script as .txt file
Output of shell script transformed into EAD
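To make the transform step concrete, here is a hedged illustration of how one tab-separated row of the script's output could become a component-level <did>; Mudd's actual pipeline runs through LibreOffice Calc and an XSLT stylesheet in oXygen, so the awk stand-in below, the output.txt filename, and the level="file" attribute are assumptions for illustration only.

    # Illustration only: an awk stand-in for the XSLT step, mapping one row
    # (title, oldest date, newest date, folder count, file count) to EAD.
    awk -F'\t' '{
        printf "<c02 level=\"file\">\n  <did>\n"
        printf "    <unittitle>%s</unittitle>\n", $1
        printf "    <unitdate normal=\"%s/%s\">%s to %s</unitdate>\n", $2, $3, $2, $3
        printf "    <extent>%s folders and %s files</extent>\n", $4, $5
        printf "  </did>\n</c02>\n"
    }' output.txt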
18. Description Workflow Enhancements
Eliminate string values for <extent> elements and minimize post-processing of data
Use topic modeling for textual data (fondz or another program) and write scripts for basic textual analysis (e.g., automated page counts for PDFs; see the sketch below)
Index all names of directories and files and represent their structure through a file browser embedded in the finding aid and/or the repository (Hydra)
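As one possible shape for the automated PDF page count mentioned above (an assumption, not an existing Mudd script), pdfinfo from poppler-utils can total pages across an accession:

    # Hypothetical sketch: total the page counts of every PDF under $root.
    total=0
    while IFS= read -r -d '' pdf; do
        pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
        total=$(( total + ${pages:-0} ))
    done < <(find "$root" -type f -iname '*.pdf' -print0)
    echo "Total PDF pages: $total"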
Editor's Notes
Hello, I am Rossy Mendez, the Public Services Project Archivist at the Seeley G. Mudd Manuscript Library, and my colleague is Jarrett Drake, the Digital Archivist at Mudd. We are here to talk to you about our process of maximizing description to enhance access to born-digital archival collections.
First I wanted to talk a little bit about where we work. The Mudd Manuscript Library is part of the Rare Books and Special Collections Division at Princeton University. Our library houses and provides access to the university archives and public policy collections. We have over 30,000 linear feet of records of diverse media as well as several collections that contain born-digital material.
One rather unique thing about Mudd is that there is not a hard line between technical services and public services. With the exception of the records manager, everyone on the team participates in reference duties by taking on a number of reference shifts that entail assisting on-site and remote patrons and paging as needed. The benefit of this approach is that the work of technical services is informed by how patrons use the collections and the resources they use to find them.
Why be concerned with description? [quote]
At Mudd the description in our finding aids is driven by three principles:
A user should be able to quickly gather what and how much born digital content exists.
A user should be able to know where the digital content lives within the finding aid and have easy access to that content.
And last but not least a user should be able to understand the context in which these records were created.
Because users approach records with different research questions and arrive from different information points, we strive to provide description at different levels.
One of the things that we instituted early was providing access to born-digital content through the finding aid. By clicking the View Content button, users were able to go to the file in our Webspace file system. The problem with this was that there was no distinction between digitized content and that which was born digital, which was particularly problematic in collections that contained both types of content. Early attempts at description, such as the <scopecontent> note in this finding aid, provided minimal description but no specific information about the type of content or where it could be found. Beyond this, neither the extent field nor the series header indicated to the user that born-digital content was included.
Over the last year and a half at Mudd we made some significant changes to the description of born digital materials.
Perhaps the most significant change we have made to the description of born-digital records is including the amount of born-digital material. At first we thought of the <extent> field as the amount of physical space that the material occupied. But this information excluded any sense of depth and arrangement and therefore was not a good reflection of reality. We decided instead to use the <extent> field to convey the quantity of materials and provide additional arrangement information.
Another issue was the use of the word "electronic" versus the word "digital." The change to "digital" is a more accurate portrayal of the nature of the records, since today we use mostly computers and not other electronic devices.
The <unittitle> field plays an important role in differentiating between digitized and born-digital content. Without this designation, on the user's end there is no quick way to tell what material is digitized versus born digital, because the access path is the same. With this in mind, we ultimately made the transition to using the word "digital" in the <unittitle> field, which is used to describe series/subseries.
The <scopecontent> EAD tag, which maps to "Description," is perhaps the most beneficial to the patron: first, it lets the end user know right away that digital material is included, and second, it allows for a listing of the record formats contained within a digital series.
Recently the <unitdate> element has also undergone some revision so that it more accurately reflects the creation dates of these records. In his presentation Jarrett will address some of the work being done in this area. I will now turn it over to him so that he can explain some of our workflows and the practical components of these applications.
And so as Rossy showed, our description for born-digital records has been up and down, lots of downs
And the problem, as you see stated here, is that our description lacked critical context and critical precision…that was just the reality
The challenge [click] posed by this reality: how does one generate that context and precision programmatically?
And by multi-level, I am drawing a distinction from flat digital records with no hierarchy, which you typically find in oral history collections or other communication or publication record types
And meeting that challenge is something that our previous workflow wasn’t able to handle
Pictured here is our digital accessioning overview from 2012…this was a huge step forward from previous practice, and I’m thankful to my predecessors for their work
In the fall of 2013 when I started, our digital archives workstation ran Windows, which I didn't know I hated then but know now. We used FTK Imager for disk imaging, Karen's Directory Printer for directory printing, and Bagger for creating fixity information and AIPs.
My first multi-level, complex digital collection was a set of records from the University’s first woman president, an accession that contained more than 20,000 digital files and roughly 75 top-level folders.
To create a <unittitle>, I opened the FTK Imager .csv output, sorted it alphabetically by full path, and cut and pasted top-level folder paths into an AT resource record.
To create a <unitdate>, I eyeballed the earliest and latest four-digit years in the Modified date column and manually typed them into an AT resource record.
To create an <extent>, I opened Windows Explorer, right-clicked to open Properties, and manually entered the file count and size directly into the exported EAD.
I hopefully don't have to explain to everyone here how problematic this was. It's not that it took a terrible amount of time; given that I only did this for 75 folders, I probably had all of this information in AT after a couple of days.
BUT. Those things that we can do quickly in a manual fashion will not suffice when the orders of magnitude increase. More importantly, this way of generating descriptive elements said nothing of what materials lived below this level, and actually didn’t indicate that things lived below at all. So in many ways this description I did 18 months ago failed in both context and precision.
And so archivists at Mudd stepped back and said: we know that the relationships we wanted to represent already existed in the filesystem. So our next question became: how do we extract them directly, reliably, and without human intervention?
In summer of 2014, we started using BitCurator on our FRED and ended our complicated relationship with Windows and Windows-related products.
Between our digital initiative analyst, Rossy, and myself, we listed in plain English the types of questions we wanted to ask of our multi-level digital records: We said for each directory we wanted [click]:
The name of the directory (not files!)
The modified dates of the oldest and newest files
The number of folders and files
With a clear idea of the metadata we needed to extract from born-digital records, I broke down the creation of the component-level <did> elements into four small steps: extract (bash), prepare (LibreOffice Calc), import (oXygen), and transform (oXygen).
Outside the focus of this talk: you can see that I’ve written a similar step for creating <scopecontent> notes. You can find that complete workflow along with the rest of our digital records procedures linked at the bottom of this webpage, but for now I’ll explain and show images of our data extraction and transformation for the component-level <did> elements.
And so because we transitioned our workstation to BitCurator, we were now working in an Ubuntu OS environment, so we turned to the default shell in Linux, which is bash, to extract these data points that were already embedded in the filesystem and could be easily extracted without too much effort.
We wrote a simple for loop in bash that stitched together different iterations of the find command, and it took many drafts to get this script to function the way we needed it to…Rossy can recall our frequent Thursday setbacks and near misses.
Initially this script populated all folder titles…so, if the accession had 800 folders, you would feasibly have 800 multilevel components…but, again, given the depth of some accessions in University Archives and the fact that simply revealing the metadata of some files (such as a <unittitle> that read Discipline/Humanities/John Doe) would be an unlawful disclosure of sensitive and legally-protected information, we added the -maxdepth option on the loop to only grab <did> info for top-level folders. We can, and likely will, simply amend this part of the script depending on a collection's need and access restrictions.
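A rough sketch of what that amendment might look like, with a hypothetical depth parameter in place of the hard-coded value:

    # Hypothetical: parameterize how deep the loop walks, so a collection
    # without access restrictions can expose deeper folder names.
    depth="${2:-1}"  # default to top-level folders only
    find "$root" -mindepth 1 -maxdepth "$depth" -type d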
After the script finishes running, we take this original text file and concatenate a few fields in OpenOffice, before we import the .xls into oXygen and transform that .xml into EAD with an XSLT stylesheet, after which we normalize the EAD in the same way that we normalize all of our finding aids.
Even though we still currently have to put the raw text file through a series of transformations, we've been able to eliminate all rekeying and copying/pasting and produce a computer-generated description in a matter of seconds with very little manual intervention. Archivists do any folder name cleanup (e.g., expanding abbreviations) directly in the EAD.
This computer-generated description is much richer in terms of its context and much more precise in its metadata [highlight the transition from simple 4 digit <unitdates> to ISO-formatted <unitdates>], allowing our archivists to assume intellectual control of born-digital records much more programmatically, reliably, and efficiently.
In ascending order of difficulty, I think these are the next steps for improving our descriptive practice and serving born-digital records to researchers more contextually and precisely.