Good afternoon! My name is Krystal Thomas, and I am the Digital Library Coordinator and Archivist for the Theodore Roosevelt Center at Dickinson State University. Today, I’d like to talk to you specifically about our digital library project: its challenges, some of its more technical aspects, and our unique cataloging model. I want to start off by introducing myself a little. I am a new North Dakotan; I moved to Dickinson in January from upstate New York. I have my master’s in Archives and Records Management from the University of Michigan School of Information, so I am approaching this collection more as an archivist than as a librarian.
One of the first questions I am asked about the project is why it is located in Dickinson, North Dakota. There are several reasons why DSU is the right place for it. One is TR’s connection to North Dakota, particularly the Badlands, where he spent time hunting and ranching. As I am sure you have all heard, TR himself once said he would never have become president if it weren’t for the time he spent in North Dakota. So it is fitting that the project be situated on the edge of the Badlands, a place TR considered a second home.
In 1958, there was a national celebration for the centennial of Roosevelt’s birth. North Dakota’s festivities were second only to those of New York, TR’s birthplace. DSU was instrumental in coordinating North Dakota’s activities, particularly a series of lectures during that year, including one by then-Senator John F. Kennedy. This was the “first” Roosevelt Symposium at DSU. In 2000, DSU cooperated with Theodore Roosevelt National Park to host a symposium on Roosevelt as a Western Writer. This symposium was carried live by C-SPAN.
In 2005, Clay Jenkinson returned to North Dakota and became involved with the TR initiative at DSU. In conjunction with other stakeholder organizations in the region, including Theodore Roosevelt National Park, the Theodore Roosevelt Medora Foundation, the North Dakota Cowboy Hall of Fame, and the State Historical Society, DSU hosted the TR Symposium, “TR: The Adventurer in the Arena,” in October 2006. With participants from 28 states, this popular symposium grew into a major annual event at the university. It was also in 2006 that Clay conceived the idea of a digital presidential library. While the Roosevelt papers are scattered physically, the digital world allows us to collect them all in one place on the Web. In September 2007, the TRC officially opened in DSU’s Stoxen Library.
Roosevelt predates the Presidential Libraries Act; Hoover is the first president to have an official presidential library. Because of that, Roosevelt’s papers are scattered throughout the country in different repositories and collections. So, when Clay proposed the digital library idea, the TRC created partnerships with the major repositories and sites associated with Roosevelt. We have digitization partnerships with the Library of Congress, which holds the largest collection of Roosevelt materials, and with six National Park Service sites: Theodore Roosevelt National Park in Medora, ND; Theodore Roosevelt Memorial Island in Washington, DC; Sagamore Hill National Historic Site in Oyster Bay, NY; Theodore Roosevelt Inaugural Site in Buffalo, NY; Theodore Roosevelt Birthplace in New York City; and Mount Rushmore in South Dakota. We also recently signed a digitization agreement with Harvard University, which holds the second-largest collection of Roosevelt materials.
As you could tell from that list of partners, our collections are extremely scattered, and that is a major challenge of this project: the materials are far from the Center, so we have to coordinate the work remotely and make sure our partners have the information they need from us to do their work efficiently and effectively. Another major challenge is the volume of the materials we are working to make accessible. Roosevelt was a prolific writer who sent and received thousands of letters during his lifetime, not only while he was president. So there are millions of items to collect. Because these are archival collections, they are not (for the most part) cataloged at the item level; but for the digital library to work best, we need at least minimal discovery metadata for every single item in it. Creating that metadata is time-consuming and leaves a lot of room for inconsistency and error. Copyright is a challenge with the collections as well. Though the writings of Roosevelt and many of his contemporaries are out of copyright, the collection does include newspaper clippings and other published materials that are cause for concern. We are currently exploring how best to approach copyright in terms of what risks we are willing to take and what to make public.
(Note to self: Using the LC collection as the example to work with through the rest of this presentation)
The Library of Congress’s manuscript collection is a special challenge for us. While most of the collections from our other partners will come to us with some cataloging done, the Library of Congress collection, roughly 250,000 documents or over 600,000 digital images, came to us uncataloged, with only a bare-bones finding aid to start from. That finding aid gives only names and dates for items; some items are duplicated throughout it, and others are simply not listed. This collection exemplifies all of our project challenges and is the collection we work with most directly at the TRC on a daily basis. It drives most of our cataloging decisions.
Why was Dublin Core (DC) chosen? Our reasons were the standard ones most digital libraries give for using DC: it is the standard for many or even most digital library projects; it is flexible, with repeatable fields; it is easy to use for both staff and volunteers; and it is easy to share with other institutions through harvesting.
DC was also designed to address concerns from both the library and archival communities as they went digital; it combines cataloging requirements from both disciplines.
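For anyone who would like to see what repeatable fields and harvesting look like in practice, here is a minimal sketch in Python. It is not how our own system stores records. It assumes the simple Dublin Core element set wrapped in the `oai_dc` container commonly used for harvesting, and the function name `build_dc_record` and all of the sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for simple Dublin Core and the oai_dc wrapper element.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def build_dc_record(fields):
    """Serialize {element name: [values]} as an oai_dc XML string.

    Hypothetical helper for illustration; each value in a list becomes
    its own element, which is how Dublin Core repeats a field.
    """
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are placeholders, not a real record from the collection.
sample_record = {
    "title": ["Letter from Theodore Roosevelt (illustrative placeholder)"],
    "creator": ["Roosevelt, Theodore, 1858-1919"],
    "date": ["1901-10-01"],
    "subject": ["Conservation of natural resources", "Ranch life"],
    "type": ["Text"],
}

record_xml = build_dc_record(sample_record)
print(record_xml)
```

The two subject values come out as two separate `dc:subject` elements, which is the "repeatable fields" idea: a cataloger can add as many subjects or names as an item needs without changing the schema.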
There was no controlled vocabulary in place when I started in January, and catalogers were just using keywords in the subject field. I was hesitant to keep doing that; like many catalogers, I wanted them at least to be pulling from a common source, to introduce some consistency into the collection’s cataloging and make discoverability less problematic down the road. I was going to use LCSH (Library of Congress Subject Headings), but then I spoke with Shelby Harkin at UND, and she suggested FAST to me. It was still in its development phase, and had been for a long time, but Shelby was confident that it would be out and usable soon. So I explored FAST (Faceted Application of Subject Terminology) and found it to be a good fit for our collection. Our catalogers find it easier to use the more they work with it, and I especially like its “keyword” style: a user looking at these subject headings will not be intimidated or confused. FAST looks more like the tags users have become used to seeing on Flickr, Facebook and blogs.
Because FAST was still in development, we decided to use the Library of Congress Name Authority File (LCNAF) for the name authority records we needed for the collection. I have found that FAST and LCNAF work well in tandem for our purposes. I wanted to make sure the collection had the best possible resource for personal and corporate names available. We still run into names not found outside of the collection, such as average citizens who wrote to the president, so we format those names the same way LCNAF does.
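To make the idea concrete, here is a small Python sketch of the two checks described above: flagging subject terms that are not on an approved list, and putting a local name into an inverted, LCNAF-style form. Both function names and the sample terms are assumptions made up for this example; it does not query FAST or LCNAF, and it is not part of our actual workflow.

```python
def unapproved_subjects(subjects, approved_terms):
    """Return the subject terms that are not on the approved list.

    Matching ignores case and surrounding spaces (an assumption made
    for this sketch, not a FAST rule).
    """
    approved = {term.strip().casefold() for term in approved_terms}
    return [s for s in subjects if s.strip().casefold() not in approved]


def format_local_name(forename, surname, birth_year=None, death_year=None):
    """Build an inverted 'Surname, Forename, dates' heading for a local name."""
    heading = f"{surname.strip()}, {forename.strip()}"
    if birth_year or death_year:
        heading += f", {birth_year or ''}-{death_year or ''}"
    return heading


# Illustrative values only: a short stand-in for the approved term list.
approved_list = ["Ranch life", "Hunting"]
flagged = unapproved_subjects(["ranch life", "cowboy things"], approved_list)
local_name = format_local_name("Jane", "Doe")
print(flagged)
print(local_name)
```

A check like the first one is what keeps free-form keywords from creeping back into the subject field, and the second keeps names of private citizens sorted and displayed alongside the authorized headings.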
Criteria for the CMS: ability to capture rich metadata; ease of metadata capture (back-end user interface); no need for “client” software downloaded on the user’s machine for metadata entry (ContentDM has a large client), so it needed to be web-based; ability to integrate into a dynamic front end; ease of administration (metadata fields, user permissions, etc.); product support; cost.
DARMA, Digital Asset and Rights Management (the “MA” is taken from “Management”), is a very straightforward system. It allows us to manage all of our collections and their metadata. We can also track workflow processes through DARMA, which allows us to monitor what is and is not cataloged. It is within DARMA that we link the images on their server to their metadata records. Eventually, a more sophisticated workflow mechanism will be built into DARMA, along with a controlled vocabulary, so that DARMA talks to FAST directly and cross-references work in the internal search. I will now show you around DARMA so that you can see our system and how someone comes in and catalogs an item.
DARMA: Reel 35 – Page 3 was open; Reel 36 is complete so use as needed
Another tool we are using for our catalogers, and for ourselves, is the project management software Basecamp. It is completely online, so our catalogers can access it as easily as they do DARMA or FAST.
We’re slowly but surely convincing our catalogers of the value of Basecamp as a place to share their knowledge and to ask questions where everyone can benefit from the answers. We also have “Writeboards” here, where both our staff and volunteers keep a running list of the most-used LCNAF and FAST entries, making the collection’s subject cataloging even more consistent.
Demonstrate – stick in volunteer platform
Because the LC collection is uncataloged and too much for us to undertake alone, the TRC developed a unique volunteer experience for those in the Dickinson community and beyond. We train volunteers in a six-week class to catalog the collection and then let them loose to catalog! It’s still a small program, but our list of volunteers keeps growing. Our volunteers come to the work from many different backgrounds, but all share a love of history and an interest in Theodore Roosevelt. One volunteer noted: “I am learning so much more about not only TR, but the entire Progressive Era. I am addicted!” Another volunteer said, “It is pretty cool to get to snoop into his personal correspondence and other documents.”
Because our project is web-based, we can have volunteers all over the country helping to prep the collection for better digital access, and we do. Right now, we have volunteers in Pennsylvania, Louisiana, New Mexico and California as well as throughout the state of North Dakota. Our goal for our DL launch is to have the entire first year of Roosevelt’s presidency, 1901-1902, cataloged. With the help of our citizen catalogers, we are getting close to reaching that goal. But we are always looking for more volunteers, so if anyone here is interested, please come and talk to me after the presentation.
I want to thank you for letting me come and speak with you today about our exciting digital library project. I will take any questions you may have now.
Cataloging A President
Presentation to the North Dakota Library Association<br />Krystal Thomas<br />Theodore Roosevelt Center at Dickinson State University<br />September 30, 2010<br />
Why Dickinson State?<br />Theodore Roosevelt’s connection to the Badlands<br />1958 Centennial Symposium<br />Clay Jenkinson, Theodore Roosevelt Humanities Scholar<br />
Theodore Roosevelt Digital Library<br />TR predates the Presidential Libraries Act, so his papers are scattered<br />Jenkinson proposes creating a digital presidential library<br />Contributing partners<br />Library of Congress<br />Six National Park Service sites<br />Harvard University<br />
Project Challenges<br />The scattered nature of the materials<br />The volume of materials<br />Coordinating the many components of the project scattered across the country<br />Discoverability <br />Copyright<br />
Library of Congress<br />Unique collection among our partners<br />Uncataloged when given to us; only a very basic finding aid<br />Names and dates only <br />Many items not even mentioned in finding aid<br />
Metadata Scheme – Dublin Core<br />Standard for digital library projects<br />Appropriate mix of library and archival standards in one schema<br />Repeatable fields allowing us to “customize” to fit our needs<br />Approachable for an untrained cataloger<br />Easy to harvest and include in aggregated on-line catalogs<br />
Controlled Vocabularies<br />FAST<br />Recommended for digital collections<br />Derived from Library of Congress Subject Headings<br />Ease of use for untrained catalogers<br />Ease of use for audiences<br />LCNAF<br />FAST still in development; works well in tandem<br />Wanted the best authority possible for personal and corporate names in the collection<br />
DARMA<br />Our content management system, Digital Asset and Rights MAnagement<br />Manages collections and workflow as well as our linking process with images<br />Will eventually have a controlled vocabulary mechanism built in<br />Demonstration<br />
Basecamp<br />Project Management software<br />Coordination of projects on a staff and volunteer platform<br />Tracking progress and decisions<br />A place for knowledge gathering <br />Demonstration<br />
Citizen Catalogers<br />For the Library of Congress collection, we knew we needed help and we wanted to tap into the enthusiasm in the community for the digital library project.<br />