As part of the session on “Description, Metadata & Preservation”, Lisa will talk about the Claremont Colleges Digital Library (CCDL), its digital asset management system, CONTENTdm, and the CCDL’s experience in working with archival materials and projects. Lisa’s presentation is full of examples of how a variety of archival materials can be presented and described using Dublin Core and CONTENTdm. Examples include map collections with varying provenances, photo essays, 35mm slides, and scrapbooks.
2. Overview
• The CCDL
• CONTENTdm
• Archival Projects
– Maps
– Photo essays
– 35mm slides
– Scrapbooks
• Things to consider
• References
3. Claremont Colleges Digital Library
• Fall 2003
• Spring 2006
• 9 original collections
• 85 collections currently
• 65 collections published
• 128,240 items
4. DAM Systems
• 13 software platforms
– http://ccdl.libraries.claremont.edu/cdm/ref/collection/adl/id/8
• No open source software
• Went with CONTENTdm
15. Organization and Structure of Digital Object
• What do we serve up in CONTENTdm?
– The entire scrapbook
– The scrapbook page
– The item on a scrapbook page
18. Individual Items
• Advantages
– Search-ability
– Finer focus
– Continuity
– Readability
• Disadvantages
– Loss of organization/structure
– Many objects
19. Lindley Scrapbook Collection
• Items
• Two new fields
– Series
– Scrapbook name
• Source field
– Scrapbook name
– Page number
– Box storage number
• Downloadable PDFs?
25. Scrapbooks Summary
• Lindley
– Provenance maintained; original order not, but offered through metadata and PDFs
• Wheeler
– Provenance maintained; original order maintained
• Williams
– Provenance maintained; original order maintained
26. Things to Consider
• Interoperability of metadata
• Provenance and original order
• Customize as needed
• Don’t edit through omission
• Organization and structure
• Capitalize on your software features
• Object file names
27. Object File Names
• Backbone of collection
• Keep simple
• aaa#####
• aaa#####_####
• Easy to check
28. References
• Claremont Colleges Digital Library
– http://ccdl.libraries.claremont.edu
• Growing a Digital Library
– http://ccdl.libraries.claremont.edu/col/adl
– Strategic plans, operating plans, background research, task force reports, work groups, user surveys, etc.
• Getty Institute, Metadata Standards Crosswalk
– http://www.getty.edu/research/publications/electronic_publications/intrometadata/crosswalks.html
29. References
• Maps and Mapping at The Claremont Colleges
– http://ccdl.libraries.claremont.edu/cdm/landingpage/collection/p15831coll14
• Elisa Leonelli, Photojournalist
– http://ccdl.libraries.claremont.edu/cdm/landingpage/collection/p15831coll13
• Larry Oglesby Collection
– http://ccdl.libraries.claremont.edu/cdm/landingpage/collection/loc
30. References
• Dr. Walter Lindley Scrapbooks
– http://ccdl.libraries.claremont.edu/cdm/landingpage/collection/lsc
• Edward Ellerker Williams Notebook
– http://ccdl.libraries.claremont.edu/cdm/landingpage/collection/joe
• Wheeler Scrapbook Collection
– http://ccdl.libraries.claremont.edu/cdm/landingpage/collection/wsc
31. Contact Information
Lisa Crane,
Western Americana Manuscripts Librarian
Special Collections, Claremont Colleges Library
Claremont University Consortium
800 N. Dartmouth Avenue
Claremont, CA 91711
Phone: 909-607-0862
Email: Lisa_Crane@cuc.claremont.edu
Editor's Notes
Throughout this presentation I will use the acronym CCDL rather than say its full name, The Claremont Colleges Digital Library. I have a lot of information to cover and I’m not sure I will have the time so I might skip over some of the slides. But don’t worry! I plan to include this slide presentation in our “Growing a Digital Library” collection so you can take a look at it at your leisure.
During the next 20 minutes or so, I will provide a quick background of the CCDL and talk a little bit about CONTENTdm.
Next I will highlight some of the methods we have used to convey archival projects through CONTENTdm.
In summary, I will point out some things to consider.
And lastly, I will provide some resources you might find helpful as you work on your projects.
The idea of a digital library housed within the Claremont Colleges Library was born in Fall 2003.
Two and a half years later, the CCDL went live on CONTENTdm with 9 original collections. Special Collections and archives was, and still is, the largest contributor of content.
Currently, the CCDL has 85 collections, of which 65 are published. The digital library holds 128,240 items, comprising almost 16,000 compound objects, over 100,000 photographs, almost 7,900 document or PDF files, and 2,650 audio and/or visual files.
A task force in 2005 reviewed 13 software platforms for the CCDL: Arkemedia Digital Asset Management by Harris Corporation; Berkeley Electronic Press (BePress); CONTENTdm; DB2 Content Manager by IBM; Digital Asset Management by Artesia Digital Media, a division of OpenText; DigiTool by ExLibris; Embark by Gallery Systems; ImageFolio; Image Portal Family by NetXposure; Insight by Luna Imaging; Iron Point; MetaSource Digital Collections Management by Innovative Interfaces; and TeleScope Application Platform by North Plains.
They did not look at any open source software such as DSpace, Fedora, or Greenstone.
They decided on CONTENTdm, which was created by DiMeMa, Inc. Since the decision was made 2 years before my arrival at the Claremont Colleges Library, I’m not sure of the reasoning behind their decision to go with CONTENTdm, and I couldn’t find any documentation regarding the decision.
I do know that CONTENTdm supports a lot of file types allowing for the digitization of a variety of materials. This is just a snapshot of the various material types you can find in the CCDL.
Having had no experience with CONTENTdm prior to my arrival at the Claremont Colleges Library, I found the software quick to learn. It was lacking in certain areas, though, which required our library IT staff to build in some functionality: indexes on the collection homepage; the ability for non-programming staff to configure the indexes, collection homepages, and collection lists; and the ability for patrons to make annotations to individual items. Much of this functionality has been incorporated into the core programming of current versions of CONTENTdm.
Another benefit of using CONTENTdm is that it uses Dublin Core as its basic metadata scheme.
Here, you can see the field properties for one of our collections. The first column shows the labels for the metadata fields – which we appropriately adjust for each collection. The second column shows the Dublin Core metadata field to which the labels are mapped.
Even though Dublin Core is the underlying metadata scheme, CONTENTdm is flexible enough to allow us to change the labels for specific fields as needed to ensure our users understand the information that is provided.
For example, the Dublin Core “creator” field can be renamed to “artist” for art collections or “author” for textual collections. And since no one really knows what “coverage – spatial” means, we commonly relabel this to “location”.
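The relabeling described above can be pictured as a simple lookup from Dublin Core element to display label. This is only a sketch of the idea, not how CONTENTdm stores its field properties: the collection types ("art", "text"), the default labels, and the function name `display_label` are all assumptions made for illustration.

```python
# Hypothetical label overrides per collection type. The Dublin Core element
# names are real; the collection types and labels are illustrative only.
DEFAULT_LABELS = {"creator": "Creator", "coverage": "Coverage - Spatial"}

LABEL_OVERRIDES = {
    "art": {"creator": "Artist", "coverage": "Location"},
    "text": {"creator": "Author", "coverage": "Location"},
}


def display_label(collection_type: str, dc_element: str) -> str:
    """Return the label shown to users for a Dublin Core element."""
    overrides = LABEL_OVERRIDES.get(collection_type, {})
    # Fall back to the default label, then to the capitalized element name.
    default = DEFAULT_LABELS.get(dc_element, dc_element.title())
    return overrides.get(dc_element, default)
```

The underlying element never changes, so records stay mappable to other schemas no matter what label users see.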
Another advantage of having Dublin Core as the basic metadata schema is that it is highly interoperable with other metadata schemas, as shown by this table, which was taken from the Getty electronic publications website. I know it is hard to see, but it shows how VRA Core, MARC/AACR and DACS all work with Dublin Core.
Is everyone familiar with all of those acronyms?
VRA Core – is a data standard for the description of works of visual culture as well as the images that document them.
MARC – (Machine-Readable Cataloging) is a set of codes and content designators defined for encoding machine-readable records.
AACR - (Anglo-American Cataloguing Rules) is a standardized way to describe an item for categorizing and cataloging purposes.
DACS – (Describing Archives: A Content Standard) is a set of rules for describing archives, personal papers, and manuscript collections.
Anytime you work with archival materials – two fundamental principles of archives must be considered – provenance and original order.
Provenance refers to the individual, family, or organization that created or received the items in a collection. The principle of provenance or the respect des fonds dictates that records of different origins (provenance) be kept separate to preserve their context.
Maintaining records in original order serves two purposes. First, it preserves existing relationships and evidential significance that can be inferred from the context of the records. Second, it exploits the record creator's mechanisms to access the records, saving the archives the work of creating new access tools.
I will be referring to these two principles as I talk about the design and implementation of a couple of our archival projects in CONTENTdm.
So what happens if you have a bunch of materials that are all the same format – but have different provenances? We have some rather large map collections. Not wanting to silo each archival collection into its own digital collection, we thought to combine the collections virtually. Through the use of metadata and the collection indexes, we could easily ensure the provenance of each map was not lost. This flexibility allows us to grow the collection as new archival collections and/or items are accessioned.
The same logic used for multiple archival collections within a single digital collection can also be applied at the series level for both archival and digital collections. The Elisa Leonelli Photojournalist collection is made up of photo essays – images for the photographs and PDFs for the essays which are meant to be viewed together. We created a metadata field called “series” that allows us to tag all items within that series and then display the series information as an index on the collection homepage.
Should users skip the collection indexes and access the collection through browsing or other search behaviors, there are notes on all items referencing the appropriate essay and photographs. Above, users are encouraged to click on the “series” field to see all photographs related to this particular photo essay. Below, users are encouraged to click on the link in the “notes” field to see the related essay.
The Larry Oglesby collection includes 8,000 35mm slides which represent Professor Oglesby’s teaching materials as well as documentation of student field trips. As we designed the digital collection for these slides we wanted to be sure we didn’t “edit” any content and deprive the user of Professor Oglesby’s meticulous notations regarding scientific name, family, common name and locations of the flora and fauna depicted in his slides. So we decided to make use of CONTENTdm’s compound object feature and scanned both the slide and the slide mount.
Not only does this method allow users to see both the slide and the slide mount, but it also gives users an opportunity to make annotations regarding any errors which may have occurred as the result of misreading Oglesby’s handwriting!
It is not uncommon to find scrapbooks among archival collections. You might find a single scrapbook, photo album, journal or notebook among a collection of papers, or you might have an entire collection made up of nothing but scrapbooks. Regardless, scrapbooks – and their related kin – pose an interesting challenge in the digital environment because they are made up of so many parts. There is the whole collection, such as our Dr. Walter Lindley Collection, and the individual item, such as our Edward Ellerker Williams Notebook. The scrapbook itself can be examined in terms of its whole package (the scrapbook), its pages, and the items on a page.
So it really comes down to the organization and structure of the digital object.
Scrapbooks pose a challenge regarding the organization and structure of the digital object. We had to consider whether we would serve out the entire scrapbook as a compound object; the scrapbook page as a standalone item or as a compound object; or the items which were adhered to the scrapbook page. All the while we had to keep in mind provenance and original order.
The next three slides highlight the advantages and the disadvantages of serving out each type: the entire scrapbook…
The scrapbook page…
Or the individual items …
With the Lindley collection, we decided to serve out the individual items on the scrapbook pages. When you browse the collection, you will see the clippings and other items affixed to the scrapbook pages. The collection defaults to sort on the title field. In order to allow users to pull together items from a specific series or scrapbook, we created two additional fields and provided an index for these fields on the collection homepage. We enhanced the source field with the scrapbook name, the page number within the scrapbook where the item could be located and the archival container number housing the scrapbook. To ensure users are able to experience the scrapbooks as they appear in their analog format, we will offer downloadable PDFs for each scrapbook – where each page of the PDF will represent each page of the scrapbook.
Here you can see the collection’s indexes as they appear on the collection homepage (lower left corner). If a user clicks on the Series index, a list of all series in the collection will appear. A simple click on a particular series will pull together all items of that series in the results page. The same goes for the scrapbook name index.
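To make the index idea concrete, here is a minimal sketch of grouping items by a metadata field, which is roughly what clicking an index entry gives the user. The item records, field names, and the function `build_index` are hypothetical; this is not CONTENTdm's implementation.

```python
from collections import defaultdict

# Hypothetical item records; titles and field values are made up.
ITEMS = [
    {"title": "Clipping A", "series": "Series 1", "scrapbook": "Scrapbook 1"},
    {"title": "Clipping B", "series": "Series 1", "scrapbook": "Scrapbook 2"},
    {"title": "Letter C", "series": "Series 2", "scrapbook": "Scrapbook 2"},
]


def build_index(items, field):
    """Group item titles by the value of one metadata field."""
    index = defaultdict(list)
    for item in items:
        value = item.get(field)
        if value:  # skip items that leave the field empty
            index[value].append(item["title"])
    return dict(index)


series_index = build_index(ITEMS, "series")
scrapbook_index = build_index(ITEMS, "scrapbook")
```

Because the grouping is driven entirely by metadata, the same items can be pulled together by series or by scrapbook without being stored in that arrangement.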
When browsing the Wheeler collection, you will see the pages. The collection default is sorted on the object file name field – rather than the title field. This prevents page 1 from being followed by page 10 – as would have happened if we had sorted the collection on the title field. Keeping in line with “original order” – it was important to be sure the pages appeared in their proper order. Clicking on any page will allow the user to see thumbnails of all the items affixed to that page.
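The sorting problem is easy to reproduce. In this small sketch, the titles and file names are invented for illustration (only the "wsc" alias comes from the collection URL); it shows why a text sort on titles misorders pages while a sort on zero-padded object file names does not.

```python
# Invented titles and object file names for three scrapbook pages.
titles = ["Page 1", "Page 2", "Page 10"]
file_names = {"Page 1": "wsc00001", "Page 2": "wsc00002", "Page 10": "wsc00010"}

# A text sort compares character by character, so "Page 10" lands before "Page 2".
by_title = sorted(titles)

# Zero-padded file names have equal length, so text order matches page order.
by_file_name = sorted(titles, key=lambda title: file_names[title])

print(by_title)      # ['Page 1', 'Page 10', 'Page 2']
print(by_file_name)  # ['Page 1', 'Page 2', 'Page 10']
```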
Once a scrapbook page is clicked, the page appears in the item view and thumbnails of all items affixed to the page appear on the right. By scrolling down through the thumbnails on the right, we can click on any item we wish to see in greater detail. Clicking on any item will bring that item into the viewer and provide the metadata for that item.
Through CONTENTdm’s compound object feature, we are able to describe both the scrapbook page and all of the items on the page without losing the relationship between the page and the items.
With the Edward Ellerker Williams notebook – we decided to create a single collection that would be comprised of individual images of each page. The collection defaults to sort on the object file name – again to ensure the pages appear in their proper order without textual interpretation of the page numbers. I’m not quite sure why the results page is showing multiple images of the same item (circled). After checking the data within the collection, I was able to determine duplicates were not the problem – looks like something we need to check with OCLC about!
The problem with this method is that it creates a lot of silos – individual collections containing a single item. With today’s search engines this isn’t the problem that it was several years ago – but it does seem to be a bit self-indulgent and lacking economies of scale.
So why the difference in methods for these scrapbook collections? With Lindley, I had the opportunity to build this collection from the ground up – including how the materials were digitized. I wanted a collection that would be diverse and engaging in its appearance and easily searchable. And at the time, CONTENTdm’s compound object feature was a bit clunky. With Wheeler, I inherited a legacy collection. The scrapbooks had already been digitized and made available in non-dynamic webpages. I just needed to migrate them into CONTENTdm. And Williams was just an experiment, really.
Because CONTENTdm uses Dublin Core as its metadata schema, interoperability of metadata is not a problem: crosswalks map it to a variety of other metadata schemas.
When working with archival collections – you must not forget these two fundamental principles of archives – provenance and original order.
Customize the metadata labels as needed. Customize the digital collection as needed. Not all collections have to be built/configured in the same way. Take into consideration the source material format.
What information are you trying to convey from the source materials? Don’t “edit” by leaving out certain parts or being too selective in what is digitized.
Think about the organization and structure of the source material and how that can be communicated through digitization and metadata. Think how the digitized object needs to be organized and structured.
Make use of the features of your software – change field labels, create indexes, build compound objects, utilize additional linking, change sorting defaults.
Finally, think about object file names.
Object file names are the names we give to our digital files. I have seen some really convoluted file names where the organization is trying to convey too much information through a file name. We like to keep ours simple. Three alpha characters at the start of each file name indicate the digital collection alias – usually the first letter from each of the first three words in the collection’s title. The next five numeric characters are sequential numbers starting with 00001 and running up through 99999. If you have a compound object – or an item that is more than a single scan – we add an underscore and then four more numeric characters, beginning with 0001 and running up through 9999. For example, a letter might have 2 pages, so the object file names for the scans of those two pages would have the same root, with _0001 appended for page 1 and _0002 for page 2. By keeping the object file name simple, it is easy to assign when scanning, with no need for additional definitions. It is also easy to check whether you missed a number.
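The naming convention above is simple enough to generate and check with a few lines of code. This is a sketch under stated assumptions, not a tool we use: the function names `make_name` and `missing_items` are hypothetical, and lowercase aliases are assumed because the notes do not specify case.

```python
import re

# aaa##### with an optional _#### suffix; lowercase alias is an assumption.
NAME_PATTERN = re.compile(r"^([a-z]{3})(\d{5})(?:_(\d{4}))?$")


def make_name(alias, item, part=None):
    """Build an object file name such as 'lsc00042' or 'lsc00042_0002'."""
    if not re.fullmatch(r"[a-z]{3}", alias):
        raise ValueError("alias must be three lowercase letters")
    if not 1 <= item <= 99999:
        raise ValueError("item number must be between 1 and 99999")
    name = f"{alias}{item:05d}"
    if part is not None:
        if not 1 <= part <= 9999:
            raise ValueError("part number must be between 1 and 9999")
        name += f"_{part:04d}"
    return name


def missing_items(names):
    """Return item numbers skipped between the lowest and highest in use."""
    used = set()
    for name in names:
        match = NAME_PATTERN.match(name)
        if match:  # names that do not fit the pattern are ignored
            used.add(int(match.group(2)))
    if not used:
        return []
    return [n for n in range(min(used), max(used) + 1) if n not in used]
```

For example, a two-page letter filed as item 2 would get `make_name("lsc", 2, 1)` and `make_name("lsc", 2, 2)`, and running `missing_items` over a folder listing flags any item number that was skipped during scanning.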
The next 3 slides list a bunch of references and sources of information – most are collections and items within the CCDL.
These are the links to the collections I talked about this afternoon.
And here’s my contact information!
Please don’t hesitate to contact me if you have any questions about this presentation, the CCDL, CONTENTdm or digital projects in general.