
Sailing the Digital Serial Seas: Charting a New Course with CONTENTdm


The State Library of North Carolina is legally mandated to facilitate public access to publications issued by State agencies and manage the depository system. With the increase of born digital documents and the demand for electronic access, the State Library needed to find a way to support the systematic collection, preservation, and access to state information in digital formats. Focusing on finding repository solutions for digital state publications and based on comparisons among leading products, the library found CONTENTdm to be the best overall fit. With the continuing need to create MARC records for digital documents, CONTENTdm offered functionality to create compound objects for single documents as well as structured serials, providing one permanent URL either way. Working with born digital and digitized serials still presents certain challenges with workflows, providing access, and compensating for the differences between born digital and digitized formats. This presentation discusses the ups and downs of managing digital serials in CONTENTdm, how we do it, and why we do it from the perspective of a mid-size state government library.

Francesca Francis
Assistant State Documents Cataloger, State Library of North Carolina
Raleigh, NC
I assist in the cataloging of original publications created by the state agencies of North Carolina, metadata/class schema/authority creation and management, and catalog problem-solving, with a small side of reference desk work at the Government & Heritage Library. Prior to my time at the State Library, I worked part-time on a reference desk in the Cumberland County library system. While living in the DC area, I served as the catalog librarian for the U.S. Census Bureau and worked on a shelf list project with the U.S. GPO. I got my start in the library field when I was selected to work as the cataloging assistant at the law library of Catholic University while earning my MLS. As you may be able to guess, I kind of have a thing for cataloging and providing access to information, whether I'm on deck or in the control room... although I also have a penchant for playing the "[wo]man behind the curtain."

Eve Grünberg
Documents Cataloger, State Library of North Carolina
I have been working at the State Library of North Carolina as a documents cataloger since 2006. I am responsible for cataloging everything published by state agencies, regardless of format. Working with different publications has given me a great deal of knowledge and experience with MARC cataloging rules and standards, different classification schemas, authority work, Library of Congress and OCLC cataloging tools, metadata standards, and the creation of controlled vocabularies.

Published in: Education
  • List of topics covered by the presentation
  • State government information is valuable and widely used by the citizens of North Carolina. The State Library of North Carolina is legally mandated by a General Statute to manage and preserve state publications, respectively, in all formats for permanent public access and maintain a permanent depository collection of all printed state documents. The State Library fulfills this responsibility through the North Carolina State Documents Depository System, established in 1987 by G.S. 125-11. The Depository System consists of the State Publications Clearinghouse, which is responsible for working with state agencies to identify publications, as well as collecting, processing, and distributing state agency publications, and Depository Libraries, which are responsible for providing public access to state agency publications.
  • The State Publications Clearinghouse wasn’t structured and staffed at the time to accommodate born-digital information and support the systematic collection, preservation, and access to state information in digital formats. Users wanted electronic access to state agency publications. Depository librarians were unified in their desire to provide electronic access by having the State Library maintain a digital repository and distribute electronic publications by providing MARC catalog records with reference links to these publications. In 2006, the Digital Information Management Program (DIMP) was formed to focus on finding the best digital repository solution for digital state publications. A preference for the Qualified Dublin Core schema, together with the Library’s lack of cataloging staff, meant the Library needed an automated metadata crosswalking tool to streamline the cataloging of these publications. Based on their research, the DIMP expected that CONTENTdm would be a simple, inexpensive service for building digitized collections. Research had indicated that CONTENTdm would have an out-of-the-box public interface that allows for (but does not require) customization, thus minimizing the need for technical support. In addition, the team expected that CONTENTdm would allow for the storage of digital objects in its database without impact to retrieval performance, provide easily customizable metadata schemas, allow for metadata to be entered remotely, handle multi-part objects, allow for full-text search, and allow for the import/export of data. For preservation functionality and Qualified Dublin Core to MARC crosswalks, it seemed reasonable to use the tool Digital Archive from the same company.
  • After the trial period from October 2006 to February 2007, which was successful and met our expectations, the State Library subscribed to a hosted level license with CONTENTdm.
  • We developed our metadata guidelines and workflows, and started adding digital objects to the State Publications Collection.
  • The OCLC Connexion Digital Import (CDI) feature allows us to start with a MARC bibliographic record in WorldCat and upload a file to our CONTENTdm collection; this creates a link to the file in the WorldCat record and crosswalks the MARC record to Qualified Dublin Core (QDC) metadata in CONTENTdm. The crosswalk is controlled by OCLC, which means certain MARC fields are crosswalked into pre-designated QDC fields. Serials: CDI allows us to upload multiple digital files at the same time and create a single reference URL. It also creates a structure for the serial title, in which all the “children” issues sit under one “parent” title. When you search for a specific serial title and open the record, you see the full metadata for the title, with the accompanying issues (each with its own metadata and full text) listed in the sidebar. (Talk about initial appearance of serial structure.)
  • The Library receives born digital serials from state agencies via Dropbox and email, either in PDF format or in formats we then convert to PDF ourselves. We also perform routine searches of agency websites for documents we don’t already have. The Library developed the Capture, Ingest, & Checksum Tool (CINCH), which is “designed to locate targeted files on the internet and download them in a preservation-ready state. This includes maintaining the files’ integrity by virus checking and repeated checksumming, as well as enhancing the files’ context with metadata extraction.” Once the document files are received, they are processed for cataloging: they are checked into our database, and each file’s original name and checksum (a “thumbprint” for that specific file) are recorded. This metadata is generated using PinPoint Hash software and/or CINCH, depending on how the files were received. The files are then renamed according to file naming guidelines created in-house to keep names consistent across all related serial items in archival storage. Once these files are processed, they are moved to folders accessible to cataloging.
  • Since depository libraries have expressed the need for traditional MARC records as well as Dublin Core records, the cataloging process begins with creating a MARC record in OCLC Connexion. Once the MARC record is created, the CDI feature is used to attach the digital objects to the MARC record (a minimum of two files is required to create the serial record structure) and to crosswalk the MARC metadata into the Qualified Dublin Core metadata fields in CONTENTdm, simultaneously dropping a record into CONTENTdm and creating a reference URL to the digital object from the MARC record. In the CONTENTdm Administration module, catalogers edit the metadata of the “parent” (main) record and approve it for indexing. After the initial index, the record is pulled into the CONTENTdm Project Client, where the serial structure can be viewed with “children” records (multiple attached items) branching from the parent. Here, the children records can be accessed and individually edited to reflect each issue’s unique metadata, the minimum requirements for which are determined by the Library’s metadata guidelines. The serial is then sent back into the approval queue, approved, and indexed once more, creating the final product.
  • Visual slide for the CDI feature (once the MARC record is created, the CDI feature is used to attach the digital objects to the MARC record, with a minimum of two files required to create the serial record structure, and to crosswalk the MARC metadata into the Qualified Dublin Core metadata fields in CONTENTdm, simultaneously dropping a record into CONTENTdm and creating a reference URL to the digital object from the MARC record). Talk about what happens when you have to wait for a second issue: if the title is of high importance, create a monographic record and later change it to a serial record; if it is of low importance, drop it into the waiting folder so outreach can get more issues.
  • This slide shows a structured serial record in Project Client. Serial items are ordered oldest at the bottom, newest at the top.
  • Once the serial structure is created, issues are added as they come to the Library. After the preparation process, Library technicians will add additional issues using the Project Client software by pulling up the parent record. Metadata is added using the guidelines, and the technicians send the issues individually to the approval queue. Once issues are examined by cataloging and approved, an index is run, and the parent record and any new issues are pulled into the Project Client together. Here, the items are attached to the parent record, edited to reflect their relationship to the title as above, and sent to the approval queue and final indexing.
  • Monographic compound structure (sort of similar to serial structure in display)
  • DIMP creates a list of titles and OCLC records for Internet Archive that corresponds to the materials we are sending. Internet Archive is able to grab additional information from our catalog using the Z39.50 protocol. Once Internet Archive has digitized our publications, DIMP uses an Internet Archive download tool (created by East Carolina University and further developed by the Library) to grab the objects and metadata file. This and other metadata gathered by DIMP, such as the file paths and preservation metadata for each item, are used to create records for digitized items. The digital objects are downloaded from Internet Archive to a local hard drive and selected for processing in Project Client. All of the metadata (parent plus any available items) is pulled in through the Project Client as a text file. Once the serial structure is created, new digitized items can simply be pulled in through Project Client individually, as above, with new issues uploaded and attached. A long-running serial will often have a mixture of digitized and born digital issues attached to the parent record, in effect documenting the change from analog to digital formats (example: Symphony stories).
  • In some cases, digitized (and particularly large born digital) serial files must be treated structurally as monographs. Some state documents can be thousands of pages long. Because digitized files tend to be larger than born digital documents in general, this can cause problems both on the back end and in the public view, where extra-long load times and/or freezing occur. These larger files are brought in as individual compound objects to prevent such problems. In this case, we make sure that some piece of issue-identifying information, such as the year, is visible in the results when a search is conducted. We also assign a serial/series title to tie the issues together under one unified title in title searches. For this we added an extra metadata field, “Serial Title”, for large-file serials (for example: North Carolina Public Documents, North Carolina Session Laws, etc.).
  • Visual slide showing large digital files structured as monographic items in the digital collection. The metadata circled in red helps connect those items. A special URL is created to collocate the items.
  • Like the monographically-structured serial, not everything translates easily from the traditional paper library to a digital collection. Another type of “serial” record we create is for our collection level records. We opt for collection level records because we sometimes receive pamphlet/ephemeral type materials from agencies. These can be monographic or serial materials that share subject information to the extent that subject access can be adequately provided with one or more subject headings. They do not merit item-level or minimal-level cataloging, but collectively have research value. We take advantage of the natural relationships that exist among the items within the collection and capture those relationships in one bibliographic record. In CONTENTdm, collection level records are treated as serial records with the standard serial structure (parent and children). We use the collective title for the parent, and individual titles are identified by issue. In this way, all items remain grouped together as a collection.
  • Visual slide showing an example for the previous slide: how a collection level record is constructed in the digital collection.
  • Because digital materials are tied to their records, a serial’s (nearly inevitable) title change(s) present a particular challenge. In traditional cataloging practice, every time a serial changes its title or a different body becomes responsible for its creation, a new record must be created. In the digital world, and through the use of metadata, we’ve tried to be more flexible and considerate of customers’ preferences. When a serial title change occurs and we know it is the same serial published by the same agency, we don’t create new digital records for it. We just continue to add issues to the same parent record regardless of the title change. The record we add issues to is ideally the first title of that serial; however, we tend to acquire newer issues first, and therefore initially create a record for, and attach all items to, what would be considered a successive record. We don’t completely abandon convention, though: we do create a new MARC record for each title change following traditional cataloging rules and practices. We then link this new MARC record to the metadata record containing all issues in our digital collection. We also add the new OCLC number to the parent record and use the “Other Title”, “Title Replaced by”, and “Title Replaces” fields to record title changes in this record.
  • How a serial title change is treated – all titles on one record with additional title data
  • Creating serial records is a complex, multistep process. To complete a record we have to approve the title, index, edit the parent and children, approve, and index again. This doesn’t sound like a major problem until you consider that the index, which is supposed to run in the background of our collection and let us continue working unbothered, actually prevents us from approving and making other changes to the collection. This, in turn, keeps us from producing records more quickly and in greater numbers. We also tend to get locked out of approving records in the Administration module when one person is approving a large file or conducting a mass approval, another function that should run behind the scenes. We also have difficulty replacing and/or deleting single serial issues in a record. On occasion we have to replace an existing serial item with a new one (broken file, etc.). What should be the easiest way, simply deleting the item from the serial structure, may cause the entire record to come apart. This is because our serial structure (parent and children) is created by attaching two serial issues to the record using CDI and crosswalking it to the digital collection. When we need to delete a child record from the parent record, we need to be sure that the child is not one of the original children used to create the initial serial structure. If it is, the serial structure will fall apart, and single issues of the serial will float around in the collection like “lost children”. Sometimes a publication that appears to be a monographic item becomes a serial instead. This happens a lot with state publications, as many different types of reports are issued; for example, a report published once may be published again the next year or two years later.
When a title acquires this sort of frequency, we need to turn the single item into a serial, if nothing else so that our patrons can find the issues easily. We recatalog the monographic record as a serial (or derive a new serial record) in MARC, retrieve the older file from digital storage, and create a serial structure for this title in the digital collection. This is the sort of flexibility that is necessary as the collection grows and changes.
  • Over the course of adopting and adapting to CONTENTdm, the Library has had its share of positive and negative experiences, from which many lessons have been learned. We have also used this opportunity to apply our collective knowledge and experience to the future, especially now that we have a better idea of what we are looking for (and not looking for) in a collection management system. Aside from smoother-running approval and indexing processes, other items on our wish list include the ability to work better with locked/secure files. Some agencies are not comfortable providing their documents without some security feature on the PDFs. Problems range from the inability to create a thumbnail or pull full text to issues with creating a compound object. Part of this concern can be addressed by working with these agencies to educate them on how the Library handles their documents, but we are also hoping that digital content management technology will become sophisticated enough to deal with secure files. Perhaps one of the more basic desires is to be able to perform a search that returns a completely alphabetical list of publications. In the early stages of the collection this was possible; however, several upgrades later, the function appears to be somewhat broken, with seemingly alphabetized search results interspersed with non-alphabetized ones. If this makes finding documents difficult for us, it is most likely even more frustrating for end users, rendering the function nearly useless.
The biggest issue for us has been the concept of an “unlimited” collection. Initially we were told that our collections could be unlimited in size, but we have found this to be only partially true. After we developed our state publications collection as a single collection, including some fairly large documents such as Session Laws, Public Documents, and House and Senate Journals, the branches managing this collection started running into issues in various parts of the process. Only recently did we learn that a single collection does indeed have a limit, quantified in number of pages. The solution seems to be to split this particular collection into several smaller collections, leaving room for currently running serials and other elements that might expand parts of the collection. We are also concerned about the collection running seamlessly as a whole, especially on the public side.
  • Link to our Digital State Publication Collection
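The CDI note above says the MARC-to-QDC crosswalk is controlled by OCLC and sends certain MARC fields into pre-designated QDC fields. The Python sketch below only illustrates the shape of such a field-level crosswalk; the mapping table (`CROSSWALK`) and the function name are hypothetical and loosely follow common MARC-to-Dublin-Core correspondences, not OCLC's actual CDI mapping.

```python
# Illustrative MARC -> Qualified Dublin Core crosswalk.
# HYPOTHETICAL table: OCLC controls the real CDI mapping; these pairs only
# follow common MARC-to-Dublin-Core correspondences for illustration.
CROSSWALK = {
    ("245", "a"): "Title",
    ("246", "a"): "Title.Alternative",
    ("110", "a"): "Creator",
    ("260", "b"): "Publisher",
    ("650", "a"): "Subject",
    ("520", "a"): "Description.Abstract",
    ("035", "a"): "Identifier",
}


def crosswalk(fields):
    """Map (tag, subfield, value) triples to {QDC element: [values]}.

    MARC fields with no entry in CROSSWALK are dropped, mirroring how only
    certain MARC fields land in pre-designated QDC fields.
    """
    qdc = {}
    for tag, code, value in fields:
        element = CROSSWALK.get((tag, code))
        if element is not None:
            qdc.setdefault(element, []).append(value.strip())
    return qdc
```

For example, `crosswalk([("245", "a", "Annual report"), ("500", "a", "Description based on: 2011.")])` returns `{"Title": ["Annual report"]}`, because the hypothetical table has no entry for the 500 note.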
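The born-digital intake note describes recording each file's original name and checksum, then renaming it under in-house conventions. A minimal sketch of that check-in step, assuming SHA-256 as the checksum (the algorithms PinPoint Hash and CINCH actually use are not stated in the notes) and a hypothetical `agency_title-slug_date` pattern standing in for the Library's real naming guidelines:

```python
import hashlib
import re
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preservation_name(agency_code, title, issue_date, suffix=".pdf"):
    """Build a HYPOTHETICAL unique name: agency_title-slug_date.pdf."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{agency_code.lower()}_{slug}_{issue_date}{suffix}"


def check_in(path, agency_code, title, issue_date):
    """Record the original name and checksum, then rename the file in place.

    The checksum is recomputed after renaming to confirm the bytes did not
    change (a rename should never alter content).
    """
    src = Path(path)
    before = sha256_of(src)
    dest = src.with_name(preservation_name(agency_code, title, issue_date, src.suffix))
    src.rename(dest)
    if sha256_of(dest) != before:
        raise RuntimeError(f"checksum changed while renaming {src.name}")
    return {"original_name": src.name, "new_name": dest.name, "sha256": before}
```

The returned dictionary is what would go into the inventory database, so later fixity checks have a stored checksum to compare against.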
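The cataloging and CDI notes set two structural rules: a serial structure needs at least two files, and issues display newest on top. Here is a sketch of those rules as plain data structures. `Issue`, `SerialRecord`, and `route_new_title` are hypothetical names, and the approve/index cycles themselves happen in CONTENTdm, not in code like this:

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Issue:
    label: str      # issue designation, e.g. "2012" or "v.3 no.2"
    issued: date


@dataclass
class SerialRecord:
    title: str
    children: list = field(default_factory=list)


def route_new_title(title, issues, high_priority):
    """Decide how a new title enters the collection.

    CDI needs at least two files to build a parent/children structure.
    With only one issue in hand, an important title is cataloged as a
    monograph for now; anything else waits for outreach to find more issues.
    """
    if len(issues) >= 2:
        return SerialRecord(title, list(issues))
    return "monograph" if high_priority else "waiting folder"


def add_issue(record, issue):
    """Attach a newly received issue to an existing parent record."""
    record.children.append(issue)


def display_order(record):
    """Issue labels newest first (top) to oldest (bottom), as in Project Client."""
    ordered = sorted(record.children, key=lambda c: c.issued, reverse=True)
    return [c.label for c in ordered]
```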
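The Internet Archive workflow note does not describe the Library's download tool in detail. As an assumption-laden stand-in, the sketch below reads an item's JSON from Internet Archive's public Metadata API (`https://archive.org/metadata/<identifier>`), picks out PDF files, and builds download URLs of the form `https://archive.org/download/<identifier>/<file>`. The choice of the `Text PDF` format and the fields copied into `record` are guesses, not the Library's import specification.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

METADATA_URL = "https://archive.org/metadata/{}"
DOWNLOAD_URL = "https://archive.org/download/{}/{}"


def fetch_item(identifier):
    """Fetch an item's JSON from the Internet Archive Metadata API."""
    with urlopen(METADATA_URL.format(quote(identifier))) as resp:
        return json.load(resp)


def plan_downloads(item, wanted_format="Text PDF"):
    """Return (download_url, record) pairs for files in the wanted format.

    `record` carries a few item-level fields that could seed a Project
    Client text-file import; which fields are actually needed is an assumption.
    """
    meta = item.get("metadata", {})
    identifier = meta["identifier"]
    plans = []
    for f in item.get("files", []):
        if f.get("format") == wanted_format:
            url = DOWNLOAD_URL.format(identifier, quote(f["name"]))
            record = {
                "title": meta.get("title", ""),
                "date": meta.get("date", ""),
                "source_file": f["name"],
            }
            plans.append((url, record))
    return plans
```

Keeping the network call (`fetch_item`) separate from the parsing (`plan_downloads`) makes the parsing easy to check against a saved JSON response before anything is downloaded.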
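The oversized-cargo note amounts to a small decision rule. In this sketch the 1,000-page cut-off (`MAX_PAGES_AS_CHILD`) is purely hypothetical, since the notes say only that some documents run to thousands of pages. The field names `Title`, `Serial Title`, and `Date` echo the note, but the record layout is illustrative.

```python
from dataclasses import dataclass

# HYPOTHETICAL cut-off: the notes say only "thousands of pages".
MAX_PAGES_AS_CHILD = 1000


@dataclass
class Upload:
    title: str
    year: int
    pages: int


def plan_load(upload, limit=MAX_PAGES_AS_CHILD):
    """Pick a structure for one issue and build the metadata that matters.

    Oversized issues become stand-alone compound objects: the year goes into
    the title so it shows in search results, and a "Serial Title" field ties
    all issues together under one unified title.
    """
    if upload.pages <= limit:
        return {"structure": "serial child", "Title": upload.title, "Date": upload.year}
    return {
        "structure": "compound object",
        "Title": f"{upload.title}, {upload.year}",
        "Serial Title": upload.title,
        "Date": upload.year,
    }


def collocate(records, serial_title):
    """Gather stand-alone issues sharing a Serial Title, oldest first."""
    hits = [r for r in records if r.get("Serial Title") == serial_title]
    return sorted(hits, key=lambda r: r["Date"])
```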
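The title-change note keeps one digital parent record while new MARC records are still created for each title. A sketch of the bookkeeping on the parent, with field names mirroring “Other Title”, “Title Replaces”, and “Title Replaced by”; the class and function names are hypothetical:

```python
from dataclasses import dataclass, field


def _add_once(values, value):
    """Append value unless it is already present (keeps fields duplicate-free)."""
    if value not in values:
        values.append(value)


@dataclass
class ParentRecord:
    title: str
    oclc_numbers: list = field(default_factory=list)
    other_titles: list = field(default_factory=list)       # "Other Title"
    title_replaces: list = field(default_factory=list)     # "Title Replaces"
    title_replaced_by: list = field(default_factory=list)  # "Title Replaced by"


def record_title_change(parent, new_title, new_oclc, new_title_is_later=True):
    """Keep one digital parent record across a serial title change.

    A new MARC record (with its own OCLC number) is still created in
    Connexion; the parent only gains that number and the title. Pass
    new_title_is_later=False when the parent was built on the successive
    title and an earlier title turns up.
    """
    _add_once(parent.oclc_numbers, new_oclc)
    _add_once(parent.other_titles, new_title)
    if new_title_is_later:
        _add_once(parent.title_replaced_by, new_title)
    else:
        _add_once(parent.title_replaces, new_title)
```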
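The deletion problem in the challenges note suggests a simple guard: remember which issues seeded the structure through CDI and refuse to delete those. This is a hypothetical safeguard sketched in Python, not a CONTENTdm feature, and issue labels are plain strings for brevity:

```python
from dataclasses import dataclass, field


@dataclass
class SerialStructure:
    parent_title: str
    children: list = field(default_factory=list)  # issue labels
    seed_children: frozenset = frozenset()        # issues uploaded via CDI


def delete_issue(structure, label):
    """Remove one child issue unless it helped build the serial structure.

    Per the notes, deleting one of the original CDI children can make the
    structure fall apart and leave issues floating as "lost children", so
    this sketch refuses that case and leaves it for manual handling.
    """
    if label not in structure.children:
        raise KeyError(label)
    if label in structure.seed_children:
        raise ValueError(f"{label!r} seeded the serial structure; deleting it may break the record")
    structure.children.remove(label)
```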
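For the collection-size limit, the notes confirm that a limit exists and is measured in pages, but not its value. With that caveat, one way to plan the split is first-fit decreasing bin packing, reserving a headroom percentage for running serials. `plan_split`, the page counts, and the 20% default are all hypothetical:

```python
def plan_split(titles, page_limit, headroom_pct=20):
    """Assign titles to collections without exceeding a per-collection page cap.

    titles maps title -> current page count. Each collection is filled only
    to (100 - headroom_pct)% of page_limit so running serials have room to
    grow. Largest titles are placed first (first-fit decreasing bin packing).
    """
    budget = page_limit * (100 - headroom_pct) // 100
    collections = []  # each entry: [pages_used, [titles]]
    for title, pages in sorted(titles.items(), key=lambda kv: kv[1], reverse=True):
        if pages > budget:
            raise ValueError(f"{title!r} alone exceeds the per-collection budget")
        for coll in collections:
            if coll[0] + pages <= budget:
                coll[0] += pages
                coll[1].append(title)
                break
        else:
            collections.append([pages, [title]])
    return [names for _used, names in collections]
```

With a hypothetical 1,000-page cap and 20% headroom, titles of 700, 500, 200, and 100 pages land in two collections holding 800 and 700 pages. A real plan would also keep each serial's issues together and weigh how the split looks on the public side, which this sketch ignores.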
Transcript of "Sailing the Digital Serial Seas: Charting a New Course with CONTENTdm"

    1. Sailing the Digital Serial Seas: Charting a New Course with CONTENTdm
       Eve Grünberg, Francesca Francis
       State Library of North Carolina
       NASIG 2013
    2. The Manifest
       Mandate of the Library and advent of digital publications
       Finding a digital content (or asset) management system
       Different types of digital serials and how we work with them
       Challenges and expectations
       “Sailing” ahead
    3. Background
       The Library’s mandate: Manage and preserve state publications, respectively, in all formats for permanent public access and maintain a permanent depository collection of all printed state documents
       North Carolina State Documents Depository System (est. 1987) = Clearinghouse + depository libraries
       Identify, collect, process, distribute, provide access
    4. Digital pubs, ho!
       2003 – 60%+ of state government information born digital
       Need a way to manage digital content
       Digital Information Management Program (DIMP) study: CONTENTdm
         Qualified Dublin Core
         Automated metadata crosswalking
         Customization
         Digital Archive for preservation
    5. The (pre-)maiden voyage
       Trial period: October 2006–February 2007
       We bought and designed the ship (but are renting the dock…)
    6. Readying the ship
       Guidelines:
         Digital Collection Development
         Digitization Priorities
         General Metadata
         Metadata for Serials
         Preservation Metadata
         Preservation and File Format
       http://digital.ncdcr.gov/cdm/about
    7. Handling the cargo
       Connexion Digital Import (i.e., CDI) = MARC (Connexion) → Qualified Dublin Core (CONTENTdm)
       Serials:
         Multiple digital file upload
         Single reference URL
         Structure (“parent” & “children”)
    8. Cargo type #1: Born digital
       How received:
         PDF (from agencies or converted)
         Routine searching
       How inventoried:
         Digital database
         PinPoint Hash
         CINCH does both!
         http://cinch.nclive.org/Cinch/CINCHdocumentation.pdf
       Final prep:
         Renaming (using guidelines; unique, for preservation)
         Move to cataloging
    9. Transporting from Connexion to CONTENTdm
       Start in MARC
       CDI feature – attach the digital object
       Crosswalk from MARC to Qualified Dublin Core – digital object (w/ metadata) “drops” into CONTENTdm
       Serials editing:
         Edit metadata of parent → approve → index
         Edit metadata of children → approve → index
    10. CDI feature in Connexion
    11. Serial record (Project Client view)
        Parent
        Children
    12. Serial record (public view)
        Parent
        Children
    13. As compared to monographic record (public view)
    14. Cargo type #2: Digitized serials
        Library digitizes many of its own publications (including serials)
        Identify publications → digitized:
          In house
          Internet Archive
    15. DIMP/Internet Archive workflow
        [Diagram: DIMP (Library) → titles & MARC records → Internet Archive (digitization; additional data via Z39.50 protocol) → grab metadata & objects → load object with metadata thru Project Client → serial record]
    16. Oversized cargo
        Large serial files are treated as monographs (serial structure not created)
          Digitized files typically larger
          Some documents are thousands of pages long
          Loading times are too long, can freeze
        Solution:
          Load as single items (individual compound objects w/ full metadata)
          Add extra metadata field (serial title) to collocate
          Issue-identifying information (e.g., year) added to title
          Special link to search results of all issues in serial title
    17. Serials as monographs (public view)
    18. Miscellaneous cargo
        Serial structure (parent and children) also used for collection level records
        Collections:
          Have collective research value, but may not be worth cataloging alone (e.g., ephemeral)
          Share subject and/or agency information, other natural relationships
          May or may not have a collective title
    19. Collection level record
    20. Anchoring serials during title changes
        Digital materials tied to records
        Throwing traditional title changes overboard:
          New records for title changes as usual in MARC
          Single record approach in digital collection
          Add all OCLC numbers, all titles (in other title field) on all associated records
          Same link on all associated MARC records
        Exceptions: major changes to serial (e.g., title merged/separated, agency shift, content/focus)
    21. WHY???
        Our experience and feedback have shown us that it is very difficult for patrons to see the relationships and understand the records if we use a multiple-records approach to show serial title changes
        When you think about your physical collection, the serials sit together seamlessly on the shelf regardless of the title change; and so by using a one-record approach, digital serials can “sit together” in the digital collection
    22. Serial title changes
    23. Riding the waves: Challenges with CONTENTdm
        Creating serial records is a multi-step and complex process
        The index runs in the background…or does it?
        Approving large files (the sea monsters of the collection)
        Coordinating workflows with others
        Deleting/replacing serial issues, or: whoops, we broke the structure
        Turning monographs into serials
    24. Sailing ahead: Our wish list
        Smoother-running approval and indexing
        Ability to handle secure and large files like other files
        Better search engine: the white whale
          Relevant results
          Alphabetical/chronological order
        How much cargo can this ship hold? Finding the limits of an “unlimited” collection (and patching the leaks)
    25. Sail to our port!
        http://ncgovdocs.org/
    26. Contact information
        Eve Grünberg, State Documents Cataloger, eve.grunberg@ncdcr.gov
        Francesca Francis, Assistant State Documents Cataloger, francesca.francis@ncdcr.gov
    27. Questions? Comments?