ALA Interoperability


For use on a CONTENTdm UI project. For custom collections management design contact us at

Published in: Technology, Education

  • Transcript

    • 1. CONTENTdm Interoperability -- Leveraging resources; repurposing collections
      • ALA Annual, New Orleans, LA
      • Friday, June 23rd, 9 am to noon
      • Claire Cocco, Product Manager
      • Geri Ingram, Customer Service Specialist
      • DiMeMa, Inc.
    • 2. Agenda Part 1
      • 9:00 to 10:15
      • Mainstream digital objects into existing workflows
        • Importing from legacy systems
      • Exporting
      • Example of collaborative development for interoperability
        • METS transform (courtesy of CDL)
      • [BREAK 10:15 TO 10:30]
    • 3. Agenda Part 2
      • 10:30 to 11:30
      • Customizing and integrating your CONTENTdm site
        • Web templates
        • Custom Queries and Results
        • Configuration files
    • 4. Agenda Part 3
      • 11:30 to Noon
      • Handling Finding Aids
      • Importing EAD files into CONTENTdm
    • 5. Setting the context: fully engaged in digital library transformation
      • Library services and collections expanding to encompass all
      • Traditional to digital
        • Licensed
        • Reformatted
      • Sharing
      • Preserving
    • 6. Leveraging resources
      • Staff time and skills throughout the organization and/or consortium
      • Existing metadata in some form
      • Existing digital collections (images and transcripts)
    • 7. Why? For better customer service
      • In order to mainstream your processing and amplify your efforts.
      • Your digital collections should ultimately be mainstreamed into regular workflows, similar to the ones used for other materials (whether that’s done centrally or in a distributed fashion).
      • This includes selection, technical processing (cataloging, organizing, importing), integration with site vis-à-vis presentation and archiving.
    • 8. Mainstreaming processing of digital formats (Part 1 of 3)
      • Importing from other systems to CONTENTdm
      • Exporting from CONTENTdm
      • Example of collaborative development for interoperability
        • CONTENTdm Standard Export
        • METS transform for import
    • 9. I. Importing from other systems to CONTENTdm
      • Metadata only
        • When records describe items that are not yet scanned
        • Replace “null” files at later time
      • Metadata AND their digital files
    • 10. From an OPAC or other database system
      • When you have…
      • Individual image files cataloged already
      • And can export from an OPAC or other dbms
      • Or where you have compound digital objects ready for migration
    • 11. Migration steps:
      • Prepare the collection and the import files
        • Cross-walk metadata to Dublin Core
        • Configure the CONTENTdm collection fields
      • Export and prep data in a tab-delimited ASCII file
      • Import the file to CONTENTdm
    • 12. Data prep: Common problems in tab delimited data files
      • Extra data in columns or rows
        • Extra tabs at end of line
        • Extra CRs at end of file (Should only be 1 CR)
        • Carriage return in metadata, tab in metadata
      • Files must exist
        • 0 versus O
        • The error may be in a nearby record; check a few rows before and after the reported error
      • File names are required, not full pathnames
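The checks on this slide can be automated before import. The following is a minimal sketch, not a CONTENTdm tool: the function name `check_tab_delimited` is hypothetical, and it assumes the first line is a header row and the last column holds the file name.

```python
import os


def check_tab_delimited(text, image_dir=None):
    """Return (line_number, message) pairs for common tab-delimited problems.

    Assumptions for this sketch: the first line is a header row and the
    last column of every record holds the file name.
    """
    problems = []
    lines = text.split("\n")

    # A clean file ends with exactly one line terminator, which leaves
    # exactly one empty string at the end of the split list.
    trailing_blank = 0
    while lines and lines[-1].strip("\r") == "":
        lines.pop()
        trailing_blank += 1
    if trailing_blank > 1:
        problems.append((len(lines) + 1, "extra blank lines at end of file"))
    if not lines:
        return problems

    expected = len(lines[0].rstrip("\r").split("\t"))
    for number, raw in enumerate(lines[1:], start=2):
        line = raw.rstrip("\r")
        fields = line.split("\t")
        if line.endswith("\t"):
            problems.append((number, "extra tab at end of line"))
            continue
        if len(fields) != expected:
            # A stray tab or carriage return inside metadata shows up here.
            problems.append(
                (number, f"expected {expected} columns, found {len(fields)}")
            )
            continue
        filename = fields[-1]
        if "/" in filename or "\\" in filename:
            problems.append((number, "full pathname given instead of file name"))
        elif image_dir is not None and not os.path.isfile(
            os.path.join(image_dir, filename)
        ):
            problems.append((number, f"file not found: {filename}"))
    return problems
```

Passing a directory as `image_dir` adds the "files must exist" check, which also catches a zero typed in place of a capital O in a file name.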
    • 13. Data prep: Troubleshooting with Excel
      • Use Microsoft Excel to open the file and view data
        • Each row should be an item with last column as filename
      • Work with small batches to find errors – keep adding items until record with error is found
      • Use Excel’s “CLEAN” function to remove invisible characters
      • Import images from directory without using tab delimited file
        • Checks for any type of imaging errors
    • 14. Demo : MARC to DC
      • Export MARC records to tab-delimited text file (using ILS or MarcEdit)
      • Format and clean up the text file to conform to your CONTENTdm Collection schema
      • Import the file (with or without images) to the Collection
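The crosswalk and clean-up steps in this demo can be sketched in a few lines. This is an illustration only: the tag-to-element mapping is a simplified subset, the column names and the "; " separator for repeated values are assumptions to adjust to your own collection schema, and records are represented as plain dictionaries rather than parsed MARC.

```python
import csv
import io

# Illustrative subset of a MARC-to-Dublin-Core crosswalk (simplified).
CROSSWALK = {
    "100": "Creator",
    "245": "Title",
    "260": "Publisher",
    "520": "Description",
    "650": "Subject",
}
DC_COLUMNS = ["Title", "Creator", "Subject", "Description", "Publisher"]


def marc_to_dc_row(record, filename=""):
    """Map one record (dict of MARC tag -> list of strings) to a row dict."""
    row = {column: "" for column in DC_COLUMNS}
    for tag, values in record.items():
        element = CROSSWALK.get(tag)
        if element is None:
            continue  # tag not covered by this sketch's crosswalk
        # Collapse tabs, newlines, and repeated spaces inside each value.
        cleaned = [" ".join(value.split()) for value in values]
        joined = "; ".join(cleaned)
        row[element] = "; ".join(filter(None, [row[element], joined]))
    row["File Name"] = filename  # last column, as the import file expects
    return row


def write_tab_delimited(rows):
    """Return the rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=DC_COLUMNS + ["File Name"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Leaving `filename` empty corresponds to a metadata-only import, where the file is supplied later.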
    • 15. Importing compound objects
      • For documents, postcards, monographs and picture cubes
      • Can do singly or in batch
      • Much easier to start with singles, then set up for batch when process is smooth
    • 16. Migrate compound objects from another database system
      • Where you have many compound digital objects to migrate
      • Prepare the collection and the import files
        • Cross-walk metadata to Dublin Core
        • Configure the CONTENTdm collection fields
        • Configure folders for scans and transcripts (if appropriate)
        • Choose an import method based on your data structure
        • Create tab-delimited ASCII file(s) appropriate to the method
        • Import the files to CONTENTdm in batches
    • 17. Multiple compound object wizard
      • Documented in online tutorial
      • Today’s demo described in handout
      • Four import methods for multiple object loading
        • Compound object (same as single, but upload batched)
        • Directory Structure (most flexible and efficient)
        • Object List (useful when NO page-level metadata)
        • Job List
      • Time allowing, demonstrate three different object types using 3 of 4 methods
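    • 18. Choose a multiple compound import method based on your data
      • Postcards: Compound Object (YES), Directory Structure (YES), Object List (YES, will demo)
      • Documents: Compound Object (YES), Directory Structure (YES, will demo), Object List (YES)
      • Monograph: Compound Object (YES, will demo), Directory Structure (YES), Object List (YES)
      • Object List applies when there is no page-level metadata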
    • 19. [Flowchart: choosing an import method]
      • Decision questions
        • Do you have page-level metadata for the compound objects?
        • Are your scan files separated into compound object directories?
        • Do you have one tab-delimited text file containing ALL the objects?
        • Are they all the same type of compound object?
        • Do you have tab-delimited text files for EACH compound object?
      • Preparation steps
        • Create compound object directories for EACH compound object.
        • Break up into batches by type.
        • Create a text file listing all compound objects and object metadata, or create a text file for each compound object.
      • Outcomes: DIRECTORY STRUCTURE or OBJECT LIST
    • 20. Every one of the four CONTENTdm compound object importing methods
      • Requires object-level metadata
      • Requires preparation
        • File–naming, keeping sort order in mind
        • Each object has own directory for scans
        • May use tab-delimited text file(s)
      • Accommodates transcripts
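File naming matters because plain alphabetical sorting puts "page10" before "page2". One common remedy is to zero-pad page numbers before import. A minimal sketch follows; the helper name `padded_name` and the default width of four digits are assumptions for illustration, not CONTENTdm requirements.

```python
import re


def padded_name(filename, width=4):
    """Zero-pad the last run of digits in the name so names sort in page order."""
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        # No extension: treat the whole name as the stem.
        stem, extension = filename, ""
    match = None
    for match in re.finditer(r"\d+", stem):
        pass  # keep only the last run of digits
    if match is None:
        return filename  # nothing to pad
    padded = stem[: match.start()] + match.group().zfill(width) + stem[match.end():]
    return padded + dot + extension


scans = ["page10.tif", "page2.tif", "page1.tif"]
in_page_order = sorted(padded_name(name) for name in scans)
```

After padding, the sorted names run page0001.tif, page0002.tif, page0010.tif, which matches the intended page order.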
    • 21. A word about descriptive page-level metadata
      • Supported by some, but not all, of the four import methods
        • NOT supported by Object List
      • At the page level, Title is the only required field
        • Technical metadata can be generated by the Template Creator
    • 22. More on transcripts
      • Typescripts and transcripts
        • Requires a field designated as the data type “Full Text Search”
        • Inserted into the metadata field of the scanned page
          • During import
            • Through use of .txt file found, or
            • By Template Creator
              • If OCR Extension in use
              • Or by “Directory Import” as with early versions of CONTENTdm
      • Transcripts and typescripts are supported by all four methods (i.e., not considered “metadata” for purposes of this discussion)
    • 23. Demo: Import Multiple Compound Objects
      • Monograph using Compound Object method
      • Postcards using Object List method
      • Documents using Directory Structure method
    • 24. II. Exporting from CONTENTdm
      • To tab-delimited ASCII with field headers
      • To XML:
        • Standard Dublin Core: DC fields only
        • Custom: all fields, including local fields, but not structure
        • CDM Standard: all fields, including structure
    • 25. III. Examples of collaboration for interoperability
      • Web integration through search engines, RSS
      • OAI harvesting
        • Enable at collection or server level
        • Choose to suppress <pagedata> or not
      • WorldCat registration
        • Open WorldCat integration
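For the OAI harvesting bullet, a harvester requests records with an OAI-PMH `ListRecords` call. The sketch below only builds the request URL; `verb`, `metadataPrefix`, and `set` are standard OAI-PMH parameters, while the base URL and set name in the usage line are placeholders, since the endpoint of a given CONTENTdm server is not stated here.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI endpoint (a placeholder in this sketch);
    set_spec optionally limits the harvest to one set, such as one collection.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and set name, for illustration only.
example_url = oai_list_records_url("http://example.org/oai", "maps")
```

The `oai_dc` prefix asks for unqualified Dublin Core, the format every OAI-PMH repository is required to support.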
    • 26. CONTENTdm and a new METS transform
      • Info available on USC in July
      • Code at SourceForge
      • Windows-oriented
    • 27. The CONTENTdm to METS conversion tool
    • 28.
      • What is/are METS?
      • Why is/are METS good?
      • What is 7train?
      • How do I use 7train?
      • What do I get from 7train?
      • How do I get 7train?
    • 29.
      • What is/are METS?
        • METS (Metadata Encoding and Transmission Standard) is an
        • XML-based standard for encoding metadata to describe
        • objects (digital or otherwise) within a digital library.
      • See for more information
    • 30. What is/are METS?
      • Sections of a METS document: metsHdr, dmdSec, amdSec, fileSec, structMap, behaviorSec
      • Yellow elements/tags are required; all others are optional
      • metsHdr: metadata about this particular METS - encoder, contact info, etc.
      • dmdSec: descriptive metadata - title, author, subjects, etc.
      • amdSec: metadata for the management of the object - technical details, object history, etc.
      • fileSec: a list of files that make up the object
      • structMap: description of the structure of the object, i.e. how the files fit together
      • behaviorSec: what to do with the object - machine-actionable instructions
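To make the layout concrete, here is a minimal sketch that builds an empty METS skeleton with the sections named on this slide, in the order the METS schema lists them. It is not 7train output and is not validated against the schema; the function name `mets_skeleton` and the ID value are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def mets_skeleton(label):
    """Return an empty METS element tree containing the main sections."""

    def tag(name):
        return f"{{{METS_NS}}}{name}"

    root = ET.Element(tag("mets"), {"LABEL": label})
    ET.SubElement(root, tag("metsHdr"))                  # about this METS file
    ET.SubElement(root, tag("dmdSec"), {"ID": "DMD1"})   # descriptive metadata
    ET.SubElement(root, tag("amdSec"))                   # administrative metadata
    file_sec = ET.SubElement(root, tag("fileSec"))       # list of files
    ET.SubElement(file_sec, tag("fileGrp"))
    struct_map = ET.SubElement(root, tag("structMap"))   # how the files fit together
    ET.SubElement(struct_map, tag("div"))
    ET.SubElement(root, tag("behaviorSec"))              # machine-actionable behaviors
    return root


skeleton_xml = ET.tostring(mets_skeleton("Sample object"), encoding="unicode")
```

A real document would fill each section, for example by wrapping Dublin Core inside `dmdSec` and pointing `structMap` divisions at entries in `fileSec`.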
    • 31. Why METS? To be able to add your objects to other collections and increase the visibility of your institution's assets.
    • 32. What is 7train? 7train is an XSL-based tool for converting XML documents - in this case CONTENTdm exports describing objects managed in the CONTENTdm system - into METS objects suitable for submission to a digital library system, such as the California Digital Library's Online Archive of California. 7train is a platform-independent, standalone tool that was designed to work on any system and to be simple to use.
    • 33. How does 7train work? It is as easy as dragging your CONTENTdm XML export file onto an executable file.
    • 34. How does 7train work?
    • 35. How does 7train work? What do you get?
    • 36. Output: A Sample METS document
    • 37. References & Links
      • 7train Home:
      • 7train Download:
      • CONTENTdm:
      • METS:
      • XSL:
      • The California Digital Library:
      • The Online Archive of California:
    • 38. Interoperability [diagram]
      • Audiences: librarians, archivists…; library users
      • CONTENTdm: existing libraries (10K/50K/unlimited objects), new libraries, other CONTENTdm sites, CONTENTdm Multi-Site Server
      • Connected systems: OPACs, WorldCat, Open WorldCat, regional union catalog, other digital archives, the Web
      • Exchange formats and protocols: OAI, MARC records, XML, DC
    • 39. BREAK—15 minutes
      • This concludes Part 1
      • To come after the break:
      • Part 2
        • Customization
      • Part 3
        • Finding Aids
    • 40. Customizing and integrating your CONTENTdm site (Part 2 of 3)
      • Web templates
      • Custom Queries and Results
      • Configuration files
    • 41. CONTENTdm Web Templates
      • Customizable for integration
      • Designed to support broad range of users
        • Small to large organizations
        • Beginners to experts
      • Use out of the box with minimal customization
        • Basic customization requires minimal HTML skills
      • Fully customize including advanced extensions
      • Based on a PHP API (PHP: Hypertext Preprocessor; API: Application Programming Interface)
    • 42. Basic Customizations
      • Minimal skills needed
      • Easy to make changes
        • Global include files
        • Variables
      • Recommend all organizations do basic customizations
        • Header (name/logo), contact e-mail address, colors, about page, home page
    • 43. Getting Started
      • Access to Web server docs directory
      • HTML editor or text editor
      • Design plan
      • Logo or other graphics
      • Backup copy of original files
    • 44. Customization Demo
      • Files located in /cdm4 directory
        • /includes/global_header.php
        • /client/LOC_global.php
        • /client/STY_global_style.php
        • about.php
        • browse.php
        • results.php
      • New logo saved in /cdm4/images/
    • 45. Advanced Customizations
      • Experience with HTML, PHP, and JavaScript needed
      • Customize looks for each collection
        • University of Nevada, Reno
      • Web Template extensions
        • E-commerce (University of Utah, Oregon State University)
        • Comment forms (SENYLRC, Enoch Pratt Free Library, OSU)
        • Custom metadata display (University of Oregon)
        • QuickTime video (Williams College)
    • 46. Examples of Advanced Customizations
      • University of Nevada, Reno
      • University of Utah
      • Oregon State University
      • SENYLRC
      • Enoch Pratt Free Library
      • Williams College
    • 47. Customization Tips
      • Always make a backup!
      • Be aware of encoding (UTF-8 vs. ASCII)
      • See what other users are doing
        • Share, borrow, and copy ideas and code
        • Listserv
      • Document changes
        • Document which files are edited and what code changes are made to ease upgrading to newer versions
    • 48. Custom Queries and Results (CQR)
      • Create predefined, custom queries
        • Virtual collections
        • Guide users to specific results
        • Integrate with other sites
      • Multiple options
        • Simple hyperlink, drop-down list, index box, text box, browse
      • Easy to use
        • Wizard generates code to copy and paste into Web pages
      • Documentation
    • 49. CQR DEMO
      • Generate code using CQR
      • Copy and paste into Web pages
        • May need to change path
        • Customize as desired
    • 50. Configuration Files
      • Customizable files that reside on the server
      • Stop words
        • Full text field stop words – fullstop.txt
        • Automatic hyperlink stop words – stopwords.txt
      • Image viewer
        • Customize how images are displayed – imageconf.txt
        • For all collections or per collection
    • 51. Imageconf.txt Demo
      • Located in the /conf directory on the CONTENTdm server
      • Can change globally or for individual collections
        • If you wish to change the zoom and pan default settings for a particular collection, copy the imageconf.txt file from the Server/conf directory to the index/etc directory of the collection(s) you wish to modify.
      • Make a backup copy!
    • 52. Introduction to Finding Aids
      • How many of you have them?
      • Are they digital documents or paper?
      • If digital, are they XML?
        • Basic: create documents, monographs, and use http protocol to link
        • XML: use EAD DTD, and style sheet to display
    • 53. Handling Finding Aids (Part 3 of 3)
      • Importing EAD files to CONTENTdm
    • 54. Current EAD Support
      • Import of EAD files
      • Automatic text extraction from EAD files when:
        • The file extension of the EAD is .xml.
        • The file includes a header record beginning with DOCTYPE ead.
        • The collection has a full text search field.
        • The full text search field is empty when the item is added to the collection.
      • Up to 128,000 characters extracted from the following fields and placed in the full text search field
        • titleproper, title, unittitle, persname, famname, corpname, genreform
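The extraction rule above can be approximated outside CONTENTdm, for example to preview what would land in the full text search field. This is a sketch under assumptions, not CONTENTdm's own implementation: it collects text in document order, joins pieces with spaces, and truncates at the character limit. `unittitle` is the element name as defined in EAD.

```python
import xml.etree.ElementTree as ET

# Fields named on the slide (unittitle is the EAD spelling).
FIELDS = {
    "titleproper", "title", "unittitle",
    "persname", "famname", "corpname", "genreform",
}
MAX_CHARS = 128_000


def extract_ead_text(xml_text, limit=MAX_CHARS):
    """Collect text from the listed EAD elements, truncated to `limit` characters."""
    root = ET.fromstring(xml_text)
    pieces = []
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # drop any namespace
        if local_name in FIELDS:
            # Gather nested text and collapse runs of whitespace.
            text = " ".join("".join(element.itertext()).split())
            if text:
                pieces.append(text)
    return " ".join(pieces)[:limit]
```

Note that a listed element nested inside another listed element (a `title` inside a `titleproper`, say) contributes its text twice in this sketch.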
    • 55. Current EAD Support
        • Display determined by style sheet
          • XSLT
          • CSS
        • Client side parsing
        • Affected by Web browser
    • 56. Getting Started
      • EAD XML files
      • EAD DTD
      • XSLT style sheet
    • 57. EAD Demo
      • Configure Full Text Search field
      • Store DTD and style sheet on server
      • Edit path to DTD and XSLT in EAD files
      • Import (single or batch)
        • Add metadata
        • Custom thumbnail if desired
      • Upload, approve, index
    • 58. Custom EAD Extension
      • Example by Oregon State University
        • Terry Reese, [email_address]
      • Customized Web templates
        • Client side or server side parsing
        • Integrates display in templates
      • VBScript for extracting metadata from EAD to tab-delimited text file
    • 59. Oregon State University EAD Collection
    • 60. Announcing new exposure for your CONTENTdm Collections
      • Collection of Collections
        • (also featured at
      • Harvesting metadata from Collection sites at:
        • Uses CONTENTdm Multi-site server