Christensen dunlop dublin_core


The Case for Implementing Core Descriptive Embedded Metadata at the Smithsonian presented at the Dublin Core 2010 conference in Pittsburgh, PA

  • Good afternoon. My name is Doug Dunlop, Metadata Librarian, Smithsonian Institution Libraries. My co-presenter, Stephanie Christensen, Digital Imaging Manager, National Anthropological Archives, could not be here to present with me, so I will be flying solo. With that said, today I would like to introduce you to the work of a Smithsonian pan-institutional group, the Embedded Metadata Working Group (otherwise known as EMDaWG), which formed in April 2009 with the purpose of establishing Basic Guidelines for Minimal Descriptive Embedded Metadata in Digital Images.
  • Now, before going into the endeavors of the group, I will start off with some background information about Smithsonian collections. In fiscal year 2009, SI reported 137.3 million collection items, 100,000 cubic feet held by the archives, and 1.8 million volumes held by the Libraries. This multitude of images lives in many different systems. Across the institution there are 11 instances of EMu, a museum collections management system; 9 instances of TMS, another museum collections management system; 1 instance of MIMSY, a general collection management system; 8 instances of Horizon, an online public access catalog system; and Artesia as a Digital Asset Management System. In addition to those systems, there are a number of other content silos and local databases.
  • This multitude of images and objects, along with their associated metadata, which is sometimes embedded and sometimes not, can travel to many places and distant lands, virtually and literally. As a case in point, and as an illustration of why embedded metadata should be included, consider the following imagined scenario of the life of an image. Let’s take the larger book page image you see here as our example, which is from the Cooper Hewitt Design Museum Library. Now this digital image did not start its life on its own but rather as book-level metadata in the Libraries’ Horizon catalog. It then moved out of mom and dad’s house, displaying its own embedded metadata identity, to live in the Smithsonian Institution Libraries Galaxy of Images, which is the library’s online image collection. From there it traveled into the DAMS, with the image’s embedded metadata being mapped into the DAMS IPTC fields so that the image is discoverable within that system. From the DAMS the image then traveled to the Smithsonian Cross Search Center, which offers one-stop searching through a central index based on an in-house metadata model; there the embedded metadata was used to create a brief record. Finally, the image traveled on to additionally reside in the Smithsonian Flickr Commons. Throughout the journey the embedded metadata aided in the description of the digital object. Without the embedded metadata this story could have had a different outcome. Now of course this is one of many possible life cycles for an image or digital object within the Smithsonian.
  • Now for a story with a less than ideal outcome. Sometimes the life and story of an image is fraught with tribulations. The following case is a true-to-life cautionary tale of an image without associated embedded metadata that ventured out on its own in the pre-EMDaWG days. The story you are about to hear is my re-enactment of a true-life event. Our story starts with an email from an image requestor: “I had left a message for you, and wanted to email you to just double it up. This is X from Charleston Magazine. We have a picture that is from the Smithsonian Museum and we need permission to use it in our publication. It is a picture of two men dueling. Please let me know if you want me to send you a copy of the picture so that you may view it. I look forward to hearing from you.” Email response from the Smithsonian librarian: “Hi X, Do you have any other information about the image? Where was it found? What is the caption information? To exactly which museum is it credited?” Email response from the image requestor: “The image was sent to us by the writer of the article. He works at the Charleston Museum and they had used it in the past and the only information that he had for the photograph was that it was from the Smithsonian Museum (no idea which one or which department). Is there a possible next step we can take at this point, since we do not have any more information?? Thanks for the help.” Unfortunately for the image, here is where its story ends.
  • Now I will shift gears from the secret life of images to get more technical about embedded metadata itself. Since embedded metadata is another flavor of metadata, it can be descriptive, technical, structural, or administrative in nature. For EMDaWG, descriptive metadata was the focus, with a best practices document being the desired outcome. For the working group, adding descriptive metadata to a digital image allows us to take advantage of existing technologies that can read and extract that metadata, allowing others to search for our images. It is also helpful when an image comes back to the Institution from a researcher or the general public, because it helps in determining the image's original source and location. The general purpose of embedded metadata is to make an object self-describing, to provide technical characteristics of the file, to provide information on restrictions and permissions, and to identify the provider and source of the digital object. With this in mind, EMDaWG embraced the basic concept that the more standardized and useful information we put into the metadata, the more searchable, discoverable, and usable our images will be, both inside and outside the Smithsonian.
  • Now embedded metadata itself can come in many flavors. One of the main components of embedded metadata for images is IPTC, named for the International Press Telecommunications Council. It developed out of the world of photojournalism and is now used by Adobe as a core schema for images. Another type of image metadata is EXIF, which according to Adobe documentation came about when: A group of major photographic manufacturers, in conjunction with the Japan Electronic and Information Industries Association (JEITA), developed a metadata schema called Exchangeable Image File Format (EXIF). This schema is designed to embed in a digital capture, right at the moment of exposure, information relating to the camera’s function and image parameters. And last but not least there is XMP, which stands for Extensible Metadata Platform. XMP is a metadata framework for images and digital objects that was developed by Adobe. It is a common metadata framework that standardizes the creation, processing, and interchange of document metadata across publishing workflows. As a framework, XMP can contain XMP metadata, Dublin Core metadata, IPTC metadata, EXIF metadata, PDF properties, Photoshop properties, and TIFF properties. Within the context of EMDaWG, IPTC was the schema of focus for the core embedded fields.
  • As stated in the previous slide, XMP is a metadata framework created by Adobe that uses IPTC as a core metadata schema. XMP is represented in RDF/XML for interoperability and use outside of Adobe products, especially in a web environment, and also for use by developers. IPTC was chosen for use by EMDaWG not only because it is core to Adobe XMP, but also because it constitutes one of the basic metadata editing templates within Bridge and other image editing applications. In addition, IPTC is broadly used across the Smithsonian by photographers and imaging specialists, and is automatically mapped into designated IPTC metadata fields within the institution’s DAMS, to be used mainly as un-editable but searchable fields.
  • Now that you have some context, I will delve further into EMDaWG and the case for embedded metadata at the Smithsonian. As digitization across the institution has increased, the Collections Search Center (CCS) has been established, and the Digital Asset Management System (DAMS) is beginning to be utilized, the search and retrieval of SI digital assets will become increasingly important. The searchable metadata associated with these assets is dependent on the comprehensiveness of the description of the original analog material. The quality of that metadata will begin to work in conjunction with the quality of the metadata contained within the images. Quality metadata begins with embedded metadata. While this is minimal, it is the starting point for the lifecycle of metadata that will travel near and far with the digital asset. The tale of how the best practices came to fruition began, when the Embedded Metadata Working Group started, with a guiding principle: to come up with a core set of embedded metadata fields for images that could be used within SIRIS and the DAMS. That focus then grew to use SI-wide. The process included conducting a survey of the working group members as to what embedded metadata fields their respective units were using or not using, how they were populating those fields, and what guidelines, if any, they were following, as well as conducting in-depth meetings in which workflows, data needs, and data usage were discussed. Also informing these discussions were the findings from the Federal Agencies Digitization Guidelines Initiative (FADGI) Still Image Working Group (Embedded Metadata and Technical Metadata Subgroup). Significant findings that emerged from this entire process, and that informed EMDaWG, included: variations in the use of embedded metadata in current practice (the results of the survey showed that there were disparities in the metadata fields used by various units, which made the sharing of common search catalogs difficult); the need for defining specific metadata fields to meet a variety of needs; and the importance of certain fields. The end result of EMDaWG’s findings was the creation of two groupings of embedded metadata, one being the core group and the other being the suggested group.
  • The core set contains fields that are consistent across Smithsonian units, while the suggested set of fields accommodates the differing needs across the institution's units. In addition, the working group came up with best practices and recommendations for data input. Listed here in the slide are the minimal core fields: Document Title, Copyright Notice, Source, and Creator. The working group determined to use and follow IPTC fields and definitions as closely as possible, since many within the Institution were already using IPTC, although the fields were not being used in the same way. The group also wanted to ensure that the fields and data values worked well with the SI DAMS core metadata model developed for Still Imaging. I should note that many of the participants served on both groups. This way efforts were not duplicated but would eventually lead to a common goal. During the process of crafting the document, some of the issues the group confronted in employing embedded metadata included: the various institutional uses of the IPTC Title field either as a descriptive title or as a file identifier; the inability to use commas within the IPTC Keyword and Creator fields, because commas become delimiters when embedding using Adobe products; the conflicting use of Date Created to refer either to the creation date of the digital object or to the creation date of the original object; and the conflicting use of Creator as either the creator of the digital object or the creator of the original object. It was with these needs in mind, along with considerations for internal use and workflow, that the core fields were developed.
  • This illustration shows how a glass plate negative contains the four core fields. The metadata self-describes the asset, and if the asset gets separated from its collection, the potential for re-associating it in the future exists. The group also tried to keep in mind digitization workflows, productivity, and the availability of software across the Institution to facilitate practical implementation. The working group realized that a variety of people (photographers, digital imaging specialists, web practitioners, scientists, interns, vendors, etc.) may be creating digital images, and in many cases they will be working with various versions of Photoshop as well as with various databases and Collection Information Systems. Most likely, embedded metadata will be inserted by the asset creator (i.e., photographers, digital imaging specialists, interns). Often in the creation of the digital asset, embedded metadata can be batch processed, which makes for a more efficient and productive workflow. Establishing an embedding workflow that includes a minimal amount of metadata therefore plays a crucial role in the sustainability of the digital asset. With that said, a couple of basic best practices that the group established are: data that is unique to a single asset should not be included as part of batch processing, and data subject to change should not be embedded.
  • As I previously mentioned, the core set contains fields that are consistent across Smithsonian units, while the suggested set of fields accommodates the differing needs across the institution's units. The suggested set might be useful for one unit but not for another; therefore we determined these fields to be suggested and not core. Fields with recommended data input in the suggested set include Date, Description, Keywords, Credit/Provider, Job Identifier, and Headline.
  • Here is an example in Bridge of how an image might hold additional suggested information, with the metadata not displayed in the Bridge description window included on the side.
  • As stated earlier, most fields used by EMDaWG keep the IPTC field name, except for Document Title, which uses the XMP field name. Pictured in the following three slides is how the fields chosen by EMDaWG best map to IPTC and Dublin Core. The working group tried to stick to a strict use of IPTC fields when possible, with a few exceptions like Creator, which can be used for the creator of either the analog or the digital version. Most fields map logically to related Dublin Core fields. The exception is Document Title, which follows a common photographic practice of using IPTC Title to contain the file name for the digital object and is therefore the equivalent of Dublin Core Identifier. Also, the DC field Publisher is the closest matching field to the working group’s defined use of Source, which is the SI owning unit that is digitally publishing the material.
  • Pictured in this slide is the mapping from the Suggested Fields. As stated earlier, Date can refer to the date of the original analog object or to the creation date of the digital object. With Keywords, the working group suggests using a controlled vocabulary whenever possible, although the specialized nature of some materials at the Smithsonian calls for vocabulary created for in-house use.
  • Pictured here are the rest of the suggested fields. As you have probably noticed, Publisher and Identifier have been used multiple times, because those two fields are the closest conceptual match to the fields used by EMDaWG.
  • Now to move on to a current pilot project. As the working group’s efforts begin to move forward, we are testing the capabilities of the CCS and DAMS systems with new ways to extract and deliver metadata. For example, the National Anthropological Archives is working with the SIRIS office to extract embedded metadata from images within the DAMS, as well as core DAMS metadata applied on ingest, and to map that data to assist in creating MARC records for use in SIRIS. The mapping of the data is crucial for this to succeed. This pilot project will involve a collection of 5,000 images that have been digitized but for which no item-level records exist. The goal is to create skeletal, minimal MARC item-level records based on the embedded metadata and applied DAMS core metadata, which can then be enhanced by staff and eventually pushed out to the public. Should this project prove feasible, it would automate some of the preliminary steps of the process and so free up staff time, and it would provide a solution that balances a multiple-image display capability with the ability to enhance search functions. It should assist in saving time, not making more work.
  • As Smithsonian digital assets begin to be more dispersed on institutional websites, blogs, social media sites, publications, and a variety of other outlets, it becomes increasingly important to be able to trace an image back to its original location. For instance, how many times has someone who handles image orders dealt with a scenario similar to the Charleston Magazine story that I mentioned earlier? As the Smithsonian begins to increase the diffusion of knowledge and these digital assets travel to many places, we want to ensure that they will be able to find their way home, not only to the Smithsonian, but to the particular holding unit that oversees that particular image. As the Smithsonian begins to move forward in creating these digital assets and pushing them out to the public, it is at the creation stage that the institution must also have the systems’ search and retrieval functions in mind. These images will only be as accessible as the institution makes them. In light of the goals and objectives of the Smithsonian Strategic Plan, a commonly understood and implemented metadata terminology would be beneficial to the Institution. It would minimize confusion in retrieving assets, readily identify which unit and which collection within that unit the image came from, and also inform the user that the image may have a copyright restriction and that contacting the particular holding unit of the digital asset may be appropriate before use. This guideline has been created with the intention of educating and providing guidance for those working with digital image collections.
  • While this has proven to be a productive collaboration, we have recognized that there still are challenges within the Smithsonian. Defining a core set of fields that can accommodate a broad range of materials was not an easy undertaking. Our core model suggests elements that are consistent across the Institution, while our suggested set of elements was established to accommodate the needs that one unit may have over another. As we begin to establish a consensus on still images, the institution's holdings in other formats, ranging from scientific data, video, and audio to PDF, will require their own attention. While some fields may be consistent across formats, we recognize that the needs for core embedded metadata will be disparate. The group hopes to begin to address PDF next, modifying the group membership so that units holding or working with these formats will actively contribute to the development of basic guidelines. Also, recognizing that technology changes rapidly and that what we decide on today may not be what we need in the future, the group recommends that core descriptive embedded metadata be reviewed every two years to stay current with best practices. At the time of the publication of the embedded metadata document, Adobe had just released the latest version of its Creative Suite software package (CS5), which includes recent updates to the IPTC schema that in large part address some of the data input issues the various members of the working group encountered (i.e., Creator and Date Created). The working group determined that these changes would not be reflected in the document until CS5 is more widely adopted and evaluated institutionally. The document is available through Smithsonian Research Online via the handle included in this slide.
  • On a final note, while I am here presenting this document and the work of EMDaWG today, this has been the combined work of fellow Smithsonian colleagues who have shared the same interest and passion in working towards a basic guideline for minimal embedded metadata and in making our digital still images more robust and accessible. It has been this collaborative effort and teamwork that has helped in the realization of this document. Thank you.
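The notes above describe XMP as an RDF/XML serialization that can carry IPTC and Dublin Core properties. As a rough illustration of what "self-describing" means in practice, the following sketch reads the four core fields out of an XMP packet using only the Python standard library. The packet and its values are made up for the example, and the pairing of each core field with an XMP property (`dc:title`, `dc:creator`, `dc:rights`, `photoshop:Source`) follows the usual IPTC Core to XMP conventions rather than anything stated in the talk, so treat it as an assumption and not as the working group's specification.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP packets for these properties.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "photoshop": "http://ns.adobe.com/photoshop/1.0/",
}

# Hypothetical packet: every value below is a placeholder, not real data.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">example_0001.tif</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Example Photographer</rdf:li></rdf:Seq></dc:creator>
   <dc:rights><rdf:Alt><rdf:li xml:lang="x-default">Example rights statement</rdf:li></rdf:Alt></dc:rights>
   <photoshop:Source>Example Holding Unit</photoshop:Source>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

# Assumed location of each core field inside rdf:Description.
CORE_PATHS = {
    "Document Title": "dc:title/rdf:Alt/rdf:li",
    "Creator": "dc:creator/rdf:Seq/rdf:li",
    "Copyright Notice": "dc:rights/rdf:Alt/rdf:li",
    "Source": "photoshop:Source",
}


def read_core_fields(xmp_xml):
    """Return {core field name: first value, or None if absent or empty}."""
    root = ET.fromstring(xmp_xml)
    desc = root.find("rdf:RDF/rdf:Description", NS)
    fields = {}
    for name, path in CORE_PATHS.items():
        node = desc.find(path, NS) if desc is not None else None
        text = (node.text or "").strip() if node is not None else ""
        fields[name] = text or None
    return fields


def missing_core_fields(fields):
    """List the core fields that have no value."""
    return [name for name, value in fields.items() if not value]


if __name__ == "__main__":
    fields = read_core_fields(SAMPLE_XMP)
    print(fields)
    print("missing:", missing_core_fields(fields))
```

The sketch only handles properties written as child elements of a single `rdf:Description`. XMP also allows simple properties to be written as attributes, and real files wrap the packet inside the image container, so a production workflow would more likely rely on an established metadata tool than on hand parsing.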

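One of the data input issues mentioned in the notes is that commas inside Creator and Keywords values become delimiters when the metadata is embedded with Adobe products. A small pre-flight check run before batch embedding can catch such values early. The sketch below is only an illustration: the function names and the record layout are hypothetical, and `split_like_comma_delimited` merely mimics the effect described in the talk, not the actual behavior of any particular Adobe release.

```python
def split_like_comma_delimited(raw):
    """Mimic a tool that treats every comma as a value separator (assumed behavior)."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def flag_comma_values(record, fields=("Creator", "Keywords")):
    """Return {field: [values containing a comma]} for the multi-valued fields."""
    flagged = {}
    for field in fields:
        bad = [value for value in record.get(field, []) if "," in value]
        if bad:
            flagged[field] = bad
    return flagged


if __name__ == "__main__":
    # Hypothetical record; the inverted name would be split into two creators.
    record = {
        "Creator": ["Jones, Owen"],
        "Keywords": ["architecture", "Alhambra"],
    }
    print(flag_comma_values(record))
    print(split_like_comma_delimited("Jones, Owen"))
```

Flagging rather than silently rewriting leaves the decision (for example, entering the name in direct order) with the person who knows the collection.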
    1. The Case for Implementing Core Descriptive Embedded Metadata at the Smithsonian. DC-2010, Pittsburgh. Stephanie Christensen, Doug Dunlop
    2. In the Beginning. A universe of images and associated metadata from across the Smithsonian:
         • National Anthropological Archives
         • Smithsonian Institution Libraries
         • Smithsonian Institution Archives
         • Smithsonian Center for Folklife and Cultural Heritage
         • The Archives at the National Museum of the American Indian
         • National Air and Space Museum Archives Division
         • Archives of American Art
         • Archives of American Gardens
         • Freer Gallery of Art and Arthur M. Sackler Gallery Archives
         • Division of Mammals, National Museum of Natural History
         • National Museum of Natural History Collections
         • National Museum of the American Indian Collections
         • Human Studies Film Archives
         • Eliot Elisofon Photographic Archives, National Museum of African Art
         • National Museum of American History
    3. Traveling to Places Near and Far
         • From a Collection Management System to the Collections Search Center, picking up metadata along the way from Horizon
         • From a Collection Management System to Artesia, with a second home in Collections Search Center
         • From Collections Search Center to Flickr
         • And other travels too numerous to name
    4. With Many Stories to Tell
         • I was in this most amazing scientific publication
         • I traveled to the land of Flickr and then on to….
         • I lived in the Galaxy of Images, then traveled to a land filled with teachers and educators
         • I was used as the inspiration for a line of furniture
    5. Embedded Metadata. "While metadata is most commonly thought of as existing in a system external to the content (such as a library online catalog), it can also be included as part of the digital content file itself. Metadata contained within the file is referred to as embedded metadata." (Federal Agencies Digitization Guidelines Initiative)
    6. Flavors of Embedded Metadata
         • International Press Telecommunications Council (IPTC) Photo Metadata
         • Exchangeable image file format (EXIF)
         • Extensible Metadata Platform (XMP)
    7. XMP and IPTC Embedded Metadata
         • IPTC is core to XMP
         • IPTC is easy to use in Adobe Bridge
         • IPTC is already part of imaging workflows
         • IPTC fields map to SI DAMS
         • XMP employs RDF
    8. The Tale of Best Practices
         • Survey stakeholders
         • Determine current practices, if any
         • Review results
         • Identify consistencies or disparities of data elements
         • Develop a core set of required fields
         • Develop a set of suggested fields
         • Put into practice (Smithsonian Collections Search Center, FADGI)
    9. Core Embedded Metadata Fields. ….We can begin to come to a consensus as to consistent information across SI….
    11. Not Required: Suggested Set of Embedded Metadata Fields. …and for those that need additional fields to accommodate various needs…
    12. Not Required: Suggested Set of Embedded Metadata Fields Put into Practice. La Alhambra Palais:
         • Date: 1/1/1842
         • Headline: Plans, elevations, sections, and details of the Alhambra, from drawings taken on the spot in 1834 by Jules Goury, and in 1834 and 1837 by Owen Jones.
         • Credit/Provider: Cooper Hewitt National Design Museum
    13. Mapping to DC (Required Fields)
    14. Mapping to DC (Suggested Fields)
    15. Mapping to DC (Suggested Fields, cont.)
    16. Towards a Greater Access: Pilot Project
    17. Institutional Use
         • Collection Information Systems
         • Digital Asset Management System
         • Smithsonian Websites
         • Smithsonian Online Exhibitions
         • Social Media Sites
         • Smithsonian Blogs
         • Publications
         • Education and Outreach
         • The possibilities are endless…
    18. Future Developments
         • Document hosted on DSpace
         • Working with the FADGI (Federal Agencies Digitization Guidelines Initiative) Embedded Metadata Working Group
    19. Acknowledgements. The following people have participated in EMDaWG and have helped in the realization of the document, Basic Guidelines for Minimal Descriptive Embedded Metadata in Digital Images (Embedded Metadata Working Group, Smithsonian Institution, April 2010): Stephanie Christensen (NAA), Doug Dunlop (SIL), Lowell Ashley (SIL), Ricc Ferrante (SIA), Cindy Frankenburg (NMAI), Ducky Nguygen (NMAI), Kay Peterson (NMAH), Suzanne Pilsk (SIL), Ken Rahaim (OCIO), Marguerite Roby (SIA), Erin Rushing (SIL), Stephanie Smith (CFCH), Rebecca Snyder (NMNH), Amy Staples (NMAfA), Sarah Stauderman (SIA), Patti Williams (NASM), and Merry Foresta for comments and editing.
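The "Mapping to DC" slides survive here only as titles, but the notes spell out two of the pairings: Document Title corresponds to Dublin Core Identifier, and Source corresponds to Publisher. A crosswalk of that kind can be sketched as a simple lookup table. In the sketch below only those two pairings come from the talk. The remaining entries (Creator, Copyright Notice, Credit/Provider, Job Identifier) are guesses based on the remark that most fields map "logically" and that Publisher and Identifier are used more than once, and they are marked as assumptions in the code.

```python
# EMDaWG field name -> Dublin Core element name.
CROSSWALK = {
    "Document Title": "identifier",   # stated in the notes
    "Source": "publisher",            # stated in the notes
    "Creator": "creator",             # assumption
    "Copyright Notice": "rights",     # assumption
    "Credit/Provider": "publisher",   # assumption
    "Job Identifier": "identifier",   # assumption
}


def to_dublin_core(embedded):
    """Group embedded field values under their Dublin Core element.

    Fields without a crosswalk entry, and empty values, are skipped.
    Several embedded fields may feed the same element, so each element
    holds a list of values in input order.
    """
    record = {}
    for field, value in embedded.items():
        element = CROSSWALK.get(field)
        if element is None or not value:
            continue
        record.setdefault(element, []).append(value)
    return record


if __name__ == "__main__":
    # Placeholder values, not taken from any Smithsonian record.
    embedded = {
        "Document Title": "example_0001.tif",
        "Source": "Example Holding Unit",
        "Credit/Provider": "Example Holding Unit",
        "Headline": "Not mapped in this sketch",
    }
    print(to_dublin_core(embedded))
```

Keeping the mapping in one table makes it easy to review against the published guideline and to correct any pairing that turns out to differ.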