Embedded Metadata working group
Presentation at "The Semantic Web, Libraries, and Visual Resources" session at VRA + ARLIS/NA 2nd Joint Conference in Minneapolis, MN.

  • First, a quick explanation of embedded metadata. Digital music players illustrate how we all use embedded data on a daily basis without giving it a second thought. How do MP3s do it? That’s right, embedded metadata! Once a music file has its Artist, Album, Title, Genre, and other information embedded, it can be moved to another computer, an iPod, or the cloud and still identify itself. It can be integrated into a new user’s database based on these fields and be sorted and searched.
  • Wouldn’t it be great if digital image files were as easy to use?  It would be great if you could move image files from software to software, from device to device and still be able to search, categorize, and sort them.
  • More often than not, all you know about a digital image is its file name and when it was created.
  • By embedding metadata in the files, images can have the same usability as MP3s, like displaying the Title and tags.
  • And searching.
  • How does this work? Digital image formats, such as TIFF, use a set of metadata tags in the file header to tell your system how to interpret and display the image.  Some of these tags contain data recorded by cameras and scanners such as  date, time, and device settings.
  • There is no reason that data about what is shown in the image can’t be encoded as well - it’s all just bits after all. The TIFF format has several tags for descriptive metadata.
  • The process of embedding the data is handled by your photo editing or organization tool. It converts your text into code and adds it to the file.
  • There have been a few different embedded metadata formats, but one of the newest, and one which has been widely adopted, is Adobe’s XMP. XMP encodes the caption using RDF/XML. Here is a snippet for Dublin Core Subjects.
  • Many photo applications provide viewing and editing of embedded metadata.
  • There are already several info panels built into Photoshop and Bridge for embedding metadata. They have been used by photographers for a long time. These standard panels utilize several different schemas, but none of them are suited to art & architecture. Fortunately, Adobe has open source tools that make it easy to build your own custom panel. This is the reason we have started with a tool for Photoshop and Bridge.
  • To fill the need for an info panel for the cultural heritage community, we built the VRA panel. This is it, well, part of it anyway. We tried to select the fields most necessary for describing cultural objects.
  • In Adobe Creative Suite applications, metadata info panels are accessed by selecting “File Info…” in the File menu.
  • You then see all the panels you have loaded.
  • Purpose: facilitate sharing descriptive metadata between a user and a chosen recipient, such as a database curator, an image sharing service, or a colleague. The panel should be useful to both users and curators: basic users with consumer-level tools and pros with sophisticated tools. It should lessen the barriers to general understanding and provide complexity when desired.
  • First of all, the VRA panel is meant to be easily integrated into users’ workflows by inserting itself into production tools they already use: Photoshop and Bridge. We have plans to build a stand-alone version for when Creative Suite isn't an option.
  • Allow database assistants to enter pre-cataloging information, i.e., source captions, original resource documentation, or backlog tracking information. Use Adobe Bridge's bulk metadata input and editing capabilities to increase efficiency.
  • Field photography: allow curators and collection managers to more efficiently collect metadata from faculty and student contributors and ingest it into a central database.
  • Because it’s a form of RDF, an XMP record can be a mix of various schemas - as long as you follow the specifications of each.
  • Mix and match: Dublin Core, Photoshop, XMP, and IPTC. Many major software and hardware makers have worked together to use the same XMP properties for metadata to ensure interoperability. If a property already exists which matches your definition, use it. Why create a new one?
  • We could have gone with the classy Getty CDWA lite
  • Or good time lovin’ VRA.
  • But we went boring, choosing the schema most widely used for embedded metadata – the one that most tools recognize. IPTC has been around for a long time and it is used in just about every photo application and social media site out there.
  • When choosing fields, like Work Title or Image Copyright, the idea is to start with the schemas that are most widely used by photo applications and web services and then move down the list, using specialized schemas last. This places as much of the metadata as possible in properties that will be read by common tools. The approach to building the VRA panel was to use as many well-known namespaces as possible to provide interoperability with a wide range of photo software. The first schema used was IPTC Core and Extension, then PLUS, then any other namespace built into XMP (as specified in the XMP specs, part 2). Remaining properties were assigned to the VRA Core 4.0 namespace. This ensures that the most essential data about an artwork can be read when the user does not have access to the VRA or IPTC Adobe CS panels. Further, key fields are combined to create Tags and a photo Caption, the most widely supported fields in photo applications, web sites, and operating systems.
  • There’s VRA all the way at the bottom, looking like the least favored of the children. It turns out, however, that it has an important role to fill, stepping in when the other schemas can’t fulfill our needs.
  • It’s not all about database curators. We also want to give users the benefits of embedded info that can be used in many places: operating systems, photo apps, and social media. We want to enable contributors to keep embedded metadata in their image files for the purpose of managing them with common desktop photo applications and sharing with colleagues and students.
  • First on our list were the well known and well used Dublin Core fields. These are displayed by most photo tools.
  • So what about all that detailed artwork information we are embedding? It might not be seen without specialized applications like Photoshop, right? To make sure that the most important information is carried to all likely destinations, the VRA panel concatenates most of the artwork fields into the Title, Caption, and Tags.
  • Here, you can see the concatenated data for this image in the Windows 7 file explorer metadata area.
  • Here is the same file in Picasa.
  • And here is Flickr. There is our caption. Flickr also displays keywords and Title, but our artwork fields are not shown. We want the artwork information to be seen in consumer-level tools – especially free ones like Flickr.
  • But we also want to provide complexity when desired. This means introducing structured data.
  • Our first choice for this was going to be IPTC, specifically Extension because it is well structured and it includes fields that are useful for the VRA panel including fields exclusively for artworks. Our original intent was to use all of these fields and then use VRA for the remaining fields such as Measurements, Materials, Technique, Culture, Style/Period. This turned out to be harder than we thought.
  • For instance, “Date Created” is a single calendar date only and doesn’t allow for a range of dates or a complex free-text date such as “built 1298 – 1310, destroyed 1943”.
  • So with several fields remaining to be mapped we turned to VRA Core 4.0. Wanting to join the RDF revolution we decided to implement the Core 4 XML structure in XMP RDF/XML. This made sense because it matched IPTC’s deeply nested structure.
  • Here is a snippet of Core 4 XMP RDF/XML. Sure, it looks great. It’s got lots of arrows, colons, and slashes and everything is nicely indented, so it should work well. Unfortunately, complications arose that got in the way.
  • One of these was the difficulty in connecting the IPTC and VRA data sections. RDF data is written in arrays or sections, and these have to be merged in an exported Excel document. This is not impossible; we are just not in a position to build a tool to do it.
  • You need an extraction tool that can match repeating VRA sub-elements like “Location” and “Location Type” to IPTC Location Shown versus Location Created and Repository.
  • Another thing we tried, and the method that would be the most reliable and computer friendly, was to nest VRA within IPTC. This keeps all the artwork data together in one array and makes it possible to describe multiple artworks using multiple arrays, each one being a completely discrete packet. This method is supported by XMP.
  • Unfortunately, Adobe hasn’t updated Photoshop and Bridge to support it so what happens in most applications is that the nested schema data is deleted. This is much too unreliable at this point.
  • We have now decided to keep it simple and use a flattened version of Core 4 display fields. This means that we will eliminate the nested arrays of parsed terms and use single free text display values instead. These are very easy to extract to Excel using tools such as ARTstor’s EMET. A spreadsheet is the easiest and most common way people will use it. This does mean that curators will have to do some work before they can ingest the data into their database.
  • When the tools allow it, mix IPTC Extension and true VRA Core 4.0 in a true RDF structure, either nested or separate. This would make the data more database friendly, reducing the amount of work a curator has to do.
  • Interactive geocoding that links to Google, Yahoo and other online maps.
  • To make it easy for users to input parsed data, the panel could query a semantic web resource, such as VIAF, and retrieve authoritative database-ready data.
  • Encode values as linked data URIs that query a central database for the most accurate information possible. You could also embed any level of data that you think will provide a basic description of the content and then take the user to a central source for complete information.
  • Retrieve data from semantic databases (DBpedia, GeoNames, LoC, Europeana, VIAF, NY Times terms).
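The notes above describe two ideas that a short sketch may make more concrete: XMP stores the title, caption, and keywords as Dublin Core properties in RDF/XML, and the panel concatenates artwork fields into the caption. The Python below is only an illustration under stated assumptions. The function names (`build_caption`, `build_xmp_description`, `read_keywords`) are hypothetical, the "; " separator is inferred from the sample caption shown in the slides, and nothing here is the panel's actual code.

```python
import xml.etree.ElementTree as ET

# Standard RDF and Dublin Core namespace URIs (the dc URI also appears in the slides).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_caption(artwork_fields):
    """Join the non-empty artwork values into one caption string.

    Assumption: a "; " separator, as in the sample caption on the slides.
    """
    return "; ".join(v.strip() for v in artwork_fields if v and v.strip())


def build_xmp_description(title, caption, keywords):
    """Build an rdf:RDF element with dc:title, dc:description and dc:subject."""
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)
    rdf = ET.Element(f"{{{NS['rdf']}}}RDF")
    desc = ET.SubElement(rdf, f"{{{NS['rdf']}}}Description")
    desc.set(f"{{{NS['rdf']}}}about", "")

    # Title and caption are stored as language alternatives (rdf:Alt).
    for tag, value in (("title", title), ("description", caption)):
        prop = ET.SubElement(desc, f"{{{NS['dc']}}}{tag}")
        alt = ET.SubElement(prop, f"{{{NS['rdf']}}}Alt")
        li = ET.SubElement(alt, f"{{{NS['rdf']}}}li")
        li.set(XML_LANG, "x-default")
        li.text = value

    # Keywords are stored as an unordered list (rdf:Bag).
    subject = ET.SubElement(desc, f"{{{NS['dc']}}}subject")
    bag = ET.SubElement(subject, f"{{{NS['rdf']}}}Bag")
    for keyword in keywords:
        ET.SubElement(bag, f"{{{NS['rdf']}}}li").text = keyword
    return rdf


def read_keywords(rdf):
    """Return the dc:subject keywords in document order."""
    return [li.text for li in rdf.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]


# Example values taken from the San Lorenzo slide.
caption = build_caption([
    "attributed to Filippo Brunelleschi (Italian architect, 1377-1446)",
    "San Lorenzo, Florence; Basilica di San Lorenzo",
    "Exterior, unfinished facade",
    "",  # empty fields are skipped
    "stone; marble; pietra serena",
])
packet = build_xmp_description(
    "San Lorenzo, Florence; Basilica di San Lorenzo",
    caption,
    ["facades", "Renaissance", "basilicas"],
)
print(ET.tostring(packet, encoding="unicode"))
```

This only builds the RDF/XML fragment in memory. Writing it into an image file is left to the photo tool, as the notes describe.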

Embedded Metadata working group Presentation Transcript

  • Embedded Metadata working group: Johanna Bauman, Pratt Institute; Sheryl Frisch, Cal Poly, San Luis Obispo; Jesse Henderson, Colgate University; Greg Reser, UCSD; Kari Smith, University of Michigan; Steve Tatum, Virginia Tech. http://metadatadeluxe.pbworks.com
  • Embedded Metadata working group Custom XMP info Panel
  • 4239_0002983-1-0.tif
  • Windows 7
  • Metadata tags: Image Width, Image Length, Compression, Orientation, Camera Model
  • Metadata tags: Description, Artist, Copyright, Location, Comments
  • <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">San Lorenzo, Florence; Basilica di San Lorenzo</rdf:li>
        </rdf:Alt>
      </dc:title>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>architectural exteriors</rdf:li>
          <rdf:li>rulers and leaders</rdf:li>
          <rdf:li>Medici family</rdf:li>
          <rdf:li>Michelangelo Buonarroti, 1475-1564</rdf:li>
          <rdf:li>facades</rdf:li>
          <rdf:li>Renaissance</rdf:li>
          <rdf:li>Italian</rdf:li>
          <rdf:li>basilicas</rdf:li>
          <rdf:li>buildings</rdf:li>
    TIFF file
    Title: San Lorenzo, Florence; Basilica di San Lorenzo
    Keywords: architectural exteriors, architectural interiors, rulers and leaders, Medici family, Michelangelo Buonarroti, 1475-1564, facades, Renaissance, Italian, buildings, basilicas, buildings, religious buildings, churches, construction (assembling)
    Copyright Notice: Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0)
    Caption-Abstract: attributed to Filippo Brunelleschi (Italian architect, 1377-1446); Michelozzo di Bartolomeo (Italian architect, 1396-1472); San Lorenzo, Florence; Basilica di San Lorenzo; Exterior, unfinished facade; begun 1418- ca. 1700 (inclusive); stone; marble; pietra serena; San Lorenzo; Florence; Tuscany; Italy
  • RDF/XML
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:subject>
        <rdf:Bag>
          <rdf:li>sunset</rdf:li>
          <rdf:li>shadows</rdf:li>
          <rdf:li>mural painting</rdf:li>
          <rdf:li>peristyles (colonnades)</rdf:li>
          <rdf:li>trompe-l'oeil</rdf:li>
          <rdf:li>greek</rdf:li>
          <rdf:li>roman</rdf:li>
          <rdf:li>Getty Villa</rdf:li>
        </rdf:Bag>
      </dc:subject>
    </rdf:Description>
  • Metadata tools
  • Info panels
  • Custom XMP info Panel. Useful to: users, curators. Easily shared with: desktop, database, web.
  • Backlog processing
  • Backlog processing: Extract to Excel
  • Field photography
  • Mixable schemas
    Keywords: dc:subject
    Description: dc:description
    Date / Time Original: photoshop:DateCreated
    Date / Time Digitized: xmp:CreateDate
    Date / Time Modification: xmp:ModifyDate
    Copyright: dc:rights
    Creator: dc:creator
    Location (Created): Iptc4xmpExt:LocationCreated:WorldRegion, Iptc4xmpExt:LocationCreated:Country, Iptc4xmpExt:LocationCreated:ProvinceState, Iptc4xmpExt:LocationCreated:City, Iptc4xmpExt:LocationCreated:Sublocation
    Location (Shown): photoshop:Country, photoshop:State, photoshop:City, Iptc4xmpCore:Location, Iptc4xmpExt:LocationShown:Country, Iptc4xmpExt:LocationShown:ProvinceState, Iptc4xmpExt:LocationShown:City
  • Choosing schemas
  • Classy
  • Fun
  • Reliable
  • Choosing fields: from very well known to specialized
  • Choosing fields: IPTC Core, IPTC Extension, PLUS, Dublin Core, Other Native XMP, VRA Core 4.0
  • Embedded Metadata working group. Useful to: users. Easily shared with: desktop, web.
  • Popular & Reliable: dc:title (Title), dc:description (Caption), dc:subject (Keywords)
  • Concatenating: dc:description = Artwork: Creator, Title, Date, Work Type, Location, Acc. Number, Rights
  • Custom XMP info Panel. Useful to: curators. Easily shared with: database.
  • Extension, Artwork or Object in the Image: Creator, Title, Date Created, Source, Source Inventory Number, Copyright Notice
  • Extension, Artwork or Object in the Image, Date Created: single calendar date, no text, no BCE or CE. Entering “built 1298 – 1310, destroyed 1943” gives: error, what are you doing?
  • Core 4.0 XML
    • <work>
      • <dateSet>
        • <display>built 1298 – 1310, destroyed 1943</display>
        • <date type="creation">
          • <earliestDate>1298</earliestDate>
          • <latestDate>1310</latestDate>
        • </date>
        • <date type="destruction">
          • <earliestDate>1943</earliestDate>
          • <latestDate>1943</latestDate>
        • </date>
      • </dateSet>
    • </work>
  • Core 4.0 XMP RDF/XML
    <vra:dateSet rdf:parseType="Resource">
      <vra:display>built 1298 – 1310, destroyed 1943</vra:display>
      <vra:date>
        <rdf:Bag>
          <rdf:li rdf:parseType="Resource">
            <vra:type>creation</vra:type>
            <vra:earliestDate rdf:parseType="Resource">
              <vra:date>1298</vra:date>
            </vra:earliestDate>
            <vra:latestDate rdf:parseType="Resource">
              <vra:date>1310</vra:date>
            </vra:latestDate>
          </rdf:li>
        </rdf:Bag>
      </vra:date>
    …
  • Connecting schemas
    <rdf:Description rdf:about="" xmlns:Iptc4xmpExt="http://iptc.org/std/Iptc4xmpExt/2008-02-29/">
      <Iptc4xmpExt:ArtworkOrObject>
        .
      </Iptc4xmpExt:ArtworkOrObject>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:vra="http://www.vraweb.org/vracore/4.0/">
      <vra:work>
        .
      </vra:work>
    </rdf:Description>
  • Connecting schemas [diagram matching VRA location to IPTC; labels: discovery (Name, City, Country), repository (City, Country), AO Source, Name]
  • Nesting schemas
    • <rdf:Description rdf:about=""
    • xmlns:Iptc4xmpExt="http://iptc.org/std/Iptc4xmpExt/2008-02-29/">
    • <Iptc4xmpExt:ArtworkOrObject>
    • .
      • <rdf:Description rdf:about="" xmlns:vra="http://www.vraweb.org/vracore/4.0/">
      • <vra:work>
            • .
      • </vra:work>
      • </rdf:Description>
    • .
    • </Iptc4xmpExt:ArtworkOrObject>
    • </rdf:Description>
  • Nesting schemas
    <rdf:Description rdf:about="" xmlns:Iptc4xmpExt="http://iptc.org/std/Iptc4xmpExt/2008-02-29/">
      <Iptc4xmpExt:ArtworkOrObject>
        .
      </Iptc4xmpExt:ArtworkOrObject>
    </rdf:Description>
  • EMwg Mix: VRA Core Display, IPTC, Dublin Core, Photoshop, XMP Rights, PLUS
  • Core 4 Flat
    <vrae:work rdf:parseType="Resource">
      <vrae:dateDisplay>built 1298 – 1310, destroyed 1943</vrae:dateDisplay>
    </vrae:work>
    <vra:dateSet rdf:parseType="Resource">
      <vra:display>built 1298 – 1310, destroyed 1943</vra:display>
      <vra:date>
        <rdf:Bag>
          <rdf:li rdf:parseType="Resource">
            <vra:type>creation</vra:type>
            <vra:earliestDate rdf:parseType="Resource">
              <vra:date>1298</vra:date>
            </vra:earliestDate>
            <vra:latestDate rdf:parseType="Resource">
              <vra:date>1310</vra:date>
            </vra:latestDate>
          </rdf:li>
        </rdf:Bag>
      </vra:date>
    …
  • Core 4 Flat: work/agentDisplay, work/titleDisplay, work/dateDisplay, work/stylePeriodDisplay, work/cultureDisplay, work/workTypeDisplay, work/materialsDisplay, work/techniqueDisplay, work/measurementsDisplay, work/repository, work/site, work/location (by type), work/rightsDisplay, work/descriptionDisplay, work/subjectDisplay, work/relationDisplay, work/textRefDisplay
  • Fully Nested Data
    • <rdf:Description rdf:about=""
    • xmlns:Iptc4xmpExt="http://iptc.org/std/Iptc4xmpExt/2008-02-29/">
    • <Iptc4xmpExt:ArtworkOrObject>
    • .
      • <rdf:Description rdf:about="" xmlns:vra="http://www.vraweb.org/vracore/4.0/">
      • <vra:work>
            • .
      • </vra:work>
      • </rdf:Description>
    • .
    • </Iptc4xmpExt:ArtworkOrObject>
    • </rdf:Description>
  • Integrated Geocoding
  • The Semantic Web: “Lucien Vogel”
  • The future: .xmp file http://aal.ucsd.edu/vracore4/example020.xml
    <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
      <rdf:Description rdf:about='' xmlns:Iptc4xmpCore='http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/'>
        <Iptc4xmpCore:CreatorContactInfo rdf:parseType='Resource'>
          <Iptc4xmpCore:CiAdrCtry>United States</Iptc4xmpCore:CiAdrCtry>
        </Iptc4xmpCore:CreatorContactInfo>
        <Iptc4xmpCore:Location>Rohwer, Arkansas</Iptc4xmpCore:Location>
      </rdf:Description>
      <rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>
        <dc:creator>
          <rdf:Seq>
            <rdf:li>Sugimoto</rdf:li>
            <rdf:li>Henry</rdf:li>
          </rdf:Seq>
        </dc:creator>
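The "Core 4 Flat" slide and the notes on spreadsheet extraction suggest a simple flow: read the single free-text display values from the XMP packet and write one spreadsheet row per image. The Python sketch below illustrates that flow with the standard library only. It is not how EMET or any other named tool works. The slides do not give a namespace URI for the `vrae` prefix, so `VRAE_NS` is a placeholder, and the function names and the `example.tif` filename are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Placeholder URI: the slides show the "vrae" prefix but not its namespace.
VRAE_NS = "http://example.org/vrae/"

# A few of the flat display fields listed on the "Core 4 Flat" slide.
FIELDS = ["agentDisplay", "titleDisplay", "dateDisplay", "materialsDisplay"]

# Sample packet; the values are taken from the San Lorenzo caption in the slides.
SAMPLE_XMP = f"""<rdf:RDF xmlns:rdf="{RDF_NS}">
  <rdf:Description rdf:about="" xmlns:vrae="{VRAE_NS}">
    <vrae:work rdf:parseType="Resource">
      <vrae:titleDisplay>San Lorenzo, Florence; Basilica di San Lorenzo</vrae:titleDisplay>
      <vrae:dateDisplay>begun 1418- ca. 1700 (inclusive)</vrae:dateDisplay>
      <vrae:materialsDisplay>stone; marble; pietra serena</vrae:materialsDisplay>
    </vrae:work>
  </rdf:Description>
</rdf:RDF>"""


def extract_flat_fields(xmp_text):
    """Return {field: text} for each flat display field under vrae:work.

    Missing fields come back as empty strings so every row has every column.
    """
    root = ET.fromstring(xmp_text)
    work = root.find(f".//{{{VRAE_NS}}}work")
    row = {}
    for name in FIELDS:
        element = work.find(f"{{{VRAE_NS}}}{name}") if work is not None else None
        has_text = element is not None and element.text
        row[name] = element.text.strip() if has_text else ""
    return row


def rows_to_csv(rows):
    """Serialize rows to CSV text, one line per image file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["filename"] + FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# "example.tif" is a made-up filename for illustration.
row = {"filename": "example.tif", **extract_flat_fields(SAMPLE_XMP)}
print(rows_to_csv([row]))
```

Because each display field is a single string, no merging of nested arrays is needed. The trade-off noted in the talk still applies: a curator has to parse these free-text values before ingesting them into a database.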