Hi everyone, I hope you’ve been enjoying your weeks with Michael, working on systems and standards. In this last week before Spring Break, you’ll be doing two things: reviewing the work of the previous six weeks in preparation for a quiz that tests your knowledge of some of the hard facts we’ve studied so far this term, and beginning to apply some of the ideas we’ve worked with to the collections you identified in last week’s assignment. This week’s introductory presentation will be brief and the readings are minimal, because I hope you’ll spend much of your time actually working with records from your collections, applying vocabularies or building thesauri to learn firsthand how collections cataloguing and vocabularies relate.
This week we’ll be looking at one tiny aspect of cataloguing standards: controlled vocabularies. Vocabularies may seem, at first glance, an arcane and specialized corner of cataloguing, so to disabuse you of that notion, and before we go any further, I want to share the examples that taught me to respect the value of vocabularies and thesauri. I’ve borrowed these examples from Murtha Baca, whose writings you encountered in the first two weeks of class (and which we’ll revisit this week with, I hope, a little more perspective on the implications of what she has to say). Murtha’s passion for standards generally, and controlled vocabularies in particular, has taught generations of museum professionals the value of thoughtful standardization. These two examples are well known to anyone who has taken one of Murtha’s classes . . .
You’ll remember from Week 4, when we talked about search, that we must be mindful always of the fact that search engines are machines that know only what we tell them. They don’t easily clean up after us when our cataloguing is messy and inconsistent, or when cataloguing that is internally consistent meets an aggregated environment in which another collection is catalogued according to a different standard. Here’s a simple example from the RLG Cultural Materials database, a (now defunct) resource that aggregated collections content from many different cultural heritage collections. A search on the keyword “theater” returned 1486 results. Which seems like a good result, except . . .
. . . that a search using the alternate British spelling, “theatre,” returned 3037 results. Apparently the search engine did not treat the two terms, identical in meaning but different in spelling, as equivalents.
Here’s another example. Searching for “Giambologna,” the Renaissance sculptor, in the Los Angeles County Museum of Art’s integrated library and museum collections returns seven records from LACMA library resources. Fine, except that you might not have known that Giambologna was also known as Giovanni da Bologna, and is catalogued as such by the LACMA curatorial and registration staff.
The museum has one artwork catalogued under “Giovanni da Bologna,” which a search on “Giambologna” does not find.
And the library itself is inconsistent in its cataloguing. Here are two stray library records, found by searching on “Giovanni Bologna” but not on “Giambologna” or “Giovanni da Bologna.” It does make you wonder, doesn’t it, how often you have done a search and found a record without realizing that you had found only a fraction of the material that might have satisfied your query?
. . . because here are a few more library records, these catalogued under “Bologne, Jean de.”
What could a cataloguer have done to prevent this problem for the end user? Historically, the problem was solved by including every variation of a term in the cataloguing record, an expensive and difficult-to-manage solution. A better solution, where resources allow, is to apply vocabularies, thesauri, or local authority files to data content to create relationships between terms that are either synonymous (such as Giambologna and Giovanni da Bologna) or related; we’ll talk about related terms in a moment. Here’s the “Giambologna” entry from a well-known vocabulary resource, the Getty’s Union List of Artist Names (ULAN), which links all known name variations for the artist and designates one, Giambologna, as preferred.
Applying the vocabulary to *either* the collections database or the search engine could have linked all forms of the name, so that a search on any one of them would find records containing any of the variants.
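To make that concrete, here is a minimal sketch in Python, assuming a tiny hypothetical authority file built from the name forms in the LACMA example (it is not ULAN’s actual data format). Every name form is mapped to its preferred form, and both the query and each record are normalized through that map before comparison, roughly what applying the vocabulary at the search-engine end would do. The names `AUTHORITY`, `build_index`, and `search`, and the sample records, are invented for illustration; the same kind of map could equally treat “theater” and “theatre” as equivalents.

```python
# Hypothetical authority file, loosely modelled on the ULAN idea of one
# preferred name plus its variants. Not ULAN's real data format.
AUTHORITY = {
    "Giambologna": ["Giovanni da Bologna", "Giovanni Bologna", "Bologne, Jean de"],
}

def build_index(authority):
    """Map every known form of a name (lowercased) to its preferred form."""
    index = {}
    for preferred, variants in authority.items():
        for name in [preferred, *variants]:
            index[name.lower()] = preferred
    return index

NAME_INDEX = build_index(AUTHORITY)

def search(records, query):
    """Return records whose artist field names the same person as `query`.

    Names not in the authority file fall back to an exact comparison.
    """
    target = NAME_INDEX.get(query.lower(), query)
    return [r for r in records
            if NAME_INDEX.get(r["artist"].lower(), r["artist"]) == target]

# Hypothetical records, catalogued inconsistently as in the example.
records = [
    {"id": 1, "artist": "Giambologna"},
    {"id": 2, "artist": "Giovanni da Bologna"},
    {"id": 3, "artist": "Bologne, Jean de"},
    {"id": 4, "artist": "Donatello"},
]
```

With this index in place, a search on “Giovanni Bologna” returns records 1, 2, and 3. Applying the same index at cataloguing time instead, by rewriting each artist field to its preferred form, would be the content-provider version of the same fix.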
But let’s go back to some basics. You’ll want to feel comfortable with definitions for some basic terms about vocabularies. You’ll remember from last week that data values, or data content, are the information that populates a field in a database. The term vocabulary means exactly what you have come to understand: a list of terms, usually of a particular type or relating to a specific discipline. Sometimes, and this can be confusing, cataloguers use the term “vocabulary” interchangeably with “controlled vocabulary,” which is a limited list of vetted and approved terms. A controlled vocabulary ensures that a term is used consistently within a cataloguing system. A thesaurus is a kind of structured vocabulary in which terms are arranged to describe the relationships between them. These relationships can include synonymy (equivalence) or hierarchy (broader and narrower meaning). You’ll find additional useful definitions in the glossary of “Introduction to Art Image Access.”
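A thesaurus’s two kinds of relationships can be sketched as two small lookup tables: one for equivalence (non-preferred term to preferred term) and one for hierarchy (broader term to its narrower terms). This is a minimal sketch; the terms below are hypothetical and are not taken from the AAT or any other published thesaurus, and the function names are likewise invented.

```python
from collections import deque

# Hypothetical hierarchy: each broader term lists its narrower terms.
NARROWER = {
    "sculpture": ["statues", "reliefs"],
    "statues": ["equestrian statues", "statuettes"],
    "reliefs": ["bas-reliefs"],
}
# Hypothetical equivalence relationships: non-preferred -> preferred term.
SYNONYMS = {
    "low reliefs": "bas-reliefs",
    "figurines": "statuettes",
}

def preferred(term):
    """Resolve a synonym to its preferred term (or return the term itself)."""
    return SYNONYMS.get(term, term)

def broader_of(term):
    """Return the broader term for `term`, or None if it is a top term."""
    for parent, children in NARROWER.items():
        if term in children:
            return parent
    return None

def all_narrower(term):
    """Collect every term below `term` in the hierarchy, breadth-first."""
    found, queue = [], deque(NARROWER.get(term, []))
    while queue:
        t = queue.popleft()
        found.append(t)
        queue.extend(NARROWER.get(t, []))
    return found
```

Here `all_narrower("sculpture")` walks the hierarchy and returns statues, reliefs, equestrian statues, statuettes, and bas-reliefs, while `preferred("figurines")` resolves the synonym to statuettes.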
We have seen that end users tend to search with broader terms than cataloguers use. Because we cannot expect users to learn to search with the narrower terms preferred by cataloguers, content providers must try to anticipate the terms that end users search with and apply tools that relate cataloguers’ narrower terms to users’ terms. So how exactly does this work? Without getting into the technical details, vocabularies and thesauri can be implemented either by content providers (usually cataloguers) or by content distributors (aggregators, web publishers, search engineers). Content providers use vocabularies and thesauri to select and limit cataloguing terms, via controlled lists and look-up tables (now electronic, but until recently handled via paper dictionaries) or via term expansion tools that allow a single term to be input and “exploded” to populate a database with many equivalent, narrower, and/or broader terms. Validation routines can be applied within the cataloguing environment to verify that the data content for designated fields conforms to designated rules, such as matching a particular vocabulary. At the same time, content distributors employ several methods to ensure that records in their resources are findable. They may publish data content and format standards and verify that contributed content adheres to them; they may develop routines for normalizing or enhancing data so that it matches their resource standard; or they may use vocabularies to map terms between records, to expand user queries to match a larger set of terms, or to assist users with vocabulary-assisted querying (“did you mean Giambologna?”).
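A validation routine of the kind described above might look like the following sketch, which checks one field against a controlled list and, when the value does not match, offers a “did you mean” suggestion using Python’s standard `difflib.get_close_matches`. The field name `material`, the short list of materials, and the `validate` function are hypothetical; a real system would draw its list from a vocabulary such as the AAT or from a local authority file, and might use a different matching method.

```python
import difflib

# Hypothetical controlled list for a "material" field (lowercase terms).
MATERIALS = ["bronze", "marble", "terracotta", "wood", "plaster"]

def validate(record, field, allowed):
    """Check one field against a controlled list.

    Returns a list of problem messages; an empty list means the value conforms.
    """
    value = record.get(field)
    if value is None:
        return [f"{field}: missing"]
    if value.lower() in allowed:
        return []
    # Closest vetted term, if any is similar enough (difflib's default cutoff).
    suggestions = difflib.get_close_matches(value.lower(), allowed, n=1)
    hint = f" (did you mean {suggestions[0]!r}?)" if suggestions else ""
    return [f"{field}: {value!r} is not in the controlled list{hint}"]
```

For example, `validate({"material": "Bronze"}, "material", MATERIALS)` returns an empty list, while a value of `"terracota"` is flagged with the suggestion `'terracotta'`. The same suggestion step, run against a user’s query rather than a cataloguer’s entry, is one simple way to offer vocabulary-assisted querying.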
The best way for you to understand vocabularies and their application is to actually spend some time looking at them and playing with them. Take a record from the collection that you’ve designated for cataloguing. Select a field from your record and look up its data value in an appropriate controlled vocabulary: say, the Thesaurus of Geographic Names (TGN) if it’s a place name, or the Art and Architecture Thesaurus (AAT) if it’s a term for a material, technique, or object type. Look at the related terms, both the synonyms and the broader and narrower terms in the hierarchy. Are they relevant to the work in your catalogue record? Are they terms that another user might search with? If so, you may have identified a vocabulary worth applying. There are, of course, literally hundreds of vocabularies that may be useful or relevant to you, from lists of the formal place-name abbreviations issued by the U.S. government to scientific taxonomies of botanical and species names. Some of these vocabularies are simple lists, others are thesauri containing synonyms and closely related terms, and still others are formal hierarchies that define relationships between broader and narrower terms. It is the cataloguer’s job, and so yours, to determine which vocabularies are most appropriate to your cataloguing and access needs.
And sometimes no existing vocabulary will meet your needs. In these cases, you may choose to develop local authority files or controlled vocabularies. Such local tools can range from simple pick lists attached to individual fields in your database to thesaurus-building tools that let staff cataloguers contribute terms to a viewable taxonomy of related terms. Do you remember this example from Week 4? The thesaural relationship between creation dates of 1701-1800 and the search term “18th Century,” which does not itself appear in the records, could well have been part of a locally developed thesaurus. And as we discussed in our examination of search, such locally created tools will often contain useful relationships between terms that might not be present in a formal controlled vocabulary, including misspellings, foreign spellings, and “related objects.”
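The Week 4 date example can be sketched as a small local thesaurus that relates period terms users search with to the creation-date ranges actually recorded. Only the relationship of “18th Century” to 1701-1800 comes from the example; the variant spellings (including one foreign-language form), the `year` field, the sample records, and the `search_by_period` function are illustrative assumptions.

```python
# Hypothetical local thesaurus: period terms that never appear in the
# records, each related to the date range that does. Only the
# "18th century" -> 1701-1800 entry follows the example in the text.
PERIODS = {
    "18th century": (1701, 1800),
    "eighteenth century": (1701, 1800),
    "18th c.": (1701, 1800),
    "18e siècle": (1701, 1800),  # illustrative foreign-language form
}

def search_by_period(records, term):
    """Return records whose creation year falls in the range for `term`."""
    span = PERIODS.get(term.strip().lower())
    if span is None:
        return []  # term unknown to the local thesaurus
    start, end = span
    return [r for r in records if start <= r["year"] <= end]

# Hypothetical records that store only a creation year.
records = [
    {"id": "A", "year": 1650},
    {"id": "B", "year": 1701},
    {"id": "C", "year": 1788},
    {"id": "D", "year": 1801},
]
```

A search on “18th Century” returns records B and C; record D, dated 1801, falls outside the range, and a term the thesaurus doesn’t know simply returns nothing.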
Indeed, you’ll find that de facto standardized vocabularies are emerging everywhere, an idea that might seem sacrilegious to some but merely practical to others. I’m a big fan of Freebase, an open-access, open-content database of all sorts of linked, structured data. Freebase is only now beginning to accumulate a large body of art-related content, but even today, if you look for artworks in the database and sort on “Art Form,” a field that encompasses technique and object type, you’ll find an emerging vocabulary that may be more meaningful to users of Freebase than any other structured or controlled vocabulary. What do you think?
We’ll start this week’s readings with an interview with Murtha Baca and Erin Coburn from Cabinet magazine. This 2004 interview may represent the first and last time that working cataloguers were featured in a popular magazine, and it’s a good introduction to standards from the point of view of some longtime museum professionals. I’m also asking you to take another look at some of the material from “Introduction to Art Image Access,” particularly Patricia Harpring’s chapter on “The Language of Images,” in addition to the glossary and list of tools. I feel certain that you’ll understand the material much better six weeks into the class, and I recommend that you look closely at the sections on Authorities and Hierarchical Relationships.
As you have no written assignment this week, I hope you’ll use the opportunity to engage actively in discussion with your classmates and instructors about the vocabulary tools that might be most appropriate for your collections. Some of you have already demonstrated a command of the topic in the work of previous weeks; this is an opportunity for you to work with your classmates to help them understand and choose the most appropriate tools for their collections. Those of you working with collections that are already well catalogued by your institution should bring a critical eye to the choice of vocabulary tools, and think about where the choices that have been made might be questioned or even overturned. Try to consider the question of resource allocation in addition to the appropriateness of the tools, understanding that, despite the advantages of standards in cataloguing, the cost of applying them can sometimes be high.
And don’t forget to study for the quiz. Our weekly assignments, until now, have largely tested your ease with the *ideas* we have discussed in class. The quiz, one of only two you’ll have this term, is intended to allow you to demonstrate your mastery of the *facts* you have encountered in discussions and readings. The quiz itself is worth 5 of the course’s 100 points. GOOD LUCK! And have a great Spring Break.
JHU Week 7
Cataloguing Museum Collections: History, Trends, and Issues. Susan Chun, JHU Museum Studies, Spring 2010
Week 7: Vocabularies <ul><li>Why vocabularies? </li></ul><ul><li>Some definitions </li></ul><ul><li>Key vocabularies in the discipline </li></ul><ul><li>Locally-created resources </li></ul>
Definitions <ul><li>Data values or data content </li></ul><ul><li>Controlled vocabulary </li></ul><ul><li>Thesaurus </li></ul><ul><li>Hierarchy - narrower/broader terms </li></ul>
How does it work? <ul><li>Content providers </li></ul><ul><li>Content distributors </li></ul>
Vocabularies to review <ul><li>Getty Vocabularies (TGN, ULAN, AAT): http://www.getty.edu/research/conducting_research/vocabularies/ </li></ul><ul><li>VIAF: http://www.oclc.org/research/activities/viaf/default.htm </li></ul><ul><li>ICONCLASS: http://www.iconclass.nl/about-iconclass/what-is-iconclass </li></ul><ul><li>Library of Congress Authorities (Subject Headings, Name Authorities, Keyword Authorities): http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&PAGE=First </li></ul><ul><ul><li>(or any of the resources listed in the “Introduction to Art Image Access” Annotated List of Tools) </li></ul></ul>
Readings <ul><li>READ: </li></ul><ul><li>1. Meltzer, E. and Meltzer, J. (2004). Data and Metadata: An Interview with Murtha Baca and Erin Coburn. Cabinet, Issue 18, 37-40. [eReserves] </li></ul><ul><li>2. Cameron, F. (2009). Museum Collections, Documentation, and Shifting Knowledge Paradigms. In Parry, R., ed., Museums in a Digital Age (pp. 80-95). London: Routledge. [eReserves] </li></ul><ul><li>RE-READ: </li></ul><ul><li>1. Baca, M., ed. (2002). Introduction to Art Image Access. Los Angeles, CA: Getty Research Institute, http://www.getty.edu/research/conducting_research/standards/intro_aia/intro.html. Re-read Patricia Harpring's chapter on "The Language of Images," as well as the Glossary and the Annotated List of Tools. </li></ul>
Discussions <ul><li>Which vocabulary tool(s) would you apply to the collection you chose last week for cataloguing, and why? What benefit comes from the selection of a particular tool? Does it support better searching? Better indexing? More consistent clustering of results for browsing or display? </li></ul><ul><li>What other considerations come into play when considering a vocabulary in your cataloguing strategy? </li></ul>