Welcome back to Cataloguing Museum Collections. Michael and I were delighted to see your spirited discussion last week of questions of cultural patrimony and documentation. It’s good that you’re beginning to make connections between cataloguing and a museum’s values, and to recognize that there are no absolutely right choices about what we catalogue or what part of our cataloguing is published, and that those choices reflect a museum’s priorities and values. As we tackle this week’s topic, you’ll see a few more pieces of the complicated cataloguing puzzle and think about more of the choices that museums must make in order to make their collections accessible to users. In a way, the subject of search anticipates a number of topics we’ll address in depth later in the course--including vocabularies, standards, and the publishing and distribution of collection material.
It goes without saying that searching has become increasingly important in our online lives as we find our way around the vast amount of information on the internet. For museums that have, in the past decade, pushed millions of collection records to their own websites as well as to aggregated resources (such as ARTstor or Flickr), making those collections easily findable through search is the key to justifying the cost and resources allocated to digitizing, cataloguing, and distributing objects online as electronic collection records. And yet, I’m frequently surprised to discover that in many museums there is scant connection between the people selecting, specifying, and implementing website search tools and those developing the collection records that will be searched. This week, we’ll think a little about the evolution of searching and its relationship to browsing. We’ll try to imagine the ways in which searching and cataloguing are related, and how the decisions we make about cataloguing are reflected in the success (or failure) of users’ searches. We’ll look ahead to later discussions of some of the ways in which both searches and catalogue content can be processed by applying vocabularies, thesauri, or weighting methods, and we’ll have a look at some next-generation search tools that may be particularly interesting to museums.
Until the advent of computerized collections management systems and the internet, most museums kept their collection records in card catalogues that looked and functioned much like the ones in libraries. The cards, which supported the documentation and study of collections, were organized for users who were presumed to be familiar with the museum’s cataloguing conventions. Their structure was a clue to the nature and priorities of the organization: the cards were usually indexed by the access point(s) most important to the museum’s internal users, often by accession number, sometimes by creator name or object type. Michael discussed this in last week’s presentation on the history of cataloguing, but it bears keeping in mind, because early museum collections on the web made the mistake of trying to replicate the museum’s card catalogues online. They presumed either that users would use advanced search functionality, which required them to understand the specific nature of the data entered into each field in order to build multi-criteria searches, or that they would use hierarchical browsing, which in many ways mimics the card catalogue structure, to navigate to the work or works they wished to find. The problem with these assumptions was that many museum visitors are unfamiliar with even the most basic elements of our taxonomy, and need coaching either to understand our advanced search fields or to enter useful search terms into those fields. At the Metropolitan Museum of Art, we produced a whole page on “How to Read a Caption” for the museum’s “Timeline of Art History,” intended to give novice users an understanding of museum cataloguing and classification. We hoped that this text would help users search more successfully and better understand the collection records that their searches yielded, though, of course, we had little hope that more than a tiny fraction of visitors to the site would read these instructions.
As it turns out, many internet users ignore advanced search functionality, and even the extensive browsing interfaces that museums build for users who don’t care to search. (Keep in mind, too, that browsing is a close relative of searching, and that much browsing functionality is, in reality, built on pre-populated searches of categories deemed useful to visitors.) As you’ll learn in this week’s readings, the simple--or keyword--search box is the one that most users turn to, a trend that seems to be gaining strength in the age of Google. Keyword searches simply compare a term entered by a user with a search tool’s index of most or all catalogue content (along with other text or metadata associated with a web page). They can be made by users with little sense of any standards that may restrict or define a museum’s catalogue, and this lack of familiarity with our cataloguing practice can lead to frustration and failed searches. The implication of these trends in infoseeking behavior is that, as we prepare our cataloguing for online discovery, it must be flexible enough to serve both the expert user, who will use precise, domain-specific terms within our own cataloguing structure, and the casual user, who may use search terms that either do not exist in our cataloguing or that, while related to our catalogue terms, do not match them.
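To make the keyword-matching idea concrete, here is a minimal sketch of an inverted index, assuming a tiny hypothetical set of records stored as Python dictionaries (the record ids, field names, and values are invented for illustration, not drawn from any museum system). Every word in every field is indexed, and a query matches only records containing all of its words, so a casual user’s term that never appears in the cataloguing simply returns nothing.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each token in every field of every record to the ids of records containing it."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for value in fields.values():
            for token in tokenize(value):
                index[token].add(record_id)
    return index


def keyword_search(index, query):
    """Return the ids of records that contain every token in the query (an AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical records: ids, field names, and values are invented for illustration.
records = {
    "obj-001": {"creator": "Rembrandt van Rijn", "title": "Self-Portrait", "object_type": "painting"},
    "obj-002": {"creator": "Unknown", "title": "Portrait of a Woman", "object_type": "painting"},
}
index = build_index(records)

print(sorted(keyword_search(index, "portrait painting")))  # ['obj-001', 'obj-002']
print(keyword_search(index, "oil on canvas"))              # set(): no catalogued medium to match
```

The second query fails not because the paintings lack a medium but because this hypothetical cataloguing never recorded one: the index can only return what the records contain.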
When visiting a museum’s online collection, I often play a game in which I try to guess which fields in the organization’s collections management system have been indexed for searching. It’s not always obvious, so you have to poke around a little to figure things out. The difficulty increases when museums publish fields that are not indexed (often these are what we call “display fields” in a collections management system) or index fields that aren’t published (“search fields”). One search I often run on sites with Asian art collections is for works from the Edo (or Tokugawa) period, a period whose dates are largely undisputed (1603-1868 C.E.). Recently, I looked for Edo period art on the site of the Asia Society, in New York. By browsing to the collection’s Japanese art using the site’s map interface, I learned that the museum’s online collection contains a beautiful scroll by Honami Koetsu, an important Edo period calligrapher. And yet, my search on “Edo” returned no results. My assumption, then, would have been that the Asia Society’s catalogue record contained the scroll’s actual creation date without reference to the period, or that the Edo period dates were given but not the term itself, or that the record used the alternate period name, Tokugawa. But a check of the caption on the object page, which I presume matches the internal catalogue record, indicates that the Edo period is named, the period’s start and end dates (search fields) are provided, the exact date of the work is given, and so are the artist’s birth and death dates. It appears, in this case, that the people who specified the search functionality for the Asia Society chose to index *only* the creation place (Japan), privileging that information over creator, object type, and date. This is an example of a search that fails despite abundant cataloguing in both display and search fields.
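The display-field/search-field distinction can be sketched by building the index from a chosen list of fields only. The record below is a hypothetical stand-in loosely modeled on the scroll described above, not the Asia Society’s actual data, and the field names are assumptions. Indexing only the creation place reproduces the failed “Edo” search; indexing the period field as well lets it succeed.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records, indexed_fields):
    """Index only the named fields; any other field is display-only and invisible to search."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for field in indexed_fields:
            for token in tokenize(fields.get(field, "")):
                index[token].add(record_id)
    return index


def search(index, term):
    """Look up a single search term in the index."""
    return set(index.get(term.lower(), set()))


# Hypothetical record loosely modeled on the scroll described above; not real museum data.
records = {
    "scroll-1": {
        "creator": "Honami Koetsu",
        "period": "Edo period (1603-1868)",
        "creation_place": "Japan",
        "object_type": "handscroll",
    }
}

place_only = build_index(records, ["creation_place"])
broader = build_index(records, ["creation_place", "period", "creator", "object_type"])

print(search(place_only, "Edo"))  # set(): the period is displayed but never indexed
print(search(broader, "Edo"))     # {'scroll-1'}
```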
More often, a museum’s records don’t contain all of the access points needed to accommodate the range of searches users might perform. In these cases, a thorough assessment of the catalogue records can help website developers build tools that steer information seekers to the data points best represented in the museum’s cataloguing. One kind of collections search offered on the newly launched Indianapolis Museum of Art website, for example, prompts users to select a date range with a creation date slider and then to filter the results by creator or object type. I used the slider to select the year 1700, roughly the middle of the Edo period, and found this Hiroshige print. The displayed catalogue information indicates that Hiroshige was born late in the eighteenth century and suggests that the work was probably created some time in the 1800s. So the search probably returned this work against a query for 1700 not on the basis of the artist’s dates or a specific creation date, but because of its association with the Edo period dates (not shown here, but almost certainly stored in hidden search-date fields attached to a “Period” field). Understanding the implications of which fields we select for indexing, and the value or quality of the content in those fields relative to the search terms our users know and use, will have a significant effect on the end user’s experience of our online collections.
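A date-slider search of this kind might work roughly like the sketch below. It assumes, hypothetically, that each record carries optional creation dates plus hidden period search dates, and that the search falls back on the period dates when no creation date is catalogued. Under that assumed policy, an undated print associated with the Edo period overlaps a query for 1700 even if it was actually made in the 1800s.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    title: str
    creation_begin: Optional[int] = None  # catalogued creation dates, if any
    creation_end: Optional[int] = None
    period_begin: Optional[int] = None    # hidden search dates derived from the "Period" field
    period_end: Optional[int] = None


def search_span(record: Record) -> Optional[Tuple[int, int]]:
    """Prefer specific creation dates; fall back to hidden period dates (an assumed policy)."""
    if record.creation_begin is not None and record.creation_end is not None:
        return record.creation_begin, record.creation_end
    if record.period_begin is not None and record.period_end is not None:
        return record.period_begin, record.period_end
    return None  # no machine-readable dates: invisible to a date search


def overlaps(record: Record, query_begin: int, query_end: int) -> bool:
    """True if the record's date span overlaps the slider's [query_begin, query_end]."""
    span = search_span(record)
    return span is not None and span[0] <= query_end and span[1] >= query_begin


# Hypothetical records; titles and dates are illustrative only.
records = [
    Record("Undated print, Edo period", period_begin=1603, period_end=1868),
    Record("Print dated 1833-1834, Edo period", creation_begin=1833, creation_end=1834,
           period_begin=1603, period_end=1868),
    Record("Object with no dates at all"),
]

# A slider set to the single year 1700.
hits = [r.title for r in records if overlaps(r, 1700, 1700)]
print(hits)  # ['Undated print, Edo period']
```

The dated print is excluded because its specific dates take precedence, and the object with no dates at all can never be found this way, however well the rest of its record is catalogued.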
But there’s the rub. Making our online collections visible to searchers isn’t the same as putting our records online, and turning our analog catalogue records into electronic records isn’t enough either. Significantly *more* information is needed to serve a diverse general audience than to provide access to our own internal users. And using human resources to make our collection records truly valuable to the vast general public, not just on our own sites but in places where our records are aggregated, scraped, and mashed up via collection APIs, would require funds beyond the budget of even the wealthiest institutions. So: in cataloguing for access, which this week means cataloguing to be “seen” by search engines, museums need to make good choices about how to deploy their cataloguing resources. They need to be aware of automated tools for enhancing cataloguing (for example, tools that populate hidden search fields with dates calculated from creator or creation dates); they should interpret the infoseeking behavior of their visitors by examining search logs to understand which access points online users rely on most; and they might try to educate users about the complicated and arcane systems we use to describe our collections, making them better searchers.
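As a sketch of the kind of automated enhancement mentioned above, the function below fills two hypothetical hidden fields, `search_date_begin` and `search_date_end`. The lookup table of period dates, the field names, and the order of precedence (explicit creation dates, then a recognized period name, then the creator’s life dates as a loose bound) are all assumptions for illustration; a production tool would more likely draw period dates from a maintained vocabulary than from a hard-coded table.

```python
# Hypothetical lookup table: period name (lowercased) -> (start year, end year).
PERIOD_DATES = {
    "edo": (1603, 1868),
    "tokugawa": (1603, 1868),  # alternate name for the same period
    "meiji": (1868, 1912),
}


def enrich(record):
    """Return a copy of `record` with hidden search_date_begin/search_date_end filled in.

    Assumed precedence: explicit creation dates, then a recognized period name,
    then the creator's life dates as a loose bound. Nothing is invented otherwise.
    """
    enriched = dict(record)
    period = record.get("period", "").strip().lower()
    if "creation_begin" in record and "creation_end" in record:
        begin, end = record["creation_begin"], record["creation_end"]
    elif period in PERIOD_DATES:
        begin, end = PERIOD_DATES[period]
    elif "creator_birth" in record and "creator_death" in record:
        begin, end = record["creator_birth"], record["creator_death"]
    else:
        return enriched  # nothing to derive from; leave the hidden fields empty
    enriched["search_date_begin"] = begin
    enriched["search_date_end"] = end
    return enriched


print(enrich({"title": "Hanging scroll", "period": "Tokugawa"}))
# {'title': 'Hanging scroll', 'period': 'Tokugawa', 'search_date_begin': 1603, 'search_date_end': 1868}
```

Because the period table maps both “Edo” and “Tokugawa” to the same dates, a date-based search reaches the record whichever name the cataloguer happened to use.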
And what about Google? I’ve deliberately avoided talking about how museum records are found by the big search engines, including Google, because the subject, including the entire field of search engine optimization (SEO), is a complicated and sprawling one, affected by many factors beyond the nature of the catalogue. Nonetheless, at the core of the issue is this fact: good, consistent cataloguing, well understood by the team of professionals who produce your web pages, is one element in optimizing your collections content for discovery by Google and other search engines. If you’re interested in the specifics of web search engines, keep an eye on the official Google blog, which posts regular updates on work at Google and is often a useful resource for understanding how Google works. Seb Chan publishes regularly on web analytics and search engine optimization in his “Fresh and New(er)” blog. And for those for whom that’s not enough, I’m more than happy to provide a bibliography of academic papers on web search engines.
One key element of the search experience is the way in which search results are displayed and sorted, which becomes increasingly important as online collections grow. Early online museum collections sometimes returned scores of results from a search, sorted in a way that was predetermined by the web developers but that often appeared random or bizarre to end users. (The Metropolitan Museum’s collection search returned results sorted according to a standard specified by each collecting department. Results for some departments were sorted by creation date, for others by creation place or creator name, and for the European Paintings department, alphabetically by “school,” i.e., “Netherlandish 15th-16th Century,” “Flemish 17th-18th Century,” “French 15th-16th Century.”) Today, most websites give users the option to display, sort, and even filter results according to their own preferences. The consistency, quality, and nature of the cataloguing content are critical to successful display and sort functions, since sorting on any field means putting in order records whose content in that field may follow disparate cataloguing standards. This means that sorting on “creator” is best served by creator name fields in which the creator’s surname is easily identifiable, or by a populated “sort” field that identifies the text to sort on. Date fields that contain both period names and actual dates must somehow be interpreted for sorting, and so on. Incomplete cataloguing produces “orphan” results that cannot easily be placed in sort order, which explains why many museums limit the fields on which search results can be sorted to those for which they have comprehensive cataloguing data.
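The role of a dedicated sort field can be illustrated with a small sort key. In this sketch the field names `creator` and `creator_sort` are hypothetical. When no sort field is populated, the code falls back on a naive last-word-as-surname guess, and records with no creator data at all are pushed to the end as orphans rather than interleaved arbitrarily.

```python
def creator_sort_key(record):
    """Sort key for ordering results by creator; orphan records sort last."""
    sort_name = record.get("creator_sort", "").strip()
    if not sort_name:
        name = record.get("creator", "").strip()
        if not name:
            return (1, "")  # orphan: no creator cataloguing to sort on
        # Naive fallback (an assumption): treat the last word as the surname.
        parts = name.split()
        sort_name = parts[-1] if len(parts) == 1 else f"{parts[-1]}, {' '.join(parts[:-1])}"
    return (0, sort_name.casefold())


# Hypothetical result set; names and sort values are illustrative.
results = [
    {"creator": "Rembrandt van Rijn", "creator_sort": "Rembrandt van Rijn"},
    {"creator": "Claude Monet"},  # no sort field: fallback gives "Monet, Claude"
    {"creator": "Vincent van Gogh", "creator_sort": "Gogh, Vincent van"},
    {"title": "Bowl"},            # no creator at all: an orphan
]

ordered = sorted(results, key=creator_sort_key)
print([r.get("creator", "(no creator)") for r in ordered])
# ['Vincent van Gogh', 'Claude Monet', 'Rembrandt van Rijn', '(no creator)']
```

Without the populated sort field, the fallback would file “Rembrandt van Rijn” under “Rijn,” which is why the sketch prefers `creator_sort` whenever it is present.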
Okay. We understand that a simple search works by matching a search term with an index term (“Rembrandt” = “Rembrandt”). But what do we do about users who stubbornly refuse to use the exact terms contained in our catalogue records, or who misspell terms, or who speak other languages? We are helped here by a couple of key tools that enable a kind of dialogue between searchers and search tools. Take, for example, the search above for “18th Century.” In the “dialogue” between the searcher choosing the term “18th Century” and the system returning this Boucher work, the search engine has consulted a local thesaurus that relates the term “18th Century” to the dates “1701-1800,” allowing the result to be returned even though the term itself doesn’t match anything in the catalogue record. Applying vocabularies and thesauri outside the cataloguing system, usually in the search tool, is a simple way to vastly enhance search results, and it can work with either locally created thesauri or discipline-recognized vocabularies, such as the Getty Art & Architecture Thesaurus (AAT), the Getty Thesaurus of Geographic Names (TGN), or the Union List of Artist Names (ULAN). Vocabularies with a hierarchical structure also allow searches to be expanded to include works matching broader or narrower concepts in the hierarchy. For example, a searcher looking for the Boucher work shown here might have searched on “French textiles” without finding it, since the original record from the Minneapolis Institute of Arts catalogues it as “Wool, silk; tapestry weave.” Applying the hierarchical Art & Architecture Thesaurus to the search, which relates textiles and tapestries in a hierarchy of types, allows the work to be offered as a possible search result.
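Both kinds of “dialogue” can be sketched in a few lines, assuming a toy local thesaurus. The hierarchy below is hypothetical, loosely inspired by (not copied from) vocabularies such as the AAT, and the record is a hypothetical stand-in loosely modeled on the tapestry described above with illustrative dates. A century phrase is mapped to a date range, and each keyword is expanded to its equivalents and narrower terms before matching.

```python
import re

# Hypothetical local thesaurus. The hierarchy is a toy example, loosely inspired by
# (not copied from) hierarchical vocabularies such as the Getty AAT.
NARROWER = {
    "textiles": ["tapestries", "embroidery", "lace"],
    "tapestries": ["tapestry"],
}
EQUIVALENT = {
    "french": ["france"],
}


def expand(term):
    """Return the term plus its equivalents and all narrower terms, followed recursively."""
    seen, stack = set(), [term.lower()]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(EQUIVALENT.get(current, []))
        stack.extend(NARROWER.get(current, []))
    return seen


def century_span(text):
    """Map a phrase such as '18th Century' to (1701, 1800); return None for anything else."""
    match = re.fullmatch(r"\s*(\d{1,2})(?:st|nd|rd|th)\s+century\s*", text.lower())
    if match is None:
        return None
    century = int(match.group(1))
    return (100 * (century - 1) + 1, 100 * century)


def matches(record, query):
    """Match a record against a century phrase or an expanded keyword query."""
    span = century_span(query)
    if span is not None:
        begin, end = record.get("date_begin"), record.get("date_end")
        return begin is not None and end is not None and begin <= span[1] and end >= span[0]
    text = " ".join(str(value) for value in record.values()).lower()
    tokens = set(re.findall(r"[a-z]+", text))
    words = query.lower().split()
    # Every query word must match the record through at least one of its expansions.
    return bool(words) and all(expand(word) & tokens for word in words)


# Hypothetical record loosely modeled on the tapestry described above; dates are illustrative.
record = {
    "creator": "After Francois Boucher",
    "culture": "France",
    "medium": "Wool, silk; tapestry weave",
    "date_begin": 1750,
    "date_end": 1760,
}

print(matches(record, "18th Century"))     # True: century mapped to 1701-1800
print(matches(record, "French textiles"))  # True: french -> france, textiles -> tapestry
```

Keeping the expansion in the search layer, rather than writing “textiles” into every tapestry record, reflects the point above that thesauri are usually applied outside the cataloguing system.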
Museums lag somewhat behind the commercial sector in developing and applying thesaurus and vocabulary tools to their site searches, due perhaps to the difficulty of creating and maintaining local thesauri and the cost of licensing domain-specific ones. Commercial sites (Amazon.com, for example) and internet search engines such as Google rely on thesauri to support searches not just for alternative descriptions and names, but also for misspellings, foreign spellings, and “similar objects.” Museums should look hard at the powerful search tools developed by people whose bottom line depends on making search work.
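Thesauri are one way to catch misspellings; a simpler, swapped-in technique is approximate string matching against the index vocabulary. The sketch below uses Python’s standard `difflib.get_close_matches` to offer “did you mean” suggestions. The vocabulary list and the 0.75 similarity cutoff are assumptions chosen for illustration, and this is not a description of how Amazon or Google actually handle misspellings.

```python
import difflib

# Hypothetical vocabulary of terms that actually occur in the catalogue index.
INDEX_TERMS = ["boucher", "hiroshige", "hokusai", "rembrandt", "tapestry", "vermeer"]


def suggest(term, vocabulary=INDEX_TERMS, cutoff=0.75):
    """Return the term itself if it is indexed, else up to three close matches, best first."""
    term = term.lower()
    if term in vocabulary:
        return [term]
    return difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff)


print(suggest("Rembrant"))   # ['rembrandt']
print(suggest("Hiroshigi"))  # ['hiroshige']
print(suggest("xyzzy"))      # []
```

A higher cutoff produces fewer but safer suggestions; a lower one catches worse misspellings at the risk of steering users toward unrelated names.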
Still, no matter how hard we work at improving our cataloguing, and at building tools that increase the likelihood that catalogue records will match term-based searches, traditional search, and particularly word-based search, sometimes fails. Which brings us to some newfangled search tools that hold tremendous promise for access to museum collections. Rather than spending a lot of time describing these tools here, I’m asking you, as part of this week’s work, to explore a few of them. Many use elements of visual image recognition or visual content mining to support searching by color, shape, or composition. Others combine text and tag search with visual image searching, or use next-generation semantic tools to draw connections automatically between catalogued or tagged objects. Other tools that you may want to explore on your own use geolocation information (latitude and longitude documentation) to allow spatial or map-based finding. And one of the hottest and fastest-growing areas of search research has been labeled “social search,” which combines traditional searching with the (presumed) power of a network, personalizing and enhancing search results by weighting them according to user characteristics, preferences, history, or explicit relationships.
The readings this week range from the general (Esther Hargittai’s broad introduction to search engines) to the specific (Jennifer Trant’s study of search logs for the Guggenheim Museum) to the anecdotal (Charlie Moad’s blog post about developing the collection search for the new Indianapolis Museum of Art website). These three readings are supplemented by some technically difficult, but rewarding, readings on search behavior that will help you understand searching from the user’s point of view. In addition, I hope that you’ll think back to Murtha Baca’s two books, particularly Chris Sundt’s chapter “The Image User and the Search for Images” in “Introduction to Art Image Access,” as you approach this week’s assignments. For your written assignment, I’m asking you to compare the search experience on two different museum, library, or commercial websites. Ideally, you’ll describe one search experience that you feel is successful, and one that, for one reason or another, fails. If possible, consider how the decisions that the website developers may have made about priorities, resource allocation, and serving their particular user base affected the search experience, and how the cataloguing, display, and search tools reflect an organization’s mission and collections.
In the class discussion this week, I’d like you to think about your own search habits. As you work through the readings, consider how your understanding of the way search tools work might change your own infoseeking methods. And do a lot of searches! It’s the best way to get your mind around how searching and cataloguing are connected, and where our cataloguing needs extra help to be really useful to online visitors. I also want to commend you for your good work on the class glossary in Week 2, and to remind you that Michael and I hope it will be a resource you continue to add to throughout the course. We’ll reward your ongoing contributions to the glossary with credit in the weekly class discussion grade. And keep in mind: you needn’t have a definition in order to help build the glossary. If you read or hear a term that confuses you, or that you don’t know how to contextualize or define, add it to the glossary for one of your classmates (or instructors) to gloss. If you think you have an alternate definition for a term already in the glossary, please add it. We’re particularly interested in surfacing areas of confusion or conflict around these definitions, which, as Michael has pointed out, are not as rigidly defined or highly standardized as some in our field might have you believe. We genuinely believe that a confident command of these terms is one of the real hallmarks of a cataloguing professional. So: see you online. Have a good week.
JHU Week 4
Cataloguing Museum Collections: History, Trends, and Issues. Susan Chun, JHU Museum Studies, Spring 2010
Week 4: Search <ul><li>Ways of finding: searching and browsing</li><li>Search and cataloguing</li><li>Search results: display and sort</li><li>Processing cataloguing: vocabularies and thesauri</li><li>Next-generation searching</li></ul>
How we find http://www.metmuseum.org/toah/works-of-art/
Reading and Assignment <ul><li>Hargittai, E. (2007). The Social, Political, Economic, and Cultural Dimensions of Search Engines: An Introduction. Journal of Computer-Mediated Communication, 12(3), 1. http://jcmc.indiana.edu/vol12/issue3/hargittai.html</li><li>Trant, J. (2006). Understanding Searches of an On-line Contemporary Art Museum Catalogue: A Preliminary Study. http://conference.archimuse.com/system/files/trantSearchTermAnalysis061220a.pdf</li><li>Moad, C. (2010). What's in a Web Site: Collections Search. http://www.imamuseum.org/blog/2010/02/09/whats-in-a-web-site-collections-search</li><li>Matusiak, K. K. (2006). Information Seeking Behavior in Digital Image Collections: A Cognitive Approach. The Journal of Academic Librarianship, 32(5), 479-488. (e-reserves)</li><li>Hsieh-Yee, I. (2001). Research on Web search behavior. Library & Information Science Research, 23(2), 167-185. (e-reserves)</li><li>Prepare a 1000-word comparison of the search interfaces and results of two different sites: one that you feel uses search successfully and appropriately to enable access to content, and another that fails in some way to do so. Consider the nature of the content being searched and the cataloguing, both public and hidden, that must be produced and indexed in order to execute the search. Think about the alternative search tools we've considered this week: might some or all of them improve the search experience or results? Compare the ways in which the search results are presented, and think about the user's experience of interacting with the results.</li></ul>
Discussions <ul><li>How do you search? Look closely at your own search habits and try to imagine how cataloguing could best support your searches. Share a search that returned a result that surprised you. Did you get more, fewer, or different results than you expected? Can you suggest ways in which different search tools might have helped you achieve a different result?</li><li>Don’t forget the class glossary!</li></ul>