
Marshall hm poster_vra2015



A Comparative Study of Cataloger- and User-assigned Subject Terms
Hannah Marie Marshall, Metadata Librarian for Image Collections, Cornell University Library

Methodology

Research Question #1
What is the search utility of our images to our users? What is the level of correspondence between cataloger- and user-assigned subject terms for images?
• Based on evidence that description is a strong indicator of search behavior, this will lend insight into our users’ search behaviors and the search utility of our image collections.
• This was accomplished by comparing the literal terms assigned by each group and quantifying the frequency with which each group assigned the same term to the same image.
• The level of literal correspondence between the existing subject terms and those assigned by the respondents was only 8.5%.
• Of that 8.5%, 74% of the terms addressed the primary level of subject analysis, 3% were secondary terms, 16% were tertiary, and 6% were non-subject terms.

Research Question #2
What types of terms does each group assign, and in what ratios do they assign them? What is the level of correspondence in the type of subject analysis being performed by each group?
• When the terms assigned by the respondents differed from the existing metadata (91.5% of the time), do they differ because each group is performing fundamentally different kinds of analysis or exercising dramatically different levels of interpretation?
• The subject analysis of images has traditionally been understood as a graduated scale that identifies different levels of interpretation and analysis. To what degree does each group use them?
• The definitions for each category were arrived at through a review of the literature on the subject analysis of images.
• Figure 5 presents the ratios of primary, secondary, tertiary, and non-subject terms assigned by each group.
• Figure 8 presents the same data adjusted to exclude non-subject terms.
• Figure 9 breaks the findings down by the type of image: images of 2D works of art and images of 3D works.

Research Question #3
By asking users to contemplate the primary, secondary and tertiary levels of subject meaning, can we improve the accessibility of our image collections? Does priming users with a set of questions about the images change the nature and content of their responses when asked to perform descriptive tasks?
• This drove the design of a variable group who were asked to contemplate the following three questions as they performed the descriptive tasks in the survey:
  • What is the image of?
  • What is the image about?
  • What is the image a good example of?

Design
Data was collected using two Qualtrics surveys, one for the control group and one for the variable group. Each survey consisted of the same 10 images drawn from the Cornell University Library’s images for teaching collection, the Knight Visual Resources Collection. Beneath each image were 9 blank fields into which participants were asked to enter subject terms describing the image above. The variable group's survey added the following text: “When deciding on descriptive terms for each image, keep the following three questions in mind: 1. What is the image of? 2. What is the image about? 3. What is the image a good example of?”

Recruitment & Participants
The surveys were distributed randomly by email to all undergraduate students enrolled in a course in the departments of Art History and Classics. Undergraduate students in the College of Arts and Sciences are the user population about whose search behavior nothing was known. Students in Art History and Classics were specifically targeted to examine common assumptions about the relationship between subject expertise, research experience and information-seeking behavior. Presumably, this is a population that has some subject expertise but lacks the research experience of graduate and faculty users of the collections.

Measure | Control Group | Variable Group | All Participants
Respondents | 39 | 41 | 80
Completion rate | 31.7% | 35.9% | 33.8%
Average # of terms assigned per image | 2.33 | 2.57 | 2.45

Timeline
January 2014: Initial proposal to the Institute for Research Design in Librarianship (IRDL)
  • Literature review
  • Methodology
  • Schedule
June 16th-June 26th 2014: IRDL
July – October 2014:
  • Refining study design and proposal; preparing to submit study for IRB Exemption; pre-testing
October 5th 2014: IRB Exemption granted
October 24th 2014: Data collection begins
December 8th 2014: Data collection ends
December 2014 – Present: Analysis
Spring 2015: Continued data collection

Conclusions
• Primary terms yield the greatest search utility and higher levels of successful image retrieval. The findings discussed in the analysis of non-subject terms indicate that many non-subject terms were applied based on primary level analysis, suggesting that the rate at which participants assigned primary terms does not fully reflect the degree to which they employ primary level subject analysis. Together with the finding in Figure 2 that the vast majority of corresponding literal terms were primary, and the established understanding that literal correspondence is the best predictor of successful retrieval, this indicates that subject metadata addressing the primary level of meaning in images of artworks is most likely to have the highest search utility and levels of successful image retrieval.
• High numbers of non-subject terms among the responses indicate that subject metadata is not the strongest access point for visual material.
• Priming participants using questions based on the different types of subject meaning did not dramatically affect the nature and content of their responses. Secondary level subject analysis was the most influenced by the priming questions.

Figure 1. Literal correspondence between responses and existing metadata
Literal matches: 8.5% | Non-matches: 91.5%

Figure 2. Corresponding literal terms broken down by type
Primary terms: 74% | Secondary terms: 3% | Tertiary terms: 16% | Non-subject terms: 6%

Figure 5. Comparison of all types of terms assigned by each participant group
Group | Primary Terms | Secondary Terms | Tertiary Terms | Non-Subject Terms
Cataloger | 64% | 12% | 19% | 5%
Respondents (all) | 34% | 13% | 16% | 37%
Control Group | 39% | 9% | 18% | 34%
Variable Group | 30% | 15% | 16% | 39%

Comparison of types of terms assigned by the control group and variable group
Group | Primary Terms | Secondary Terms | Tertiary Terms | Non-Subject Terms
Control Group | 39% | 9% | 18% | 34%
Variable Group | 30% | 15% | 16% | 39%

Types of terms assigned to each image, in % (each cell: Cataloger / Control Group / Variable Group)
Image | Primary Terms | Secondary Terms | Tertiary Terms | Non-Subject Terms
Boy Athlete | 63 / 31 / 31 | 25 / 17 / 22 | 0 / 5 / 2 | 12 / 47 / 45
Chain Ornament And Coin | 83 / 14 / 5 | 0 / 2 / 2 | 17 / 4 / 5 | 0 / 80 / 88
Female Figurine | 40 / 36 / 27 | 0 / 7 / 17 | 60 / 14 / 7 | 0 / 43 / 49
Articulated Madí | 50 / 51 / 41 | 0 / 3 / 7 | 50 / 24 / 36 | 0 / 22 / 16
Adoration Of The Kings | 72 / 36 / 35 | 14 / 26 / 23 | 14 / 17 / 17 | 0 / 21 / 25
Large Black Olla With Flat Shoulder | 0 / 19 / 9 | 0 / 1 / 4 | 33 / 40 / 31 | 67 / 40 / 56
Neapolitan Night With Tarantella On The Sea Shore | 100 / 62 / 57 | 0 / 5 / 10 | 0 / 11 / 17 | 0 / 22 / 16
Portrait Of Lorenzo The Magnificent | 71 / 56 / 40 | 0 / 5 / 29 | 29 / 24 / 14 | 0 / 15 / 17
The Expulsion Of Quetzalcoatl | 38 / 46 / 32 | 63 / 20 / 28 | 0 / 15 / 18 | 0 / 19 / 22
Verbena De Atocha | 78 / 49 / 40 | 0 / 3 / 11 | 22 / 27 / 30 | 0 / 21 / 19

Analysis & Findings

Research Question #1
In order to address Research Question #1, the literal terms provided by the respondents were compared to the existing metadata. When a respondent assigned a term that was a literal match to one of the existing terms, this correspondence was noted, and the overall rates of correspondence are shown in Figure 1. The level of literal correspondence was extremely low: 8.5%. The set of corresponding literal terms was then analyzed by type (Figure 2), revealing that a significant majority, 74%, address primary subject matter, suggesting that the search utility of primary subject terms in image collections is extremely high.
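The literal comparison described for Research Question #1 can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in the study: the function names, the case and whitespace normalization, and the sample terms are all assumptions made for the example, and the poster does not say how near-matches such as plurals were handled.

```python
from collections import Counter


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization rule)."""
    return " ".join(term.lower().split())


def literal_correspondence(cataloger_terms, responses):
    """Return (match_rate, matches_by_type).

    cataloger_terms: dict mapping image id -> {term: type code}
    responses: list of (image id, respondent term) pairs
    A response counts as a literal match only if the same normalized
    term was assigned by the cataloger to the same image.
    """
    catalog = {
        image: {normalize(term): kind for term, kind in terms.items()}
        for image, terms in cataloger_terms.items()
    }
    matches_by_type = Counter()
    for image, term in responses:
        kind = catalog.get(image, {}).get(normalize(term))
        if kind is not None:
            matches_by_type[kind] += 1
    total = len(responses)
    rate = sum(matches_by_type.values()) / total if total else 0.0
    return rate, matches_by_type


# Hypothetical data, invented for illustration only.
cataloger = {
    "img1": {"man": "primary", "prayer": "secondary"},
    "img2": {"bowl": "primary"},
}
responses = [
    ("img1", "Man"), ("img1", "bronze"), ("img1", "praying"),
    ("img2", "bowl"), ("img2", "pottery"), ("img2", "man"),
]

rate, matches = literal_correspondence(cataloger, responses)
print(f"literal matches: {rate:.1%}", dict(matches))
```

Matching per image matters here: in the sample data, "man" assigned to img2 is not counted, because the cataloger only assigned that term to img1.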
Research Question #2
To address Research Question #2, each term in the existing metadata and the survey responses was coded according to the types of subject meaning defined under Types of Subject Terms (primary, secondary, tertiary, and non-subject). This allowed for comparison between groups, as shown in Figure 5, which illustrates little difference between the control and variable groups, but significant differences between the cataloger-assigned terms and those assigned by the respondents. The existing metadata includes almost no non-subject terms and roughly twice the percentage of primary terms as the respondents’ metadata, which, given the high number of primary terms among the literal matches discussed in the findings for Research Question #1, contributes positively to search utility.

Research Question #3
The literal and coded responses for the control and variable groups were compared, as seen in Figures 3 & 4. Little or no meaningful difference is detectable between the groups, though the variable group assigned 10% more secondary subject terms than the control group.

Non-Subject Terms
Non-subject terms were further broken down by type, as seen in Figure 7, which reveals that, in most cases, participants assigned non-subject terms describing the worktype of a piece and the materials and techniques used in its creation. Figure 9 reflects the finding that non-subject terms were assigned to images of 3D works at twice the rate of 2D works, while primary terms were assigned to images of 2D works at nearly twice the rate for images of 3D works. This suggests that respondents were applying non-subject terms to images of 3D works in the same way that they applied primary terms to images of 2D works, failing to distinguish between the subject matter depicted in a 2D work and a 3D work depicted in a digital image. In other words, respondents showed a poor grasp of the work/image relationship. Viewed through the lens of this finding, non-subject terms can, in many cases, be functionally viewed as primary terms, indicating that the rates of primary level subject analysis are higher than reflected here.

Works and Images
The responses were broken down by image, and the literal and coded values were compared. Comparisons of the coded values were charted for each image. It is noteworthy that there does not seem to be a standard ratio applied to all images by any group, including the existing cataloger-supplied metadata. Correspondence between the control and variable groups is consistently high, while correspondence between these two groups and the existing metadata is generally low. The responses were also broken into those assigned to images of 2D works and images of 3D works to examine differences in the types of terms assigned to each group. The findings, shown in Figure 9, reveal a high percentage of non-subject terms assigned to images of 3D works, suggesting that participants especially struggle in the subject analysis of 3D works, while subject analysis of 2D images comes more easily. The rate of secondary terms assigned to images of 2D works was double that for 3D works, and the rates of primary and non-subject terms for images of 2D works are approximately inverse to those applied to images of 3D works. This relationship suggests a poor grasp of the work/image relationship, as discussed under Non-Subject Terms.

Types of Subject Terms
Primary: perception of the work’s pure form
  • “What is the image of?” / “What does the image include?”
  • Identifies figures and gestures, e.g. “man”, “pointing”, “clasped hands”, “inscription”
Secondary: incorporates cultural and iconographic knowledge
  • “What is the image about?”
  • Interprets figures and gestures, e.g. “Christ”, “banishing”, “prayer”
Tertiary: demonstrates an awareness of the work as a document of cultural activity that reflects a time and place
  • “What is the image a good example of?” / “How does the image communicate?”
  • Identifies devices, e.g. “symbolism”, “abstraction”, “chiaroscuro”
Non-Subject terms: descriptive terms addressing aspects of a work that are not related to its subject, e.g. worktype, creator, style/period, culture, materials/techniques, etc.

Literal correspondence between control/variable groups and existing metadata
Group | Literal Matches | Non-matches
Control Group | 10% | 90%
Variable Group | 7% | 93%

Figure 6. Percentages of subject and non-subject terms assigned by each participant group
[Bar chart: Subject Terms and Non-Subject Terms for Cataloger, Respondents (all), Control Group, Variable Group]

Figure 7. Types of non-subject terms assigned by respondents
[Bar chart: Condition, Culture, Materials/Techniques, Value, Worktype]

Figure 8. Comparison of subject terms assigned by each participant group
Group | Primary Terms | Secondary Terms | Tertiary Terms
Cataloger | 72% | 9% | 19%
Respondents (all) | 54% | 20% | 26%
Control Group | 60% | 14% | 26%
Variable Group | 50% | 25% | 25%

Figure 9. Types of terms assigned to images of 2D and 3D works by each participant group
Works | Group | Primary Terms | Secondary Terms | Tertiary Terms | Non-Subject Terms
2D | Cataloger | 71.7% | 15.3% | 13% | 0%
2D | Control Group | 49.8% | 11.8% | 18.8% | 19.6%
2D | Variable Group | 40.8% | 20.2% | 19.2% | 19.8%
2D | Respondents (all) | 45.3% | 16% | 19% | 19.7%
3D | Cataloger | 47.2% | 5% | 32% | 15.8%
3D | Control Group | 30.2% | 6% | 17.4% | 46.4%
3D | Variable Group | 22.6% | 10.4% | 16.2% | 50.8%
3D | Respondents (all) | 26.4% | 8.2% | 16.8% | 48.6%
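The adjustment described for Figure 8, dropping non-subject terms and rescaling the remaining shares so they sum to 100%, can be sketched as below. This is only an illustration under stated assumptions: the function name is hypothetical, and the inputs are the rounded percentages transcribed from Figure 5 rather than raw term counts, so the results will not necessarily reproduce the values published in Figure 8.

```python
# Rounded percentages transcribed from Figure 5 (not raw term counts).
FIGURE_5 = {
    "Cataloger": {"Primary": 64, "Secondary": 12, "Tertiary": 19, "Non-Subject": 5},
    "Respondents (all)": {"Primary": 34, "Secondary": 13, "Tertiary": 16, "Non-Subject": 37},
    "Control Group": {"Primary": 39, "Secondary": 9, "Tertiary": 18, "Non-Subject": 34},
    "Variable Group": {"Primary": 30, "Secondary": 15, "Tertiary": 16, "Non-Subject": 39},
}


def exclude_and_renormalize(shares, excluded="Non-Subject"):
    """Drop one category and rescale the remaining shares to sum to 100."""
    kept = {kind: value for kind, value in shares.items() if kind != excluded}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no terms remain after exclusion")
    return {kind: 100.0 * value / total for kind, value in kept.items()}


# Hypothetical helper applied to every group in the table above.
subject_only = {
    group: exclude_and_renormalize(shares) for group, shares in FIGURE_5.items()
}

for group, shares in subject_only.items():
    summary = ", ".join(f"{kind} {value:.0f}%" for kind, value in shares.items())
    print(f"{group}: {summary}")
```

Working from raw counts per category instead of rounded percentages would avoid compounding rounding error, which is why the sketch takes the shares as a plain mapping that could hold either.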