VRA2014 Collaboration in archives and special collections, Benoit

Presented by Edward Benoit III at the Annual Conference of the Visual Resources Association, March 12-15, 2014 in Milwaukee, Wisconsin.

Session #10: Case Studies in Collaboration within Archival and Special Collection Environments

MODERATOR: Amanda Grace Sikarskie, Western Michigan University
PRESENTERS:
• Edward Benoit III, University of Wisconsin-Milwaukee
• Jim Cunningham, Illinois State University
• Emily Shaw, University of Iowa
• Amanda Grace Sikarskie, Western Michigan University
Each of the presentations in this session tells a story of collaboration between archivists or special collections librarians and content area scholars. While the content of these speakers’ projects differs greatly (from circus-related images to quilt and embroidery programs on public television to the conceptual art of the Fluxus group), each project benefited from a team approach that made use of various skill sets. Both Jim Cunningham and Amanda Sikarskie worked on digitization projects of collections for which the metadata (collected in the mid-twentieth century) were initially incomplete, outdated, or just plain inaccurate, prompting partnerships between archivists and content experts at outside institutions. Edward Benoit III’s minimal processing project, on the other hand, dealt with a variety of collections and content areas. It ultimately led to a similar outcome, however, solving the problem of minimal metadata by inviting scholars to participate in social tagging of the collections. Finally, Emily Shaw’s work on the digitization of the Fluxus West collection at the University of Iowa tells the story of forging new relationships through interdepartmental collaboration within a large research university. Please join us for this dynamic session that will be of interest to archivists, librarians, and content experts alike.

Notes
  • Howard Zinn infamously caused quite a stir in the 1970s by lambasting archivists for reinforcing the status quo and the social control of the political elite. Zinn called on archivists to “take the trouble to compile a whole new world of documentary material, about the lives, desires, needs, of ordinary people,” and “to begin to play some small part in the creation of a real democracy” (Zinn, 1977, http://www.libr.org/progarchs/documents/Zinn_Speech_MwA_1977.html). Zinn’s comments, along with others, initiated the post-modern movement in archives and a concerted effort to increase the breadth of voices included within the archives. The addition of a wide range of new materials, combined with hiring stagnation, led directly to a massive backlog problem during the past twenty years, to the extent that some archives housed more unprocessed (and therefore inaccessible) collections than processed ones. In response, Greene and Meissner (2005) proposed a drastic shift in archival practice toward “More Product, Less Process” (MPLP), or minimal processing. From its origins in arrangement and description, minimal processing spread throughout archival practice, including digital archives, increasing the number of collections available both physically and digitally. In digital archives, the minimal processing technique prioritizes the collection as a whole over individual items, specifically regarding metadata: the online collections provide only minimal metadata, typically at the series or folder level. The MPLP approach deviates from contemporary practice, which describes digital archival materials at the item or record level. For example, each letter in a traditionally processed folder of digitized correspondence includes individualized descriptive metadata, whereas the MPLP version of the same collection would describe only the folder as an aggregate, with the individual letters sharing duplicate metadata.
While this replicates the experience of researchers in the physical archives, studies demonstrate an increasing demand from online users for more description and access points.
  • Asking those same users to help supplement the metadata of minimally processed digital archives by creating tags could address this issue. However, social tagging without some measure of control could generate too many useless terms, hindering access rather than increasing it. Archival users have also stated a preference for mechanisms that control user-generated content. While some suggest digital archivists could simply approve or disapprove each tag, such a system requires too much oversight. I propose categorizing the users rather than the tags: specifically, permitting users who are subject area experts (hereafter referred to as expert users) to tag the collections. Expert users provide more reliable tags, meeting the needs of institutions and increasing access to the collections. Additionally, including user-generated tags embraces the ideals of the post-modern movement by encouraging community participation and increasing the voices heard within the archival description process. Studies have shown that users increasingly demand immediate, online access to archival materials with detailed descriptions (access points). The high costs of creating and maintaining digital archives have precluded many archives from providing users with digital content or increasing the amount of digitized material. Applying minimal processing theory to digital archives limits access points to the folder or series level rather than the item-level description users desire. User-generated content, such as tags, could supplement the minimally processed metadata, though users are reluctant to trust or use unmediated tags. This project explores the potential for mediating that supplemental metadata by including only tags generated by expert domain users.
  • Ages: Experts: 19-63, mean = 35.1; Novices: 18-60, mean = 28.366. Race: note that participants could select more than one.
  • Notes: Based on self-assessment on Visual Analog Scale (VAS). Expert users assessed higher on prior use of archives, knowledge of social tagging, and use of social tagging.
  • Examples of the coding scheme tags from this particular image:
      – Replication of metadata: Groppi, Father Groppi, photograph
      – Format focused: black and white, black and white photography
      – General identification: big man, police, riot gear, wagon
      – Specific identification: Wagon 722, 1967, Milwaukee police
      – Description: arrested, detained priest, inside police vehicle
      – Broader context: Catholic social action, civil rights movement, race
      – Emotional: unjust, acceptance
  • Note: While novice users appear to generate more tags on average, the one novice participant who created 577 tags skews the average. Removing this user would result in an average of 53.97 tags per novice user (less than the expert average).
  • 42,755 total query terms
  • Note: Some additional considerations while creating tags listed in open-ended questions include:
      – Visual cues (e.g., American flag)
      – Began with most obvious, but became more specific as time progressed
      – “I also thought of how I personally would like Fr. Groppi to be remembered by posterity”
      – Leaving out what data was already provided (multiple mentions)
      – Not inferring information that is not present
  • Motivation statements (full wording of the slide labels):
      – Archive requires you to create a user account and login to submit tags
      – Archive offers recognition for tagging in newsletter or website
      – Archive recognizes top taggers through social media (Facebook, Twitter, etc.)
      – Archive provides non-monetary rewards for tagging (research assistance, archive tour, etc.)
      – Archive allows you to anonymously submit tags
      – Archive provides monetary rewards for tagging (photographic prints, photocopies, discounted or free membership, etc.)
    Note: Some other methods for motivating tag generation listed in open-ended questions:
      – Allowing others to validate your tags
      – As more people benefit from tags, they will begin tagging more
      – Making tagging/commenting options more prevalent on websites
      – Institutions marketing directly to users
      – Special events for taggers
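The note on the novice outlier can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the presentation: the totals come from the Number of Tags slide, while the group size of 30 is an assumption inferred from 60 participants in a two-group design (it is consistent with the reported means, but the slides do not state it). The function name is illustrative only.

```python
# Figures reported on the "Results: Number of Tags" slide.
NOVICE_TOTAL_TAGS = 2142
NOVICE_MAX_TAGS = 577       # the single outlying novice participant
EXPERT_MEAN = 56.83

# Assumption: 60 participants split evenly into two groups of 30.
NOVICE_GROUP_SIZE = 30


def mean_without(total: int, count: int, excluded: int) -> float:
    """Mean of a group after dropping one participant's contribution."""
    return (total - excluded) / (count - 1)


novice_mean = NOVICE_TOTAL_TAGS / NOVICE_GROUP_SIZE
trimmed_mean = mean_without(NOVICE_TOTAL_TAGS, NOVICE_GROUP_SIZE, NOVICE_MAX_TAGS)

print(round(novice_mean, 2))    # 71.4
print(round(trimmed_mean, 2))   # 53.97
print(trimmed_mean < EXPERT_MEAN)  # True
```

Under that assumption the trimmed mean reproduces the 53.97 quoted in the note and falls below the expert mean of 56.83.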

    1. Tagging MPLP: A Comparison of Novice & Expert Domain User Generated Tags in a Minimally Processed Digital Photographic Archive
       Edward Benoit, III
       School of Information Studies, UW-Milwaukee
    2. Introduction/Background
       • Howard Zinn
       • The postmodern archives
       • Rising backlog problem
       • Minimal processing/MPLP
       • Minimally processed digital archives
    3. Study Focus
       • Supplemental metadata from social tags
       • User prior domain knowledge as quality control
       • Research questions:
         – What are the similarities/differences between tags generated by experts and novices?
         – In what ways do tags generated by expert/novice users correspond with full metadata?
         – In what ways do tags generated by expert/novice users correspond with existing users’ query terms?
    4. Methodology
       • Mixed methods, quasi-experimental two-group design
       • 60 participants (novice & experts) generate tags for 15 photographs & 15 documents
       • Pre- and post-questionnaires
       • Analysis:
         – Open coding
         – Descriptive statistics
    5. Sample Collection: Groppi Papers
       http://collections.lib.uwm.edu/cdm/landingpage/collection/march
    6. Participants
       • Scoring:
         – Expert mean = 7.57
         – Novice mean = 2.77
       • Ages: 18-63, mean = 31.73
       • Gender (M/F/O): 23.3%/75%/1.7%
       • 48.3% from WI or IL
       • 58.3% non-students

       Race               Frequency   % of Participants
       White              44          73.3%
       Black              9           15.0%
       Hispanic/Latino    10          16.7%
       American Indian    4           6.7%
       Asian/Indian       2           3.3%
       Pacific Islander   0           0.0%
       Other              1           1.7%
    7. Participants’ Prior Use/Knowledge
    8. Coding Scheme
       • Replication of metadata
       • Format focused
       • General identification
       • Specific identification
       • Description
       • Broader context
       • Emotional
       Wisconsin Historical Society, WHS-26541
       • Image removed for copyright. Accessible at: http://www.wisconsinhistory.org/whi/fullRecord.asp?id=26541
    9. Results: Number of Tags

                  Total   Unique   Min   Max   Mean
       Expert     1705    396      15    196   56.83
       Novice     2142    291      15    577   71.4
       Combined   3847    396      15    577   64.12

    10. Results: Types of Tags

                  Replication   Format   Gen ID   Spec ID   Description   Broader   Emotion
       Expert     17.54%        0.00%    20.12%   11.79%    31.91%        17.13%    1.52%
       Novice     14.01%        3.08%    29.43%   12.42%    24.01%        15.74%    1.31%

    11. Results: Matching Metadata

                  % Matching   % Non-matching
       Expert     34.17%       65.83%
       Novice     25.18%       74.82%
       Combined   36.69%       63.31%

    12. Results: Matching Queries

                  Match   Non-match   % Match   % Non-match   % of query terms matching tags
       Expert     248     97          71.88%    28.12%        0.58%
       Novice     184     69          72.73%    27.27%        0.43%
       Combined   312     147         67.97%    32.03%        0.73%

       • Query log analysis for one month on existing collection resulted in 42,755 unique query terms
    13. Results: Tagging Motivation

                                                    Expert   Novice   Combined
       How I would find the item                    4.27     4.60     4.43
       How others would find the item               4.10     4.60     4.35
       The content of the item                      4.50     4.67     4.58
       The item’s format                            3.33     3.23     3.28
       The connection between items                 3.43     3.63     3.53
       The accuracy of the provided information     3.50     3.70     3.60
       The previous user’s tags                     3.63     3.90     3.77
       My previous tags                             3.87     4.10     3.98

    14. Results: Motivating Taggers

                                        Expert   Novice   Combined
       Account & login                  3.67     3.07     3.37
       Newsletter/website recognition   3.30     3.23     3.27
       Social media recognition         2.97     2.97     2.97
       Non-monetary rewards             3.67     3.83     3.75
       Anonymous submission             3.87     3.90     3.88
       Monetary rewards                 4.40     4.40     4.40
    15. Conclusion/Future Directions
       • Replication of presented metadata
       • Benefits of domain expert tagging
       • Benefits of including both domain expert and novice tags
       • Further study needed on:
         – Alternative factors
         – How to motivate tag generation
    16. Thanks for listening! Please hold your questions for later
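The percentages on the Matching Queries slide follow directly from the reported counts and the 42,755 query terms. The sketch below only reproduces that arithmetic; it does not show how tags were actually matched to queries in the study, and the function and variable names are illustrative assumptions rather than anything from the presentation.

```python
# Unique query terms from one month of query logs (from the slide).
QUERY_TERMS = 42_755

# (match, non-match) counts as reported on the slide.
COUNTS = {
    "Expert": (248, 97),
    "Novice": (184, 69),
    "Combined": (312, 147),
}


def match_rates(match: int, non_match: int, query_terms: int = QUERY_TERMS):
    """Return (% match, % non-match, % of query terms matching tags)."""
    total = match + non_match
    return (
        round(100 * match / total, 2),
        round(100 * non_match / total, 2),
        round(100 * match / query_terms, 2),
    )


rates = {group: match_rates(*counts) for group, counts in COUNTS.items()}

for group, (pct_match, pct_non_match, pct_queries) in rates.items():
    print(f"{group:<9} {pct_match:>6}% {pct_non_match:>6}% {pct_queries:>5}%")
```

Run as written, this yields the same three percentage columns as the slide, which suggests the last column is the match count divided by the total number of unique query terms.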