Capstone Report Presentation User-Generated Metadata in Social Software: An Analysis of Findability in Content Tagging and Recommender Systems Presented by Michael Barnes In partial fulfillment of the requirement for the degree of Master of Science. March, 2007
This study describes how user-generated metadata may be leveraged to enhance findability in web-based social software applications (Morville, 2005). Two interaction design systems, content tagging (Golder & Huberman, 2005) and recommender systems (Resnick & Varian, 1997), are examined to identify strengths and weaknesses along three findability factors : information classification , information retrieval and information discovery .
Greater overall findability strength may be found in content tagging systems than in recommender systems.
I realized that recommendations were being presented in contextually relevant moments of my browsing experience as strategies to cross sell products or simply as a device to enable broader information awareness.
The system was learning from my navigation and browsing behavior to make personalized recommendations.
The presentation layer of the web experience was evolving from a static, outbound communications tool to a more dynamically created user-centric experience.
Despite these dynamic enhancements, the communication channel was still primarily one directional – outbound. The user was the recipient of information and did not overtly participate.
More recently, I discovered a web site – del.icio.us that allows users to apply their own descriptive labels to web sites.
It is a book marking tool – like the one built into every web browser – but allows for user-generated metadata so that bookmarks can live in more than one folder and have multiple terms attached to them to enhance their retrieval.
I noticed two key things happening with content tagging systems – user control for improved labeling and findability and the benefit of surfacing the tags of other users to create access points to related content.
TIME-liness of Topic Since then, the rise of social software designed to leverage user-generated metadata has boomed. The 2006 Time Magazine’s “Person of the Year” was - the user! The rapid increase of user participation and user generated content in social software systems continues to gain popularity under the moniker of “Web 2.0.”
TIME-liness of Topic “ It's a story about community and collaboration on a scale never seen before. It's about the many wresting power from the few and helping one another for nothing and how that will not only change the world, but also change the way the world changes. It's a tool for bringing together the small contributions of millions of people and making them matter. And for seizing the reins of the global media, for founding and framing the new digital democracy, for working for nothing and beating the pros at their own game, TIME's Person of the Year for 2006 is you.”
A web-based information system consists of organization, labeling, navigation, and searching sub-systems that help people find and manage information more successfully (Morville & Rosenfeld, 2002).
Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information (NISO, 2004).
Social computing is the use of social software to create social conventions and contexts in a virtual, web-based community (Millen and Patterson, 2002).
Tools (software) designed to enable peer-to-peer interactions in order to facilitate content creation and collaboration through computer-mediated communication. It enables a “bottom-up” approach where users of the software create the system’s value rather than the value being delivered by an external source (Tepper, 2003).
Metadata that is defined and described by human users based on individual preferences rather than by structured content classification or management requirements such as taxonomies or controlled vocabularies (Golder and Huberman, 2005). Metadata that is created in a “bottom-up” method by users of a social software system. It is unstructured because it is not predefined by a formal content management system using a controlled vocabulary, taxonomy, ontology, or other structured categorization process (Mathes, 2004).
For purposes of this study, findability refers to the likelihood of retrieving desired information in a web-based system. It is the quality of being locatable or navigable. At the item level, we can evaluate to what degree a particular object is easy to discover or locate. At the system level, we can analyze how well a physical or digital environment supports navigation and retrieval. Information that is highly findable is easily located and accessed through the use of a well developed information retrieval system (Morville, 2005).
Web-based information systems that enable users to assign uncontrolled keywords to information resources. Such tags are used to enable the organization of information within a personal information space, but are also shared, thus allowing the browsing and searching of tags attached to information resources by other users. It also allows users to tag their information resources with those tags that exemplify popularity.
Tags are generally single terms, however the assignation of multiple tags to a single resource can be accommodated by omitting essential syntax or punctuation and by using symbols to combine terms (e.g. information+management) (Macgregor and McCulloch, 2006).
Recommender systems capture implicit and explicit user decisions and behavior in order to recommend content that matches previous choices. Most recommender systems rely on collaborative filtering software that leverage algorithmic models to generate predictive recommendations. Recommender systems use the opinions of a community of users to help individuals in that community more effectively identify content of interest from a potentially overwhelming set of choices (Resnick & Varian, 1997).
The process of grouping information into intuitive categories by applying descriptive metadata that may be used as part of an organization schema designed to aid information retrieval and discovery (Mathes, 2004).
The process of retrieving useful information through serendipitous means (Morville and Rosenfeld, 2002) – often due to classification descriptions provided by other users in ambiguous organization systems such as folksonomies (Mathes, 2004).
The process of locating desired information through the use of web based search utilities (Morville, 2004).
In search systems, serendipitous discovery is the process of retrieving information related to an initial search query due to the presentation of related information in the search results. This occurs most often in ambiguous classification schemes where information is likely to exist in multiple categories depending on the context of the search. The grouping patterns of ambiguous organization schemes support associative learning that enables users to identify new and unintended relationships about the information they are seeking (Morville and Rosenfeld, 2002).
Based on an accounting of the strength and weakness elements in each of the six system-factor combinations, the most oft-cited strength of content tagging systems is the ability to classify information.
The most oft-cited strength of recommender systems is the ability to discover information.
The most consistent weakness of content tagging systems is the ability to classify information.
The most oft-cited weakness of recommender systems is the ability to classify information.
One interpretation of this pattern is that content tagging systems enable greater findability potential than recommender systems. However, this statement would need to be evaluated further using more quantitative measures to ensure statistical relevancy.
Further, the overall strength/weakness balance for all content tagging factors is +5 while the overall strength/weakness balance for all recommender system (RS) factors is -7.
Based on this accounting approach, content tagging systems may be assumed to provide greater overall findability strength than recommender systems due to the number of overall findability strengths minus overall weaknesses per the results of the study
As outlined in the outcome slides, there are clear strengths and weaknesses to be considered when designing a content tagging or recommender system that leverages user-generated metadata as a mechanism to enhance findability.
Content tagging provides more user incentive (lower barrier to participation) and overall findability benefits than recommender systems.
Search engines great for precision, content tagging good for both precision and recall, but only within the community of taggers who have evaluated content.
The best content bubbles up – as in google, but in CT systems, there are links to less popular, but potentially significant – related content.