Tag Based Information Retrieval using Folksonomy



Term Paper-2. Submitted by NIKESH.N, International School of Information Management, University of Mysore, 2010
1.0 Introduction

Web 2.0 represents the collaborative web, which revolutionized the creation of and access to information over the internet. It lets users interact with the web more directly and provides new ways of accessing web information. Blogs, wikis, microblogging, multimedia sharing services, and content tagging services are some of the main constituents that add richness to Web 2.0. Tagging in particular corresponds with a Web 2.0 mentality in which users create not only content but also a richer, more adaptive and responsive way to navigate and search both existing and new media. Tagging promises better and more intuitive information access through tag-based browsing and information retrieval [1].

A user's context affects how they interact with an information retrieval system, what type of response they expect from the system, and how they make decisions about the information objects they retrieve [2]. The primary objective of an Information Retrieval (IR) system is to retrieve content relevant to the user, so information retrieval is context dependent and subjective to the user and the situation. In principle, an information retrieval system should be context aware [3]. This calls for an IR system that takes the relevance of context into account in the retrieval process. This paper analyses the importance of social tagging (folksonomy) in information retrieval and reviews some algorithms and methods that scholars have proposed for folksonomy-based information retrieval.

2.0 Relevance of Tags as a metadata source

Many researchers state that the formal, professional assignment of metadata is no longer the optimal way of annotating content, in terms of both efficiency and user support.
Macgregor & McCulloch [4] argue that, when applied to digital libraries and the web, traditional metadata creation and indexing suffer from scalability problems and require a substantial amount of resources. In an opinionated web article [5], Shirky argues that “Users have a terrifically hard time guessing how something they want will have been categorized in advance, unless they have been educated about those categories in advance as well, and the bigger the user base, the more work that user education is.”

In a nutshell, the user tagging concept is based on two simple ideas. The first is that the keywords people use to tag audio files, video files, web pages, photos, blog posts, etc. are similar to the keywords they use to search for and retrieve information. Because tagging involves a human element, the search results are expected to make more sense than automated search results that merely read the metadata embedded in web pages or content. The second is user-generated content, which has seen huge growth in popularity in the past couple of years. The Web 2.0 platform has taken user-generated content to a new level: blogging platforms such as Blogger, WordPress, and TypePad, and video sites such as YouTube and Metacafe, allow users to tag content. The tags on this large body of user-generated content provide up-to-date information for many users.

Social filtering

Social filtering is a community-based approach that is a promising complement to the existing individual-based filtering approach. In collaborative tagging, users are motivated to contribute tags to change the appearance of the tag cloud. On a website where many users tag many items, this collection of tags becomes a folksonomy. Tagging is not only an individual process of categorization; implicitly it is also a social process of indexing and of knowledge construction. Users share their resources together with their tags, generating an aggregated tag index, the so-called folksonomy. The term was coined by Thomas Vander Wal on the AIfIA mailing list as a one-word neologism combining “folk” and “taxonomy” (Quintarelli, 2005).

A folksonomy allows anyone to access any previously tagged web resource through two main paradigms of information access: Information Filtering (IF) and Information Retrieval (IR). In Information Filtering, the user plays a passive role and expects the system to push information of interest according to a previously defined profile. Social bookmarking tools offer a simple IF access model: the user subscribes to a set of specific tags via RSS/Atom syndication and is alerted whenever a new resource is indexed with that set. In Information Retrieval, on the other hand, the user actively seeks information, pulling it by querying or browsing.
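The difference between the two access paradigms can be illustrated with a small sketch. It is not taken from any of the cited systems: the `TagIndex` class, its method names, and the ranking by number of matching tags are all assumptions made for illustration. A subscription stands in for the RSS/Atom feed (push), while `query` stands in for a tag search (pull).

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory tag index with push (IF) and pull (IR) access."""

    def __init__(self):
        self.resources_by_tag = defaultdict(set)   # tag -> set of resources
        self.subscriptions = []                    # (user, frozenset of tags)
        self.inbox = defaultdict(list)             # user -> pushed resources

    def subscribe(self, user, tags):
        """IF: register a tag profile; the user then waits passively."""
        self.subscriptions.append((user, frozenset(tags)))

    def add(self, resource, tags):
        """Index a newly tagged resource and push it to matching subscribers."""
        tags = set(tags)
        for tag in tags:
            self.resources_by_tag[tag].add(resource)
        for user, wanted in self.subscriptions:
            if wanted <= tags:      # resource carries every subscribed tag
                self.inbox[user].append(resource)

    def query(self, tags):
        """IR: the user pulls resources, ordered by number of matching tags."""
        hits = defaultdict(int)
        for tag in tags:
            for resource in self.resources_by_tag.get(tag, ()):
                hits[resource] += 1
        # Most matching tags first; ties broken alphabetically.
        return sorted(hits, key=lambda r: (-hits[r], r))


index = TagIndex()
index.subscribe("alice", ["folksonomy", "ir"])
index.add("paper-1", ["folksonomy", "ir", "ranking"])
index.add("photo-7", ["folksonomy"])

pushed = index.inbox["alice"]                     # filled without alice asking
pulled = index.query(["folksonomy", "ranking"])   # alice asks explicitly
```

Here `pushed` contains only `paper-1`, because `photo-7` lacks the subscribed tag `ir`, while the explicit query returns both resources with the better match first.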
In tag querying, the user enters one or more tags in the search box and obtains an ordered list of resources related to these tags. While the user scans this list, the system also provides a list of related tags (i.e. tags with a high degree of co-occurrence with the original tag), which allows hypertext browsing. At present, the crawler pulls the tags from all popular websites that offer Application Programming Interfaces (APIs) and, using linear interpolation, forms an index that can be sorted and ranked. Based on the user's search query, the highest-ranked tags appear in the tag cloud, which gives the user a visual view of the data.

3.0 Folksonomy based IR

One of the first scientific publications about folksonomy is by Adam Mathes [7] in 2004, which introduces several concepts of bottom-up social annotation. Andreas Hotho et al. [8] describe how folksonomy contents can be searched textually using traditional information retrieval. However, as the documents consist of short text snippets only (e.g., the web page title and the tags themselves), ordinary ranking schemes such as TF/IDF are not feasible. They propose FolkRank, an adapted PageRank algorithm. In order to employ a weight-spreading ranking scheme on folksonomies, FolkRank transforms the hypergraph into an undirected graph. It then applies a differential ranking approach that deals with the skewed structure of the network and the undirectedness of folksonomies.

3.1 Results for Adapted PageRank

Hotho and his team evaluated the Adapted PageRank on the del.icio.us dataset. First, they studied the speed of convergence. They set $\vec{p} := \mathbf{1}$ (the vector having 1 in all components) and varied the parameter settings. In all settings, they found that $\alpha \neq 0$ slows down the convergence rate. For instance, for $\alpha = 0.35$, $\beta = 0.65$, $\gamma = 0$, 411 iterations were needed, while $\alpha = 0$, $\beta = 1$, $\gamma = 0$ returned the same result in only 320 iterations. They also examined the use of $\gamma$ as a damping factor that spreads equal weight. The matrix used in the iteration is row-normalized (each row sums to 1 in the 1-norm), and the requirement that there be no rank sinks holds trivially in the graph $G_F$.

4.0 GroupMe Folksonomy

Fabian Abel et al., in their paper “Analyzing Ranking Algorithms in Folksonomy Systems”, introduce a concept named the ‘GroupMe Folksonomy’. GroupMe! is a resource sharing system like del.icio.us or Bibsonomy, with the additional feature of grouping Web resources. These GroupMe! groups can contain arbitrary multimedia resources such as websites, photos, or videos, which are visualized according to their media type: e.g., images are displayed as thumbnails, and the headlines from RSS feeds are structured so that the most recent information is accessible with a single click. With this visualization strategy, the user can grasp the content immediately without visiting the original Web resource.
GroupMe! motivates users to tag resources through a free-for-all tagging approach, which lets users tag not only their own resources but all resources within the GroupMe! system. In one experiment, Abel et al. plotted, on a logarithmic scale (extended with zero), the number of tag assignments on the y-axis and the number of resources having this number of tags assigned on the x-axis. They observed a power-law distribution of tag assignments per resource, with about 50% of all resources having not even a single tag assignment. They infer that 50% of all resources in the GroupMe! system can hardly be found by known folksonomy-based search and ranking algorithms. They propose three new folksonomy-based algorithms.
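The statistics behind this observation are straightforward to compute. The following is a minimal sketch on hypothetical toy data, not the GroupMe! dataset; the function names and the `(user, tag, resource)` triple layout are assumptions made for illustration.

```python
from collections import Counter


def tag_count_histogram(resources, tag_assignments):
    """Map 'number of tag assignments' -> 'number of resources with that many'."""
    per_resource = Counter(resource for _user, _tag, resource in tag_assignments)
    # Resources that never occur in an assignment are counted with 0 tags.
    return Counter(per_resource.get(r, 0) for r in resources)


def untagged_share(resources, tag_assignments):
    """Fraction of resources without a single tag assignment."""
    histogram = tag_count_histogram(resources, tag_assignments)
    return histogram[0] / len(resources)


# Hypothetical toy data: (user, tag, resource) triples.
resources = ["r1", "r2", "r3", "r4"]
assignments = [
    ("u1", "web", "r1"), ("u2", "web", "r1"), ("u1", "ir", "r1"),
    ("u2", "photo", "r2"),
]

histogram = tag_count_histogram(resources, assignments)
share = untagged_share(resources, assignments)
```

Plotting the histogram on logarithmic axes would show whether the counts follow a power law; `share` corresponds to the roughly 50% figure reported for GroupMe!.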
GFolkRank - A graph-based ranking algorithm that extends FolkRank [8] and turns it into a group-sensitive algorithm in order to exploit GroupMe! folksonomies.
Personalized SocialPageRank - An extension of SocialPageRank that allows for topic-sensitive rankings.
GRank - A search and ranking algorithm optimized for GroupMe! folksonomies.

4.1 GFolkRank

GFolkRank interprets groups as artificial, unique tags. If a user $u$ adds a resource $r$ to a group $g$, GFolkRank interprets this as a tag assignment $(u, t_g, r, \varepsilon)$, where $t_g \in T_G$ is the artificial tag that identifies the group. The folksonomy graph $G_F$ is extended with additional vertices and edges. The set of vertices is expanded with the set of artificial tags $T_G$: $V_G = V_F \cup T_G$. Furthermore, the set of edges $E_F$ is augmented to $E_G = E_F \cup \{\{u, t_g\}, \{t_g, r\}, \{u, r\} \mid u \in U,\ t_g \in T_G,\ r \in \breve{R},\ u \text{ has added } r \text{ to group } g\}$. The new edges are weighted with a constant value $w_c$, as a resource is usually added only once to a certain group. The authors selected $w_c = 5.0 \cdot \max(|w(t, r)|)$ because they believed that grouping a resource is, in general, more valuable than tagging it. GFolkRank is consequently the FolkRank algorithm operating on the graph $G_G = (V_G, E_G)$.

4.2 Personalized SocialPageRank

SocialPageRank [10], introduced by Bao et al., is based on the observation that there is a strong interdependency between the popularity of users, tags, and resources within a folksonomy. For example, resources become popular when they are annotated by many users with popular tags, while tags become popular when many users attach them to popular resources. Personalized SocialPageRank extends SocialPageRank and turns it into a topic-sensitive ranking algorithm. It emphasizes weights within the input matrices of SocialPageRank so that preferences for a certain context can be taken into account.

4.3 GroupMe! Ranking Algorithm (GRank)

GRank is a search and ranking algorithm optimized for GroupMe! folksonomies. It computes a ranking for all resources that are related to a tag $t_q$, taking the group structure of GroupMe! folksonomies into account.
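The graph construction of Section 3.0 and the group extension of Section 4.1 can be sketched together. This is a simplified illustration, not the authors' implementation. Several parts are assumptions: the update rule $w \leftarrow \alpha w + \beta \cdot \mathrm{spread}(w) + \gamma p$ (the text names only the parameters), the default parameter values, the edge weights (counts of co-occurring assignments), the choice to add $w_c$ to an edge that already exists, and the use of the same parameters for the baseline and the biased run in the differential step. All function names are hypothetical.

```python
from collections import defaultdict


def folksonomy_graph(tag_assignments):
    """Undirected weighted graph from (user, tag, resource) triples.

    Assumption: an edge's weight counts the assignments in which its two
    endpoints co-occur.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in tag_assignments:
        u, t, r = ("user", user), ("tag", tag), ("res", resource)
        for a, b in ((u, t), (t, r), (u, r)):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def add_groups(graph, group_additions, factor=5.0):
    """GFolkRank extension: each group becomes an artificial tag t_g."""
    tag_res = [w for a, nbrs in graph.items() if a[0] == "tag"
               for b, w in nbrs.items() if b[0] == "res"]
    # w_c = 5.0 * max tag-resource weight (fallback 1.0 is an assumption).
    w_c = factor * max(tag_res, default=1.0)
    for user, group, resource in group_additions:
        u, t_g, r = ("user", user), ("group-tag", group), ("res", resource)
        for a, b in ((u, t_g), (t_g, r), (u, r)):
            # Assumption: w_c is added on top of any existing edge weight.
            graph[a][b] += w_c
            graph[b][a] += w_c
    return w_c


def adapted_pagerank(graph, preference, alpha=0.0, beta=0.85, gamma=0.15,
                     iterations=50):
    """Iterate w <- alpha*w + beta*spread(w) + gamma*p on the graph."""
    nodes = list(graph)
    total = sum(preference.get(n, 0.0) for n in nodes)
    p = {n: preference.get(n, 0.0) / total for n in nodes}
    w = dict(p)
    for _ in range(iterations):
        spread = dict.fromkeys(nodes, 0.0)
        for n in nodes:
            degree = sum(graph[n].values())
            # Each node passes its weight to neighbours, split by edge weight.
            for m, weight in graph[n].items():
                spread[m] += w[n] * weight / degree
        w = {n: alpha * w[n] + beta * spread[n] + gamma * p[n] for n in nodes}
    return w


def folkrank(graph, boosted, boost=10.0, **params):
    """Differential ranking: preference-biased weights minus a baseline."""
    uniform = dict.fromkeys(graph, 1.0)
    biased = dict(uniform)
    for node in boosted:
        biased[node] += boost
    baseline = adapted_pagerank(graph, uniform, **params)
    personal = adapted_pagerank(graph, biased, **params)
    return {n: personal[n] - baseline[n] for n in graph}


# Hypothetical toy data; r3 has no tags and is reachable only via its group.
assignments = [("u1", "web", "r1"), ("u2", "web", "r1"), ("u1", "ir", "r2")]
graph = folksonomy_graph(assignments)
w_c = add_groups(graph, [("u2", "g1", "r3")])
scores = folkrank(graph, boosted=[("tag", "web")])
ranked_resources = sorted((n for n in scores if n[0] == "res"),
                          key=scores.get, reverse=True)
```

The untagged resource `r3` receives a score only because the group edge connects it to the rest of the graph, which is the motivation given for treating groups as artificial tags.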
4.4 Comparison of Algorithms

The study by Fabian Abel et al. on the del.icio.us dataset shows that GFolkRank performed better than FolkRank and SocialPageRank in terms of overlapping similarity. They then tested the algorithms on untagged resources (since many resources carry no tags), where FolkRank showed a better result.

5.0 Web page Recommender system using folksonomy

Satoshi Niwa et al. [11], in their paper “Web Page Recommender System based on Folksonomy Mining”, describe algorithms for deriving several aspects of a folksonomy: the affinity level between users and tags, the similarity between tags, tag clusters, and the pages to recommend to each user.

6.0 Conclusion

The present trend shows an increased presence of the real-time web in all areas. The amount of data generated by folksonomies is growing exponentially, so indexing and ranking these data is a great challenge for search engines. The algorithms described above have achieved only partial success in their experimental stage. As social bookmarking and folksonomies are inevitably related to personalized human behavior and cognition, more research is needed in this area, in collaboration with natural language processing.

7.0 References

1. Robert Graham, Brian Eoff, and James Caverlee, "Plurality: A Context-Aware Personalized Tagging System", WWW 2008, April 21-25, 2008, Beijing, China.
2. Fabio Crestani and Ian Ruthven, "Introduction to special issue on contextual information retrieval systems", Information Retrieval, Vol. 10, No. 2, pp. 111-113.
3. Massimo Melucci, "A basis for information retrieval in context", ACM Transactions on Information Systems, Vol. 26, No. 3, ACM Press, June 2008.
4. G. Macgregor and E. McCulloch, "Collaborative tagging as a knowledge organization and resource discovery tool", Library Review 55 (5), pp. 91-300.
5. C. Shirky, "Ontologies are overrated: categories, links, and tags", Clay Shirky's writings about the Internet. Retrieved from: http://www.shirky.com/writings/ontology_overrated.html
6. J. Trant, "Exploring the potential for social tagging and folksonomy in art museums: Proof of concept", New Review in Hypermedia and Multimedia, Volume 12 (1), June 2006, pp. 83-105.
7. Adam Mathes, "Folksonomies – Cooperative Classification and Communication Through Shared Metadata", December 2004. http://www.adammathes.com/academic/computer-mediatedcommunication/folksonomies.html
8. Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme, "Information Retrieval in Folksonomies: Search and Ranking", Proceedings of the 3rd European Semantic Web Conference, Budva, Montenegro, pp. 411-426, 2006.
9. Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme, "Information Retrieval in Folksonomies: Search and Ranking".
10. S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, "Optimizing Web Search using Social Annotations", in Proc. of 16th Int. World Wide Web Conference (WWW '07), pages 501-510. ACM Press, 2007.