2. Abstract
We present a tag recommendation model for collaborative
bookmarking systems.
The model suggests the most relevant tags for a given URL and its
description.
We use a Lucene index and a clustering-based approach to
select these tags.
3. Problem Statement
Design a tag recommendation system which will form a tag
cloud from a given corpus.
The tag recommendation problem can be described as follows:
For a given post P with user U and resource R, suggest a set of
tags for the post.
The commonly used approaches to choosing tags are rule-based
and classification-based methods, but both have
drawbacks: the rule-based approach relies on expert experience and
manual effort to set up the rules and tune the parameters;
the classification-based approach is restricted to a fixed tag space and is
inefficient when treated as a multi-label problem.
4. Related Work
Previous work on tag recommendation has mostly followed
content-based and collaborative approaches.
In the content-based approach, a system applies Information
Retrieval techniques to a textual source in order to extract relevant
N-grams from the text.
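To illustrate the content-based idea, here is a minimal sketch of word N-gram extraction. The function name `extract_ngrams`, the tokenization rule, and the default `n_max` are assumptions made for this example and are not taken from any cited system.

```python
import re
from collections import Counter


def extract_ngrams(text, n_max=2):
    """Count word n-grams (n = 1..n_max) in a piece of text."""
    # Assumed tokenization: lowercase runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


ngrams = extract_ngrams("Tag recommendation for social bookmarking; tag clouds")
```

Frequent N-grams could then serve as candidate tags; a real system would likely also filter stop words.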
5. Approach
We start by pre-processing the given training
dataset and finding the RSS feeds.
First we crawl the URLs from the training dataset to
extract the web content (text, PDF, HTML documents, etc.).
Then we use Lucene to index the crawled data.
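As a rough illustration of the content extraction step, the sketch below pulls the title and visible text out of an HTML page using Python's standard `html.parser` module. The class `TextExtractor` and the function `extract_text` are hypothetical names for this example; fetching the page and handling PDF are left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the title and visible text, skipping script and style bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title_parts.append(text)
        else:
            self.body_parts.append(text)


def extract_text(html):
    """Return (title, body_text) for an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title_parts), " ".join(parser.body_parts)


page = (
    "<html><head><title>Lucene Core</title><style>p{color:red}</style></head>"
    "<body><p>Full-text search library.</p><script>var x=1;</script></body></html>"
)
title, body = extract_text(page)
```

Keeping the title separate from the body is a choice made here so that title terms can later receive a different weight.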
We combine a similarity-score-based approach, a clustering-based
approach, and weighted criteria to identify the most
relevant tags for a given query. For this we create
a second index in addition to the first one.
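The similarity-score idea can be sketched without Lucene as follows. This is a plain TF-IDF and cosine-similarity stand-in, not Lucene's actual scoring; the names `build_index`, `search`, and `suggest_tags`, the smoothed IDF formula, and the sample documents and tags are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps doc_id -> text. Returns per-document term counts and IDF."""
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())
    n = len(docs)
    # Assumed smoothed IDF; Lucene uses its own similarity classes.
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    return tfs, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, tfs, idf, top_k=3):
    """Return up to top_k (doc_id, score) pairs with a non-zero score."""
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scored = []
    for doc_id, tf in tfs.items():
        d = {t: c * idf[t] for t, c in tf.items()}
        scored.append((cosine(q, d), doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:top_k] if score > 0]


def suggest_tags(query, tfs, idf, doc_tags, top_k=3):
    """Let the most similar documents vote for their tags, weighted by score."""
    votes = Counter()
    for doc_id, score in search(query, tfs, idf, top_k):
        for tag in doc_tags.get(doc_id, []):
            votes[tag] += score
    return [tag for tag, _ in votes.most_common()]


docs = {
    "d1": "lucene index search engine",
    "d2": "cooking pasta recipes",
    "d3": "search index for bookmarks",
}
doc_tags = {"d1": ["lucene", "search"], "d2": ["food"], "d3": ["search", "bookmarks"]}
tfs, idf = build_index(docs)
hits = search("lucene search index", tfs, idf)
tags = suggest_tags("lucene search index", tfs, idf, doc_tags)
```

Here the query is built from the URL content and description, and the tags attached to the most similar indexed documents become suggestions.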
6. Approach… continued
For the extraction of candidate tags we use the
following sources:
The URL given by the user
The user's previously tagged resources
The given description
Words related to terms extracted from the description
For ranking we use the user's history and apply a
clustering and weighted approach.
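The four candidate sources listed above could be merged roughly as in the sketch below. The function `candidate_tags`, the stop-word list, and the `related_words` lookup table are hypothetical and stand in for whatever resources the actual system uses.

```python
import re
from urllib.parse import urlparse

# Assumed stop words; a real list would be much longer.
STOPWORDS = {"www", "com", "org", "net", "html", "htm", "http", "https",
             "the", "a", "an", "of", "and", "for", "to", "in", "with"}


def tokens(text):
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS and len(t) > 1]


def candidate_tags(url, description, user_history, related_words):
    """Map each candidate tag to the set of sources that proposed it."""
    candidates = {}

    def add(tag, source):
        candidates.setdefault(tag, set()).add(source)

    parsed = urlparse(url)
    for t in tokens(parsed.netloc + " " + parsed.path):
        add(t, "url")
    for t in user_history:
        add(t.lower(), "history")
    description_terms = tokens(description)
    for t in description_terms:
        add(t, "description")
    for t in description_terms:
        for related in related_words.get(t, []):
            add(related, "related")
    return candidates


candidates = candidate_tags(
    url="http://lucene.apache.org/core/index.html",
    description="Lucene search library",
    user_history=["java", "search"],
    related_words={"search": ["retrieval"]},  # hypothetical lookup table
)
```

Recording which sources proposed a tag makes it possible to weight the sources differently during ranking.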
8. Theory
As part of our clustering model we compute clusters based on the
following criteria:
Grouping tags by their popularity for the link
Weighting tags by their popularity among the user's tags
Giving more weight to title tags across all data
Measuring how closely a tag is related to words in the given description
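One possible way to combine the four criteria above is a weighted sum, sketched below. This shows only the weighting part, not the clustering; the weights in `WEIGHTS`, the function `rank_tags`, and all sample counts are assumptions chosen for illustration rather than values from our experiments.

```python
# Assumed weights for the four criteria; they would need tuning on real data.
WEIGHTS = {"link": 0.4, "user": 0.3, "title": 0.2, "description": 0.1}


def rank_tags(candidates, link_tag_counts, user_tag_counts,
              title_terms, description_relatedness, top_k=5):
    """Rank candidate tags by a weighted sum of four normalized signals."""
    max_link = max(link_tag_counts.values(), default=0) or 1
    max_user = max(user_tag_counts.values(), default=0) or 1
    scores = {}
    for tag in candidates:
        scores[tag] = (
            WEIGHTS["link"] * link_tag_counts.get(tag, 0) / max_link
            + WEIGHTS["user"] * user_tag_counts.get(tag, 0) / max_user
            + WEIGHTS["title"] * (1.0 if tag in title_terms else 0.0)
            + WEIGHTS["description"] * description_relatedness.get(tag, 0.0)
        )
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k], scores


ranked, scores = rank_tags(
    candidates=["lucene", "search", "java", "cooking"],
    link_tag_counts={"lucene": 10, "search": 5},    # tag popularity for the link
    user_tag_counts={"java": 4, "search": 2},       # tag popularity for the user
    title_terms={"lucene"},                         # terms found in the page title
    description_relatedness={"lucene": 1.0, "search": 0.8},  # values in [0, 1]
)
```

Normalizing each count by its maximum keeps the four signals on a comparable scale before the weights are applied.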