Enhancing the Navigability of Social Tagging Systems with Tag Taxonomies


Christoph Trattner & Christian Körner & Denis Helic

KMI, TU Graz

September 8, 2011


Introduction

“Tagging gained tremendously in popularity over the past few years”


Introduction

Figure: Tags on Flickr

Introduction

Figure: Tags on Amazon


Introduction

Figure: Tags on LastFM


Introduction

What we also like about tags, apart from the fact that they are a
cheap and lightweight alternative to common keyword-based
semantic enrichment, is that they allow us to build tools for
exploring or navigating an information system in a lightweight and
concept-driven manner.
A popular example of such a tool is the tag taxonomy!


Introduction
Q: What is a tag taxonomy?
A: A tool that allows us to navigate information items in an
information system in a concept driven and hierarchical manner.

Figure: Tag Taxonomy

Introduction

Popular examples of tag taxonomy induction algorithms are:
The graph-based approach of Heymann (Heymann et al. 2009)
Affinity Propagation (Lerman et al. 2010)
Hierarchical K-Means (Dhillon et al. 2001)


Why is the usefulness of tag taxonomies for navigation limited?

What we also observed in recent research on tagging is that
tag-based navigation has its limitations (Helic et al. 2010).
The problem with tagging is basically that people do not
apply tags to all resources of an information system in a
uniform manner.



Actually, it was observed (H. Halpin et al. 2007) that the tag distribution
of almost all tagging systems follows a power-law function, i.e. a small
number of tags each refer to a very large number of resources, while most
tags refer to only a few.

Figure: Tag distributions for (a) Austria-Forum, (b) BibSonomy and (c) CiteULike.
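A tag distribution like the ones plotted above can be computed by counting the distinct resources per tag. The following is a minimal sketch; the function name and the toy (tag, resource) pairs are hypothetical and not taken from the datasets in the talk.

```python
from collections import Counter


def tag_distribution(assignments):
    """Return (tag, number of distinct resources) pairs, most frequent first."""
    # Deduplicate so that a resource tagged twice with the same tag counts once.
    pairs = set(assignments)
    counts = Counter(tag for tag, _resource in pairs)
    return counts.most_common()


# Hypothetical toy data: (tag, resource) pairs.
toy_assignments = [
    ("blog", "r1"), ("blog", "r2"), ("blog", "r3"), ("blog", "r4"),
    ("car", "r1"), ("car", "r5"),
    ("vw", "r5"),
]
print(tag_distribution(toy_assignments))
```

Plotting these counts against their rank on log-log axes is the usual way to inspect whether the distribution looks like a power law.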


Hence, to navigate from one resource to another resource in an
information system with the help of a tag taxonomy, the user would, in the
worst case, have to click a very large number of times to reach the desired
target resource.

Figure: Result list of the tag “blog” in the bookmarking system Delicious.


Now, to help the user also navigate to the resources of a tagging
system efficiently, we developed the approach of so-called
tag-resource taxonomies.

Figure: Tag Taxonomy vs. Tag-Resource Taxonomy. Panels: (a) Tag Taxonomy,
(b) Tag-Resource Taxonomy. Example nodes: Car, Tire, Motor, Mercedes, VOLVO,
VW, BMW.

The beauty of such tag-resource hierarchies is that the result lists are
limited to a certain branching factor b and the maximum number of clicks
is bounded by log(n), where n is the number of resources.
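The bound can be made concrete with a short calculation. The sketch below assumes that the logarithm is taken to base b (the slide only writes log(n)) and uses n = 19,430, the Austria-Forum taxonomy size reported in the collision-rate table; the function name is hypothetical.

```python
import math


def clicks_tag_resource(n, b):
    """Depth of a tree with n resources and branching factor b, assuming log base b."""
    return math.log(n) / math.log(b)


n = 19430  # Austria-Forum resources, as listed in the collision-rate table
for b in (2, 5, 10):
    print(b, round(clicks_tag_resource(n, b), 1))
# prints:
# 2 14.2
# 5 6.1
# 10 4.3
```

Under this assumption the results coincide with the Austria-Forum values listed for T_res in the mean-clicks table for b = 2, 5 and 10.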


Sample calculations of a tag taxonomy vs. a tag-resource taxonomy for
the max number of clicks for three different tagging datasets with
branching factor b = 10.

                     Austria-Forum   BibSonomy   CiteULike
max{click(T_tag)}    184             5,278       20,799
max{click(T_res)}    6.1             7.7         8.5

Table: Tag Taxonomy vs. Tag-Resource Taxonomy.



Sample calculations of a tag taxonomy vs. a tag-resource taxonomy for
the mean number of clicks for three different tagging datasets with
branching factors ranging from b = 2 to b = 10.

                      b    Austria-Forum   BibSonomy   CiteULike
mean{click(T_res)}    2    14.2            17.8        19.8
mean{click(T_tag)}    2    29.5            22.4        30.7
mean{click(T_res)}    5    6.1             7.6         8.5
mean{click(T_tag)}    5    11.6            9.2         12.3
mean{click(T_res)}    10   4.3             5.3         5.9
mean{click(T_tag)}    10   6.4             5.6         7.3

Table: Tag Taxonomy vs. Tag-Resource Taxonomy.


Creating Tag-Resource Taxonomies

“How do we create tag-resource hierarchies?”



Actually, the first step in creating a tag-resource hierarchy is to create a
resource hierarchy out of a tagging dataset.
1. Compute the degree centrality of each resource in the tagging
dataset and take the most general resource as the root.
2. Compute the cosine similarity of all resources that are related to the
root node.
3. Re-rank the nodes according to their cosine * centrality values.
4. Attach at most b resources as children to the root.
5. Set the next child as root and go to step 2.
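The five steps above can be sketched as follows. This is an illustration under several assumptions that the slides do not state: resources are represented by their tag sets, two resources count as "related" when they share at least one tag, degree centrality is the number of related resources, and children are expanded breadth-first. All names and the toy data are hypothetical.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity of two tag sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def build_resource_taxonomy(resource_tags, b):
    """resource_tags: dict resource -> set of tags. Returns (root, children)."""
    # Assumption: two resources are related when they share at least one tag.
    related = {
        r: {s for s in resource_tags
            if s != r and resource_tags[r] & resource_tags[s]}
        for r in resource_tags
    }
    # Step 1: degree centrality; the most central resource becomes the root.
    centrality = {r: len(related[r]) for r in resource_tags}
    root = max(sorted(resource_tags), key=lambda r: centrality[r])
    children = {root: []}
    placed = {root}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        # Step 2: candidates are unplaced resources related to the current root.
        candidates = sorted(related[parent] - placed)
        # Step 3: rank the candidates by cosine * centrality, highest first.
        candidates.sort(
            key=lambda r: cosine(resource_tags[parent], resource_tags[r])
            * centrality[r],
            reverse=True,
        )
        # Step 4: attach at most b resources as children.
        for child in candidates[:b]:
            children[parent].append(child)
            children[child] = []
            placed.add(child)
            queue.append(child)
        # Step 5: the loop continues with the next child as the new root.
    return root, children


# Hypothetical toy dataset.
toy_resources = {
    "r1": {"car", "vw", "bmw"},
    "r2": {"vw", "golf"},
    "r3": {"bmw", "x5"},
    "r4": {"golf", "gti"},
    "r5": {"x5", "suv"},
    "r6": {"car", "tire"},
}
root, children = build_resource_taxonomy(toy_resources, b=3)
print(root, children)
```

In this sketch, a resource that is never reached through a chain of related resources, or that is crowded out by the limit b at its only related parent, stays unattached; how the original algorithm handles such cases is not described on the slide.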



To generate the actual tag-resource taxonomy we developed a hierarchical
labeling algorithm. Basically the algorithm works as follows:
1. Traverse the resource taxonomy in left-order and calculate a
co-occurrence vector for the currently processed resource.
2. Remove all tags from the co-occ. vector that are not in the tag set
of the currently processed resource.
3. Try to apply the most general tag of the co-occ. vector. If the
candidate tag has already been applied to one of the parent resources
of the currently processed resource, take the next candidate tag from
the co-occ. vector.
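A minimal sketch of this labeling pass is given below. It makes assumptions beyond the slide: the traversal is a left-to-right pre-order walk, the "generality" of a tag is approximated by the number of resources that carry it (a stand-in for the co-occurrence vector), every resource has at least one tag, and when all candidates are already used on the path the most general tag is reused, which produces a collision. All names and data are hypothetical.

```python
from collections import Counter


def label_taxonomy(root, children, resource_tags):
    """Assign one tag label to every resource in a pre-order walk."""
    # Stand-in for the co-occurrence vector: how many resources carry each tag.
    generality = Counter(t for tags in resource_tags.values() for t in tags)
    labels = {}

    def visit(node, ancestor_labels):
        # Steps 1-2: candidates are the node's own tags, most general first.
        candidates = sorted(resource_tags[node],
                            key=lambda t: (-generality[t], t))
        # Step 3: take the first candidate not yet used on the path to the root.
        label = next((t for t in candidates if t not in ancestor_labels), None)
        if label is None:
            # Assumption: all candidates are taken, so reuse the most general one.
            label = candidates[0]
        labels[node] = label
        for child in children.get(node, []):
            visit(child, ancestor_labels | {label})

    visit(root, frozenset())
    return labels


# Hypothetical toy resource taxonomy and tag sets.
toy_tags = {
    "r1": {"car"},
    "r2": {"car", "vw"},
    "r3": {"car", "vw", "golf"},
    "r4": {"car", "bmw"},
}
toy_children = {"r1": ["r2", "r4"], "r2": ["r3"], "r3": [], "r4": []}
labels = label_taxonomy("r1", toy_children, toy_tags)
print(labels)
```

For the toy data this yields the trails car > vw > golf and car > bmw.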


Evaluating Tag-Resource Taxonomies

In order to evaluate our approach, we conducted three different
experiments.
As the dataset for our analysis, we used a tagging dataset from a large
wiki-based information system called the Austria-Forum.



Since our tag-taxonomy induction algorithm is not 100% free of
collisions, we conducted a simple experiment in which we measured the
number of collisions that occur during the labeling process.
Example of a collision: car > bmw > bmw
For that purpose we generated three tag-resource taxonomies with
branching factors ranging from b = 2 to b = 10 and investigated the
collision rate.

Name b n CR (%)
Res2 2 19,430 0.1%
Res5 5 19,430 0.2%
Res10 10 19,430 0.2%

Table: Collision Rates (CR) for different resource taxonomies with different
branching factor b.
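One way to measure such a collision rate is sketched below. The slides do not define CR precisely; this sketch assumes it is the percentage of nodes whose label already appears on the path from the root, as in the example car > bmw > bmw. The function name and toy data are hypothetical.

```python
def collision_rate(root, children, labels):
    """Percentage of nodes whose label repeats a label of one of their ancestors."""
    collisions = 0
    total = 0
    stack = [(root, frozenset())]
    while stack:
        node, seen = stack.pop()
        total += 1
        if labels[node] in seen:
            collisions += 1
        next_seen = seen | {labels[node]}
        for child in children.get(node, []):
            stack.append((child, next_seen))
    return 100.0 * collisions / total


# Hypothetical toy taxonomy containing the collision car > bmw > bmw.
cr_children = {"a": ["b", "d"], "b": ["c"]}
cr_labels = {"a": "car", "b": "bmw", "c": "bmw", "d": "vw"}
print(collision_rate("a", cr_children, cr_labels))  # prints 25.0
```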



In the second experiment we measured the semantic structure of the
tag-resource taxonomy compared to popular tag taxonomy induction
algorithms such as Heymann, K-Means, Affinity Propagation and
Co-Occurrence.
As measures for this experiment we used Taxonomic Recall/Precision and
Overlap.
As ground truth we used the GermaNet ontology.
For the experiment we again generated three tag-resource
taxonomies with different branching factors b.



Figure: Results of the semantic evaluation of the three generated tag-resource
taxonomies Res2, Res5 and Res10. (Bar chart; series: Taxonomic F-Measure and
Taxonomic Overlap; y-axis: Count (1 = 100%), from 0 to 0.4; x-axis: Res2,
Res5, Res10, Deg/Cooc, Aff. Prop, K-Means, Heymann.)



In the third and last experiment a user study was conducted to
evaluate whether our approach is also useful for humans and could be
used in a practical setting.
To compare our approach against a gold standard, we used the best
tag taxonomy induction algorithm known so far (Deg/Cooc).
To measure the performance of our approach, we invited 9 test users
to judge 200 tag trails extracted from both hierarchies.


To ensure that the user would not know which trail she is actually
judging, we shuffled the trails uniformly at random.
To actually evaluate the trails, we asked our test users to start from
the leftmost concept and to move on to the rightmost concept in
the trail.
The evaluation schema given to the user was the following:
Classification   Description
Correct          Correct hierarchy relation
Related          Correct relation, but not hierarchical or reverse hierarchical
Equivalent       Synonym
Not Related      The relations do not have anything to do with each other
Unknown          The evaluator does not recognize the meaning of the tag(s)

Table: Classification Labels for the User Evaluation.



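The user study showed that our approach performs comparably to a
Deg/Cooc tag taxonomy: fewer relations were judged Correct (27.3% vs.
33.2%), but more were judged Related (36.2% vs. 27.3%) and fewer Not
Related (19.8% vs. 21.9%).

Name         b    Correct (%)   Related (%)   Equivalent (%)   Not Related (%)   Unknown (%)
Deg/Cooc10   10   33.2          27.3          13               21.9              5.1
Res10        10   27.3          36.2          12.3             19.8              4.2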

Table: Results of the empirical analysis of the tag-resource taxonomy with
branching factor b = 10 compared to a Deg/Cooc tag taxonomy with branching
factor b = 10.


Summary

We showed that tag taxonomies are in general not well suited for
finding resources in a small number of clicks.
To tackle that issue we introduced the novel approach of so-called
tag-resource hierarchies.
We illustrated in theory that with a tag-resource taxonomy it is
possible to navigate to resources efficiently.
In addition to these findings, we introduced an algorithm to generate
such hierarchies and presented a number of experiments showing
that tag-resource taxonomies perform on a semantic level
nearly as well as, or even better than, popular tag taxonomy approaches.


End of presentation

Thank you very much for your attention!
Christoph Trattner (ctrattner@iicm.edu)

