Folksonomies Indexing Und Retrieval In Bibliotheken

Folksonomies
Indexing and Retrieval
in Web 2.0
and
in Libraries

Dr. phil. Isabella Peters
Heinrich-Heine-Universität Düsseldorf
Department of Information Science

University of Graz – 17 December 2009

Folksonomies: Indexing without Rules

“Anything goes”
“Against method”, 1975 (Paul K. Feyerabend, Austro-American
philosopher)

Tagging
• no rules
• no methods – or even against methods
• indexing a single document
– synonyms – why not? (New York – NY – Big Apple – … )
– homonyms – never heard! (not: Java [Programming Language] – Java
[Island], but Java)
– translations – why not? (Singapore – Singapur – …)
– typing errors – nobody is perfect (Syngapur)
– hierarchical relations (hyponymy) – why not? (Düsseldorf –
North Rhine-Westphalia – Germany)
– hierarchical relations (meronymy) – why not? (tree – branch – leaf)

Indexing – in general


Tri-partite System of Folksonomies
Folksonomies always consist of three parts
1) document (resource)
2) prosumer (user)
3) tag
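
The three parts can be pictured as one record per tagging action. The following is only a minimal sketch: the `Tagging` type, the helper `index_by` and all sample data are invented for illustration and are not taken from the slides.

```python
from collections import defaultdict
from typing import NamedTuple


class Tagging(NamedTuple):
    """One tagging action: a user assigns a tag to a document."""
    user: str
    tag: str
    document: str


# Hypothetical sample data, invented for illustration only.
TAGGINGS = [
    Tagging("anna", "london", "doc1"),
    Tagging("anna", "travel", "doc1"),
    Tagging("ben", "london", "doc2"),
    Tagging("ben", "travel", "doc1"),
    Tagging("cleo", "java", "doc3"),
]


def index_by(taggings, key, value):
    """Group the `value` field of every tagging under its `key` field."""
    index = defaultdict(set)
    for tagging in taggings:
        index[getattr(tagging, key)].add(getattr(tagging, value))
    return dict(index)


tags_by_document = index_by(TAGGINGS, "document", "tag")
documents_by_tag = index_by(TAGGINGS, "tag", "document")
users_by_document = index_by(TAGGINGS, "document", "user")
```

Every view used later in the talk (tags of a document, documents of a tag, users of a document) can be derived from the same list of triples, which is why the sketch stores nothing else.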


Users – Tags – Documents

[Figure: network of users, tags and documents. Labels: shared users, thematically linked, shared documents]

Shared Documents & Thematically
Linked Users

• more like this ...: similar documents – detection of documents
• more like me ...: similar users – detection of communities

[Figure labels: thematically linked, shared documents]

More like me! Or: More like This User!
• starting point: single user (ego)
• processing
– (1) tag-specific similarity
• all tags of ego: a(t)
• all tags of another user B: b(t)
• common tags of ego and another user B: g(t)
– (2) document-specific similarity
• all tagged documents of ego: a(d)
• all tagged documents of another user B: b(d)
• common tagged documents of ego and another user B: g(d)
– calculation of similarity
• tag-specific: Jaccard-Sneath: Sim(tag; Ego,B) = g(t) / [a(t) + b(t) – g(t)]
• document-specific: Jaccard-Sneath: Sim(doc; Ego,B) = g(d) / [a(d) + b(d) – g(d)]
• ranking of Bi by similarity to ego (say, top 10 tag-specific and top 10 document-
specific users)
• merging of both lists (exclusion of duplicates)
• cluster analysis (k-nearest neighbours, single linkage, complete linkage, group
average linkage)
– result presentation: social network of ego in the centre
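
The similarity and ranking steps above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the function names and the sample profiles are hypothetical, each user profile is modelled as a plain set of tags or documents, and the cluster analysis step is left out.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard-Sneath coefficient: g / (a + b - g), with g = size of the overlap."""
    g = len(a & b)
    denominator = len(a) + len(b) - g
    return g / denominator if denominator else 0.0


def rank_similar_users(ego, profiles, top_n=10):
    """Rank all other users by similarity to ego.

    `profiles` maps each user to a set of tags (or a set of documents).
    """
    scores = [
        (other, jaccard(profiles[ego], items))
        for other, items in profiles.items()
        if other != ego
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


def merge_rankings(by_tag, by_doc):
    """Merge both top lists and keep every user only once."""
    merged = []
    for user, _ in by_tag + by_doc:
        if user not in merged:
            merged.append(user)
    return merged


# Hypothetical profiles, invented for illustration only.
TAGS = {"ego": {"london", "travel", "museum"}, "b1": {"london", "travel"}, "b2": {"java"}}
DOCS = {"ego": {"d1", "d2"}, "b1": {"d3"}, "b2": {"d1", "d2", "d4"}}

candidates = merge_rankings(rank_similar_users("ego", TAGS), rank_similar_users("ego", DOCS))
```

In this toy data, user b1 is closest to ego by tags and user b2 is closest by documents, so both end up in the merged candidate list that a clustering step would then work on.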


More like me! Or: More like This User!

[Figure: social network with ego in the centre, single linkage clustering (fictitious example). Each link carries a pair of similarity values:]
• Sim(tag) = 0.45, Sim(doc) = 0.36
• Sim(tag) = 0.21, Sim(doc) = 0.25
• Sim(tag) = 0.33, Sim(doc) = 0.29
• Sim(tag) = 0.15, Sim(doc) = 0.17
• Sim(tag) = 0.08, Sim(doc) = 0.11
• Sim(tag) = 0.17, Sim(doc) = 0.23
• Sim(tag) = 0.65, Sim(doc) = 0.55

Narrow Folksonomies

• only one tagger (the content creator)
• no multiple tagging
• example: YouTube

Extended Narrow Folksonomies
• more than one tagger
• no multiple tagging
• example: Flickr

Source: Vander Wal (2005)

Broad Folksonomies

• more than one tagger
• multiple tagging
• example: Delicious

Source: Vander Wal (2005)

Folksonomies make use of
Collective Intelligence
Collective Intelligence
• “Wisdom of the Crowds” (Surowiecki)
• “Hive Minds” (Kroski) – “Vox populi” (Galton) – “Crowdsourcing”
• no discussions, diversity of opinions, decentralisation
• users tag a document independently from each other
• statistical aggregation of data

Collaborative Intelligence
• discussions and consensus
• prototype service: Wikipedia (but: 90-9-1 rule)

“Madness of the Crowds”
• e.g., soccer fans – hooligans
• no diversity of opinion – no independence – no decentralisation –
no (statistical) aggregation

Power Tags

• Power Law Distribution
• Inverse-logistic Distribution

[Figures: each distribution curve with its Power Tags marked]

Power Law Tag Distribution

[Figure: tags for www.visitlondon.com. x-axis: Tags; y-axis: Users (0–70). The most frequent tags are the Power Tags (80/20 rule), followed by a Long Tail. Curve: $f(x) = C / x^{a}$. Source: http://del.icio.us]

Inverse-logistic Tag Distribution

[Figure: tags for www.asis.org. x-axis: Tags; y-axis: Users (0–35). The Power Tags form a Long Trunk, followed by a Long Tail. Curve: $f(x) = e^{-C'(x-1)^{b}}$. Source: http://del.icio.us]
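
To make the difference between the two curve shapes concrete, the two functions from the slides can be evaluated for the first few tag ranks. This is a minimal sketch: all parameter values (C, a, C', b and the scaling factor 35) are hypothetical choices for illustration, not values fitted to the del.icio.us data.

```python
import math


def power_law(rank: int, c: float, a: float) -> float:
    """f(x) = C / x^a for tag rank x = 1, 2, 3, ..."""
    return c / rank ** a


def inverse_logistic(rank: int, c_prime: float, b: float) -> float:
    """f(x) = e^(-C'(x-1)^b); the value is 1 at rank 1, so scale it by the top count."""
    return math.exp(-c_prime * (rank - 1) ** b)


# Hypothetical parameters, chosen only to show the two shapes.
power = [round(power_law(x, c=60, a=1.0), 1) for x in range(1, 6)]
trunk = [round(35 * inverse_logistic(x, c_prime=0.001, b=3.0), 1) for x in range(1, 6)]
```

With these assumed parameters the power law value halves from rank 1 to rank 2, while the inverse-logistic curve stays almost flat over the first ranks. That flat start is the Long Trunk.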

Use of Power Tags

• Power Tags as factor in relevance ranking
documents tagged with Power Tags appear higher in
ranking

• Power Tags as candidate tags for Tag Gardening
which (semantic) relation do they have with
co-occurring tags?
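
The first use, Power Tags as a ranking factor, could be sketched like this. The slide does not say how Power Tags are determined, so the sketch simply assumes that the n most frequently assigned tags of a document are its Power Tags. The function names and the sample counts are hypothetical.

```python
from collections import Counter


def power_tags(tag_counts: Counter, n: int = 2) -> set:
    """Assumption: the n most frequently assigned tags of a document are its Power Tags."""
    return {tag for tag, _ in tag_counts.most_common(n)}


def rank_documents(query_tag, tag_counts_by_document, n=2):
    """Rank matching documents: Power Tag matches first, then by tag frequency."""
    hits = []
    for document, counts in tag_counts_by_document.items():
        if query_tag in counts:
            is_power_tag = query_tag in power_tags(counts, n)
            hits.append((is_power_tag, counts[query_tag], document))
    hits.sort(key=lambda hit: (hit[0], hit[1]), reverse=True)
    return [document for _, _, document in hits]


# Hypothetical tag counts per document, invented for illustration only.
COUNTS = {
    "doc1": Counter({"london": 60, "travel": 30, "hotel": 5}),
    "doc2": Counter({"museum": 40, "art": 20, "london": 8}),
    "doc3": Counter({"london": 3, "pub": 2, "beer": 1}),
}

ranking = rank_documents("london", COUNTS)
```

In the sample data, doc3 ranks above doc2 although "london" was assigned to doc2 more often, because "london" is among the top tags of doc3 but only a long-tail tag of doc2.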


Benefits of Indexing with Folksonomies
• authentic user language – solution of the “vocabulary problem”
• up-to-dateness
• multiple interpretations – many perspectives – bridging the semantic gap
• raise access to information resources
• follow “desire lines” of users
• cheap indexing method – shared indexing
• the more taggers, the better the system becomes – network effects
• capable of indexing mass information on the Web
• resources for development of knowledge organization systems
• mass quality “control”
• searching – browsing – serendipity
• neologisms
• identify communities and “small worlds”
• collaborative recommender system
• make people sensitive to information indexing


Disadvantages of Indexing with
Folksonomies
• absence of controlled vocabulary
• different basic levels (in the sense of Eleanor Rosch)
• different interests – loss of context information
• language merging
• hidden paradigmatic relations
• merging of formal (bibliographical) and aboutness tags
• no specific fields
• evaluative tags (“stupid”)
• spam-tags
• syncategoremata (user-specific tags, “me”)
• performative tags (“to do”, “to read”)
• other misleading keywords

solution: Tag Gardening with methods of Information Linguistics, user
collaboration in giving meaning to tags and combination with existing
knowledge organization systems
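
A very small part of such a gardening step might look like the sketch below. It is only an assumption about what a clean-up could do: it drops the performative and user-specific example tags named on this slide and maps variants to one preferred term. The `PREFERRED` mapping is hypothetical and reuses the New York and Singapore examples from the beginning of the talk.

```python
# Example tags from the slide; real stop lists would be much longer.
PERFORMATIVE = {"to do", "to read"}
USER_SPECIFIC = {"me"}

# Hypothetical mapping from tag variants to a preferred term of a KOS.
PREFERRED = {
    "ny": "new york",
    "big apple": "new york",
    "singapur": "singapore",
    "syngapur": "singapore",
}


def garden(tags):
    """Normalise spelling, drop unhelpful tags and merge variants, keeping order."""
    cleaned = []
    for raw in tags:
        tag = " ".join(raw.lower().split())  # lower case, collapse whitespace
        if tag in PERFORMATIVE or tag in USER_SPECIFIC:
            continue
        tag = PREFERRED.get(tag, tag)
        if tag not in cleaned:
            cleaned.append(tag)
    return cleaned


result = garden(["NY", "Big  Apple", "to read", "me", "Syngapur", "Singapore"])
```

Homonyms such as Java (programming language or island) cannot be resolved by a lookup table like this. They need context or the user collaboration mentioned above.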

Goal of Tag Gardening: Emergent
Semantics

Source: Peters, I., & Weller, K. (2008). Tag Gardening for Folksonomy Enrichment and
Maintenance. Webology, 5(3), Article 58, from http://www.webology.ir/2008/v5n3/a58.html.

Maintenance of KOS and Folksonomy

[Figure: feedback loop between Folksonomy and KOS via Tag Gardening. Labels: new terms, new relations]

Source: Christiaens, S. (2006). Metadata Mechanism: From Ontology to Folksonomy…and Back. Lecture
Notes in Computer Science, 4277, 199–207.

Feedback Loop in Practice:
Tagging of OPACs

2 possibilities:

• 1) tagging of resources within the library’s website

• 2) tagging of resources outside the library’s firewall


Tagging of OPACS: Within Library’s
Website: PennTags

http://tags.library.upenn.edu/

Website: Ann Arbor District Library

http://www.aadl.org/catalog

Website: University Library Hildesheim

http://www.uni-hildesheim.de/mybib/all_tags

Website

• advantages:

– user behaviour can be directly observed and
exploited for own applications

– used knowledge organization system (KOS) can
profit from user behaviour and user language

– users will be “attracted” to the library

– library will appear “trendy”

Website

• disadvantages:

– development and implementation (costs and
manpower) of the tagging service have to be
borne by the library

– if only users may tag: librarians may lose their
work motivation or may have a feeling of
uselessness

– “lock-in” effect for users: no “fresh” ideas

Tagging of Resources Outside the
Library’s Firewall: LibraryThing

http://www.librarything.com/search

Library’s Firewall: BibSonomy

http://www.bibsonomy.org/

Library’s Firewall
• advantages:

– development and implementation (costs and
manpower) of the tagging service do not have to be
borne by the library

– the library may profit from the “know-how” of the
provider of the tagging system

– users may profit from tagging activities of
hundreds of other users: no lock-in

– library appears “trendy”

Library’s Firewall

• disadvantages

– user behaviour cannot be observed or exploited

– your users support another tagging service

– used KOS cannot profit from user behaviour

Excursus: Sentiment Tags
• negative tags: “awful” – “foolish”, …
• positive tags: “amazing” – “useful”, …
• applicable for sentiment analysis of documents
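
As an illustration of the last point, a document's tag counts could be turned into a simple sentiment score. This is a sketch under assumptions: the lexicon contains only the four example tags from this slide, and the scoring formula (positive minus negative, divided by their sum) is a common simple choice, not a method taken from the cited paper.

```python
from collections import Counter

# Only the four example tags from the slide; a real lexicon would be much larger.
POSITIVE = {"amazing", "useful"}
NEGATIVE = {"awful", "foolish"}


def sentiment_score(tag_counts: Counter) -> float:
    """Return a score in [-1, 1]: (positive - negative) / (positive + negative)."""
    positive = sum(n for tag, n in tag_counts.items() if tag in POSITIVE)
    negative = sum(n for tag, n in tag_counts.items() if tag in NEGATIVE)
    total = positive + negative
    return (positive - negative) / total if total else 0.0


# Hypothetical tag counts of one document, invented for illustration only.
score = sentiment_score(Counter({"useful": 3, "awful": 1, "london": 10}))
```

Tags without a sentiment, such as "london" in the sample, are ignored, and a document without any sentiment tags gets the neutral score 0.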

Source: Yanbe, Y., Jatowt, A., Nakamura, S., & Tanaka, K. (2007). Can Social Bookmarking Enhance Search in the
Web? In Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, Vancouver, Canada (pp. 107–116).

Summary

• knowing how folksonomies work is important for their
adequate application in both

– knowledge representation and

– information retrieval

• knowing why folksonomies work is a secret ☺


Knowledge Representation and
Information Retrieval
• two sides of the same coin
• Immanuel Kant: Thoughts without content are
empty, intuitions without concepts are blind...

Feedback
Loop

Knowledge Representation without Information Retrieval is empty.
Information Retrieval without Knowledge Representation is blind.

Folksonomies and
Knowledge Organization Systems
• two sides of the same coin
• no rivals – they work best in combination!

Feedback
Loop

Folksonomies: flexible, up-to-date, user-centric
Knowledge Organization Systems: precise, rigid, complete

Best regards from Düsseldorf.

Published in 2009 by
Saur, de Gruyter

Contact: isabella.peters@uni-duesseldorf.de
