Folksonomies: Indexing and Retrieval in Libraries
1. Folksonomies: Indexing and Retrieval in Web 2.0 and in Libraries
Dr. phil. Isabella Peters
Heinrich-Heine-Universität Düsseldorf
Abteilung für Informationswissenschaft
Uni Graz – 17 December 2009
2. Folksonomies: Indexing without Rules
“Anything goes”
“Against method”, 1975 (Paul K. Feyerabend, Austro-American
philosopher)
Tagging
• no rules
• no methods – or even against methods
• indexing a single document
– synonyms – why not? (New York – NY – Big Apple – … )
– homonyms – never heard! (not: Java [Programming Language] – Java
[Island], but Java)
– translations – why not? (Singapore – Singapur – …)
– typing errors – nobody is perfect (Syngapur)
– hierarchical relations (hyponymy) – why not? (Düsseldorf – North Rhine-Westphalia – Germany)
– hierarchical relations (meronymy) – why not? (tree – branch – leaf)
6. Shared Documents & Thematically Linked Users
• more like this ...: detection of similar documents
• more like me ...: detection of similar users (communities)
• links between users: shared documents, thematically linked
7. More like me! Or: More like This User!
• starting point: single user (ego)
• processing
– (1) tag-specific similarity
• all tags of ego: a(t)
• all tags of another user B: b(t)
• common tags of ego and another user B: g(t)
– (2) document-specific similarity
• all tagged documents of ego: a(d)
• all tagged documents of another user B: b(d)
• common tagged documents of ego and another user B: g(d)
– calculation of similarity
• tag-specific: Jaccard-Sneath: Sim(tag; Ego,B) = g(t) / [a(t) + b(t) – g(t)]
• document-specific: Jaccard-Sneath: Sim(doc; Ego,B) = g(d) / [a(d) + b(d) – g(d)]
• ranking of all users B_i by similarity to ego (say, top 10 tag-specific and top 10 document-specific users)
• merging of both lists (exclusion of duplicates)
• cluster analysis (k-nearest neighbours, single linkage, complete linkage, group average linkage)
– result presentation: social network of ego in the centre
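The processing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the slides: it assumes that a(t), b(t) and g(t) are the sizes of the respective tag sets (and likewise for documents), the function names and the sample users are hypothetical, and the final cluster analysis step is left out.

```python
def jaccard_sneath(ego_items, other_items):
    """Sim = g / (a + b - g), with a, b, g taken as set sizes (assumption)."""
    a = len(ego_items)
    b = len(other_items)
    g = len(ego_items & other_items)  # items common to ego and the other user
    union_size = a + b - g
    return g / union_size if union_size else 0.0


def rank_users(ego, items_by_user, top_n=10):
    """Rank all other users by similarity to ego and keep the top_n."""
    others = [user for user in items_by_user if user != ego]
    others.sort(
        key=lambda user: jaccard_sneath(items_by_user[ego], items_by_user[user]),
        reverse=True,
    )
    return others[:top_n]


def more_like_me(ego, tags_by_user, docs_by_user, top_n=10):
    """Merge the tag-specific and document-specific rankings without duplicates."""
    by_tag = rank_users(ego, tags_by_user, top_n)
    by_doc = rank_users(ego, docs_by_user, top_n)
    return list(dict.fromkeys(by_tag + by_doc))  # keeps first occurrence order


# Hypothetical sample data: three users with their tags and tagged documents.
tags_by_user = {
    "ego": {"folksonomy", "tagging", "web2.0"},
    "b1": {"tagging", "web2.0", "library"},
    "b2": {"cooking"},
}
docs_by_user = {
    "ego": {"d1", "d2"},
    "b1": {"d2", "d3"},
    "b2": {"d1", "d2"},
}

print(more_like_me("ego", tags_by_user, docs_by_user, top_n=1))
```

With `top_n=1`, user `b1` is the closest user by tags and `b2` the closest by documents, so the merged list contains both. The merged list would then be the input for the cluster analysis.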
8. More like me! Or: More like This User!
[Figure: single linkage clustering (fictitious example); the links are labelled with the following pairs of similarity values]
• Sim(tag) = 0.45, Sim(doc) = 0.36
• Sim(tag) = 0.21, Sim(doc) = 0.25
• Sim(tag) = 0.33, Sim(doc) = 0.29
• Sim(tag) = 0.15, Sim(doc) = 0.17
• Sim(tag) = 0.08, Sim(doc) = 0.11
• Sim(tag) = 0.17, Sim(doc) = 0.23
• Sim(tag) = 0.65, Sim(doc) = 0.55
9. Narrow Folksonomies
• only one tagger (the content creator)
• no multiple tagging
• example: YouTube tags
10. Extended Narrow Folksonomies
• more than one tagger
• no multiple tagging
• example: Flickr tags (“Add Tags” option)
Source: Vander Wal (2005)
11. Broad Folksonomies
• more than one tagger
• multiple tagging
• example: Delicious tags
Source: Vander Wal (2005)
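The difference between the three folksonomy types can be made concrete with a toy model of one document's tags. The sketch below rests on one reading of the slides, stated here as an assumption: "multiple tagging" means that the same tag may be assigned to a document by several users and is counted once per user. The function name and the sample data are hypothetical.

```python
from collections import Counter


def tag_counts(assignments, kind, creator=None):
    """Count the tags of one document under a given folksonomy type.

    assignments: list of (user, tag) pairs in the order they were made.
    kind: "narrow", "extended_narrow" or "broad" (labels chosen for this sketch).
    """
    if kind not in ("narrow", "extended_narrow", "broad"):
        raise ValueError(f"unknown folksonomy type: {kind}")
    counts = Counter()
    seen = set()  # (user, tag) pairs already counted in a broad folksonomy
    for user, tag in assignments:
        if kind == "narrow" and user != creator:
            continue  # only the content creator may tag
        if kind in ("narrow", "extended_narrow"):
            counts[tag] = 1  # no multiple tagging: every tag exists at most once
        elif (user, tag) not in seen:
            seen.add((user, tag))
            counts[tag] += 1  # broad: one count per user who assigned the tag
    return counts


# Hypothetical assignments for a single document.
assignments = [
    ("alice", "london"),
    ("bob", "london"),
    ("bob", "travel"),
    ("alice", "london"),
]

for kind in ("narrow", "extended_narrow", "broad"):
    print(kind, dict(tag_counts(assignments, kind, creator="alice")))
```

Only the broad variant produces tag frequencies per document, which is what a document-specific tag distribution is built from.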
12. Folksonomies make use of Collective Intelligence
Collective Intelligence
• “Wisdom of the Crowds” (Surowiecki)
• “Hive Minds” (Kroski) – “Vox populi” (Galton) – “Crowdsourcing”
• no discussions, diversity of opinions, decentralisation
• users tag a document independently from each other
• statistical aggregation of data
Collaborative Intelligence
• discussions and consensus
• prototype service: Wikipedia (but: 90-9-1 rule)
“Madness of the Crowds”
• e.g., soccer fans – hooligans
• no diversity of opinion – no independence – no decentralisation – no (statistical) aggregation
13. Power Tags
• Power Law Distribution
• Inverse-logistic Distribution
[Figures: one curve per distribution, each with its Power Tags marked]
14. Power Law Tag Distribution
Tags for www.visitlondon.com
[Figure: number of users (0 to 70) per tag; curve f(x) = C / x^a; labels: Power Tags, 80/20 rule, Long Tail]
Source: http://del.icio.us
15. Inverse-logistic Tag Distribution
Tags for www.asis.org
[Figure: number of users (0 to 35) per tag; curve f(x) = e^(-C'(x-1)^b); labels: Power Tags, Long Trunk, Long Tail]
Source: http://del.icio.us
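The two distribution shapes can be compared numerically. The sketch below assumes the power law f(x) = C / x^a from the previous slide and reads the inverse-logistic curve as f(x) = e^(-C'(x-1)^b), where x is the rank of a tag. All parameter values are invented for illustration and are not fitted to the del.icio.us data shown on the slides.

```python
import math


def power_law(rank, c, a):
    """Power law: f(x) = C / x^a."""
    return c / rank ** a


def inverse_logistic(rank, c_prime, b):
    """Inverse-logistic curve (assumed form): f(x) = e^(-C'(x-1)^b)."""
    return math.exp(-c_prime * (rank - 1) ** b)


# Illustrative parameters only (assumptions, not taken from the slides).
for rank in range(1, 6):
    p = power_law(rank, c=1.0, a=1.0)
    i = inverse_logistic(rank, c_prime=0.1, b=3)
    print(f"rank {rank}: power law {p:.3f}, inverse-logistic {i:.3f}")
```

With these parameters the power law drops sharply after the first rank, while the inverse-logistic curve stays high for the first ranks before it falls off. That plateau corresponds to the Long Trunk, and both curves end in a Long Tail.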
16. Use of Power Tags
• Power Tags as factor in relevance ranking
documents tagged with Power Tags appear higher in
ranking
• Power Tags as candidate tags for Tag Gardening
which (semantic) relation do they have with co-occurring tags?
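As a starting point for Tag Gardening, the tags that co-occur with a Power Tag can simply be counted. The sketch below is an illustration under assumptions: the slides do not say how Power Tags are determined, so the top-n most frequent tags stand in for them here, and a "post" is taken to be the set of tags one user assigned to one resource. All names and data are hypothetical.

```python
from collections import Counter


def power_tags(posts, n=1):
    """Return the n most frequent tags (a stand-in definition of Power Tags)."""
    frequencies = Counter(tag for post in posts for tag in post)
    return [tag for tag, _ in frequencies.most_common(n)]


def co_occurring_tags(posts, tag):
    """Count how often every other tag appears in the same post as `tag`."""
    counts = Counter()
    for post in posts:
        if tag in post:
            counts.update(other for other in post if other != tag)
    return counts


# Hypothetical posts: each set holds the tags of one user for one resource.
posts = [
    {"london", "travel"},
    {"london", "travel", "uk"},
    {"london", "hotel"},
    {"uk"},
]

for tag in power_tags(posts, n=1):
    print(tag, co_occurring_tags(posts, tag).most_common())
```

The counts only show which tag pairs deserve a closer look. Deciding whether a pair is a synonym, a hierarchical relation or something else remains an intellectual task.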
17. Benefits of Indexing with Folksonomies
• authentic user language – solution of the “vocabulary problem”
• up-to-dateness
• multiple interpretations – many perspectives – bridging the semantic gap
• broaden access to information resources
• follow “desire lines” of users
• cheap indexing method – shared indexing
• the more taggers, the better the system becomes – network effects
• capable of indexing mass information on the Web
• resources for development of knowledge organization systems
• mass quality “control”
• searching – browsing – serendipity
• neologisms
• identify communities and “small worlds”
• collaborative recommender system
• make people sensitive to information indexing
18. Disadvantages of Indexing with Folksonomies
• absence of controlled vocabulary
• different basic levels (in the sense of Eleanor Rosch)
• different interests – loss of context information
• language merging
• hidden paradigmatic relations
• merging of formal (bibliographical) and aboutness tags
• no specific fields
• evaluative tags (“stupid”)
• spam-tags
• syncategoremata (user-specific tags, “me”)
• performative tags (“to do”, “to read”)
• other misleading keywords
Solution: Tag Gardening with methods of information linguistics, user collaboration in giving meaning to tags, and combination with existing knowledge organization systems
19. Goal of Tag Gardening: Emergent Semantics
Source: Peters, I., & Weller, K. (2008). Tag Gardening for Folksonomy Enrichment and Maintenance. Webology, 5(3), Article 58, from http://www.webology.ir/2008/v5n3/a58.html.
20. Maintenance of KOS and Folksonomy
[Diagram: Folksonomy and KOS, linked by Tag Gardening: new terms, new relations]
Source: Christiaens, S. (2006). Metadata Mechanism: From Ontology to Folksonomy…and Back. Lecture Notes in Computer Science, 4277, 199–207.
21. Feedback Loop in Practice: Tagging of OPACs
2 possibilities:
• 1) tagging of resources within the library’s website
• 2) tagging of resources outside the library’s firewall
22. Tagging of OPACs: Within Library’s Website: PennTags
http://tags.library.upenn.edu/
23. Tagging of OPACs: Within Library’s Website: Ann Arbor District Library
http://www.aadl.org/catalog
24. Tagging of OPACs: Within Library’s Website: University Library Hildesheim
http://www.uni-hildesheim.de/mybib/all_tags
25. Tagging of OPACs: Within Library’s Website
• advantages:
– user behaviour can be directly observed and exploited for own applications
– the knowledge organization system (KOS) in use can profit from user behaviour and user language
– users will be “attracted” to the library
– library will appear “trendy”
26. Tagging of OPACs: Within Library’s Website
• disadvantages:
– development and implementation of the tagging service (costs and manpower) must be borne by the library
– if only users may tag: librarians may lose their work motivation or feel useless
– “lock-in” effect of users: no “fresh” ideas
27. Tagging of Resources Outside the Library’s Firewall: LibraryThing
http://www.librarything.com/search
28. Tagging of Resources Outside the Library’s Firewall: BibSonomy
http://www.bibsonomy.org/
29. Tagging of Resources Outside the Library’s Firewall
• advantages:
– development and implementation of the tagging service (costs and manpower) need not be borne by the library
– the library may profit from the “know-how” of the provider of the tagging system
– users may profit from tagging activities of hundreds of other users: no lock-in
– library appears “trendy”
30. Tagging of Resources Outside the Library’s Firewall
• disadvantages:
– user behaviour cannot be observed or exploited
– your users support another tagging service
– the KOS in use cannot profit from user behaviour
31. Excursus: Sentiment Tags
• negative tags: “awful” – “foolish”, …
• positive tags: “amazing” – “useful”, …
• applicable for sentiment analysis of documents
Source: Yanbe, Y., Jatowt, A., Nakamura, S., & Tanaka, K. (2007). Can Social Bookmarking Enhance Search in the Web? In Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, Vancouver, Canada (pp. 107–116).
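How sentiment tags could feed a simple sentiment analysis of a document is sketched below. This is a hypothetical illustration and not the method of Yanbe et al.: the two word lists contain only the example tags from the slide, and the score is a plain difference of counts.

```python
# Word lists taken from the slide's examples; a real system would need far more.
POSITIVE_TAGS = {"amazing", "useful"}
NEGATIVE_TAGS = {"awful", "foolish"}


def sentiment_score(tags):
    """Return (number of positive tags) - (number of negative tags)."""
    normalized = [tag.lower() for tag in tags]
    positive = sum(1 for tag in normalized if tag in POSITIVE_TAGS)
    negative = sum(1 for tag in normalized if tag in NEGATIVE_TAGS)
    return positive - negative


# Hypothetical tags assigned to one document by several users.
document_tags = ["amazing", "useful", "awful", "python"]
print(sentiment_score(document_tags))
```

A score above zero suggests a mostly positive reception, a score below zero a mostly negative one. Tags without sentiment, such as "python" in the sample, are ignored.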
32. Summary
• knowing how folksonomies work is important for their
adequate application in both
– knowledge representation and
– information retrieval
• knowing why folksonomies work is a secret ☺
33. Knowledge Representation and Information Retrieval
• two sides of the same coin
• Immanuel Kant: Thoughts without content are empty, intuitions without concepts are blind...
• Knowledge Representation without Information Retrieval is empty.
• Information Retrieval without Knowledge Representation is blind.
[Diagram: Feedback Loop between Knowledge Representation and Information Retrieval]
34. Folksonomies and Knowledge Organization Systems
• two sides of the same coin
• not rivals: they work best in combination!
[Diagram: Feedback Loop between folksonomies (flexible, up-to-date, user-centric) and knowledge organization systems (precise, rigid, complete)]
35. Best regards from Düsseldorf.
Published in 2009 by Saur, de Gruyter
Contact: isabella.peters@uni-duesseldorf.de