Pula, 5 June 2007


  1. 1. Complex networks in tagging systems
      • Andrea Capocci
      • Dipartimento di Informatica e Sistemistica
      • Università di Roma "Sapienza"
  2. 2. Tag networks
      • www.citeulike.org
      • Users save scientific publications and label them with tags (keywords).
      • Other examples:
      • Flickr.com (photos)
      • del.icio.us (bookmarks)
      • Connotea.org, BibSonomy (papers)
  3. 3. TAGS
  4. 4. Tagging systems as tripartite networks
      • Tag assignment
      • A tagging system is a set of tag assignments.
      • A tag assignment is a triplet (user, resource, tag).
      • CiteULike: 550k tag assignments, 48k distinct tags, 180k distinct papers, 6k distinct users
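The triplet definition above maps naturally onto a small record type. The following is a minimal sketch, not the author's code: the `TagAssignment` class, the `summarize` helper and the toy data are all hypothetical names introduced for illustration. It counts the same quantities the slide reports for CiteULike.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One (user, resource, tag) triplet; the class name is hypothetical."""
    user: str
    resource: str
    tag: str


def summarize(assignments):
    """Count assignments and the distinct users, resources and tags."""
    return {
        "assignments": len(assignments),
        "users": len({a.user for a in assignments}),
        "resources": len({a.resource for a in assignments}),
        "tags": len({a.tag for a in assignments}),
    }


# Toy data, invented for illustration only.
toy = [
    TagAssignment("u1", "paper_a", "networks"),
    TagAssignment("u1", "paper_a", "scale-free"),
    TagAssignment("u2", "paper_a", "networks"),
    TagAssignment("u2", "paper_b", "www"),
]
print(summarize(toy))
```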
  5. 7. Text analysis of tagging
      • The stream of tags can be interpreted as a text continuously written by collaborating users.
      • Zipf laws, preferential attachment and Yule processes in tag streams?
      • del.icio.us: see Cattuto et al.
  6. 8. Sub-linear vocabulary growth (figure: number of tags vs. internal time)
  7. 9. del.icio.us: vocabulary grows as x^0.8
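Sub-linear vocabulary growth can be checked on any tag stream by counting distinct tags as the stream advances and fitting a slope in log-log space. The sketch below assumes that "internal time" means the index of the tag assignment in the stream; both function names are hypothetical, and the plain least-squares fit is only a rough estimate of the exponent, not the procedure used for the del.icio.us figure.

```python
import math


def vocabulary_growth(tag_stream):
    """Number of distinct tags seen after each assignment in the stream."""
    seen = set()
    growth = []
    for tag in tag_stream:
        seen.add(tag)
        growth.append(len(seen))
    return growth


def growth_exponent(growth):
    """Least-squares slope of log(distinct tags) against log(internal time).

    Needs at least two points. ASSUMPTION: internal time t = 1, 2, 3, ...
    is the position of the assignment in the stream.
    """
    xs = [math.log(t) for t in range(1, len(growth) + 1)]
    ys = [math.log(n) for n in growth]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


print(vocabulary_growth(["a", "b", "a", "c"]))
```

An exponent below 1 indicates sub-linear growth; a stream in which every tag is new gives exactly 1.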
  8. 10. Tag frequency distribution
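A tag frequency distribution of the kind named on this slide is usually inspected as a rank-frequency table, where a Zipf law shows up as a straight line on log-log axes. A minimal sketch, with the hypothetical helper name `rank_frequency`:

```python
from collections import Counter


def rank_frequency(tag_stream):
    """Return (rank, frequency) pairs, most frequent tag first."""
    counts = Counter(tag_stream)
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))


print(rank_frequency(["a", "b", "a", "c", "a", "b"]))
```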
  9. 11. Preferential attachment
  10. 12. Few tags per resource
  11. 13. Where is semantics?
      • Such properties can be modeled by Yule-Simon processes with memory (see Cattuto et al.).
      • But such an analysis does not capture the semantics of tags: hierarchical relations, etc.
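For orientation, here is a sketch of the plain Yule-Simon process, without the memory kernel that Cattuto et al. add. At each step a new tag is invented with probability `p_new`; otherwise a past token is copied uniformly at random, so frequent tags are copied more often (preferential attachment). The function name, parameter names and the integer tag labels are assumptions made for illustration.

```python
import random


def yule_simon_stream(n_steps, p_new, seed=0):
    """Generate a tag stream with the memoryless Yule-Simon rule.

    NOTE: this is the basic process only; the memory-weighted variant
    referenced on the slide is not implemented here.
    """
    rng = random.Random(seed)
    stream = [0]          # start from a single tag labeled 0
    next_tag = 1
    for _ in range(n_steps - 1):
        if rng.random() < p_new:
            stream.append(next_tag)      # invent a brand-new tag
            next_tag += 1
        else:
            # copy a uniformly chosen past token: a tag is picked with
            # probability proportional to its current frequency
            stream.append(rng.choice(stream))
    return stream


print(len(set(yule_simon_stream(1000, 0.1))))
```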
  12. 14. Why does semantics matter?
      • Detection of tag categories.
      • Understanding users' strategies to improve the system and propose new services.
      • Spam detection.
  16. 18. Tag co-occurrence network
      • Tags are nodes.
      • If two tags are assigned to the same resource, one puts an edge between the two tags.
      • Edges are weighted: each co-assignment of two tags increases the edge weight by one.
      • Strength instead of degree.
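The construction above can be sketched in a few lines. This version assumes that a "co-assignment" is counted once per resource on which both tags appear; counting once per (user, resource) post would be an equally plausible reading of the slide. The function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_weights(assignments):
    """Weighted tag-tag edges from (user, resource, tag) triplets.

    ASSUMPTION: the weight of an edge grows by one for every resource
    that carries both tags, regardless of which users assigned them.
    """
    tags_by_resource = defaultdict(set)
    for _user, resource, tag in assignments:
        tags_by_resource[resource].add(tag)
    weights = Counter()
    for tags in tags_by_resource.values():
        # sorted() gives each undirected edge one canonical key (a, b)
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return weights


def strengths(weights):
    """Node strength: the sum of the weights of the edges at each tag."""
    s = Counter()
    for (a, b), w in weights.items():
        s[a] += w
        s[b] += w
    return s


toy = [
    ("u1", "r1", "networks"), ("u1", "r1", "scale-free"),
    ("u2", "r2", "networks"), ("u2", "r2", "scale-free"), ("u2", "r2", "www"),
]
w = cooccurrence_weights(toy)
print(dict(w))
print(dict(strengths(w)))
```

Strength generalizes degree to weighted graphs: in the toy data "networks" has degree 2 but strength 3.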
  17. 19. Distribution of strength
  18. 20. Distribution of strength ?
  19. 21. Nontrivial clustering & spam detection
      • Clustering coefficient C(k): average density of triangles around nodes with degree k.
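C(k) as defined above can be computed by averaging the local clustering coefficient over all nodes of degree k. The sketch below works on the unweighted version of the graph and skips nodes with degree below 2, for which the triangle density is undefined; both choices are assumptions, as is the function name.

```python
from collections import defaultdict
from itertools import combinations


def clustering_by_degree(edges):
    """Map each degree k to the mean local clustering of degree-k nodes."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:                      # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    per_degree = defaultdict(list)
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            continue                    # no neighbor pairs, so C is undefined
        # count the pairs of neighbors that are themselves connected
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        per_degree[k].append(links / (k * (k - 1) / 2))
    return {k: sum(vals) / len(vals) for k, vals in per_degree.items()}


# A triangle a-b-c with a pendant node d attached to c.
ck = clustering_by_degree([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(ck)
```

A node whose local clustering lies far from the C(k) average for its degree is a candidate for closer inspection, which is the idea behind the spam example on the following slides.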
  20. 22. Nontrivial clustering & spam detection
  21. 23. Nontrivial clustering & spam detection k = 502
  22. 24. Looking for a k = 502 page...
  23. 26. SPAM
  24. 27. Nontrivial clustering & spam detection (figure annotation: spam at k = 502)
  25. 28. Co-occurrence networks and semantics
      • Co-occurrence networks are scale-free.
      • The significance of this statistical property is ambiguous.
      • Clustering encodes semantics (?)
      • Clustering can be used to detect spam.
  30. 33. Users' strategies: do users tag resources according to a conceptual hierarchy of tags?
  31. 34. Semantics and hierarchy: for example, "Emergence of scaling in random networks" by A.-L. Barabasi and R. Albert
  32. 35. Semantics and hierarchy: for example, "Emergence of scaling in random networks" by A.-L. Barabasi and R. Albert. Tags: "scale-free networks"
  33. 36. Semantics and hierarchy: for example, "Emergence of scaling in random networks" by A.-L. Barabasi and R. Albert. Tags: "scale-free networks", "networks" (hierarchical)
  34. 37. Semantics and hierarchy: for example, "Emergence of scaling in random networks" by A.-L. Barabasi and R. Albert. Tags: "scale-free networks", "WWW" (non-hierarchical)
  35. 38. Model based on hierarchy
      • Conjectures:
      • 1. Tags have an underlying hierarchy.
      • 2. With high probability, users add tags hierarchically.
      • Can we reproduce the co-occurrence network structure based on tag hierarchy?
  36. 39. Model based on hierarchy
      • The underlying hierarchy is a random tree.
      • At each time step, we add a new resource with two tags.
      • New tags are introduced with probability P_nt.
      • With probability P_sb, the second tag is a "generalization" of the first tag; otherwise it is chosen randomly.
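The model rules above leave several details open, so the following is only one possible reading, not the author's implementation. Assumptions, all of them mine: a new tag attaches under a uniformly chosen existing tag; when no new tag is introduced, the first tag is drawn uniformly from the existing ones; a "generalization" is the direct parent in the tree; a randomly chosen second tag is uniform over all tags other than the first; and the tree starts with a root and one child so that two distinct tags always exist. The function and parameter names are hypothetical.

```python
import random


def simulate_hierarchy_model(n_steps, p_nt, p_sb, seed=0):
    """Return (parent, posts): the tag tree and one tag pair per resource."""
    rng = random.Random(seed)
    parent = {0: None, 1: 0}       # ASSUMPTION: start with root 0 and child 1
    posts = []
    for _ in range(n_steps):
        if rng.random() < p_nt:
            # ASSUMPTION: the new tag becomes a child of a random existing tag
            first = len(parent)
            parent[first] = rng.randrange(first)
        else:
            first = rng.randrange(len(parent))
        if rng.random() < p_sb and parent[first] is not None:
            second = parent[first]  # ASSUMPTION: generalization = direct parent
        else:
            # uniform choice among all tags except `first`
            second = rng.randrange(len(parent) - 1)
            if second >= first:
                second += 1
        posts.append((first, second))
    return parent, posts


parent, posts = simulate_hierarchy_model(200, p_nt=0.2, p_sb=0.7, seed=1)
print(len(parent), posts[:5])
```

Each pair in `posts` is one co-assignment, so the pairs can be accumulated into edge weights to compare the simulated strength distribution and clustering with the empirical ones.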
  40. 43. Results: strength distribution
  41. 44. Results: clustering
  42. 45. Conclusions
      • Tagging systems display nontrivial statistical properties: Zipf laws.
      • Co-occurrence networks are a way of discovering semantic relationships between tags (?)
      • Clustering in co-occurrence networks encodes semantics (?) and detects spam.
      • Simple models based on hierarchy partially explain such properties.
  43. 46. Thank you and thanks to...
      • Guido Caldarelli
      • The TAGORA group (Cattuto et al.)
