  1. Toward Personalized Peer-to-Peer Top-k Processing
     Xiao BAI, Marin BERTIER, Rachid GUERRAOUI, Anne-Marie KERMARREC
     31 March 2009
  2. Outline
     • Background and Motivation
     • Personalized Peer-to-Peer Top-k Processing
     • Evaluation
     • Conclusion
  3. Collaborative tagging and Top-k
     • Collaborative tagging systems
       • U(sers) × I(tems) × T(ags)
       • Tagged(u, i, t): user u annotates item i with tag t
     • Top-k processing
       • One inverted list per tag, pairing each item i with its score Score_tj(i)
       • Query q = {t1, ..., tn}
       • Score(i) = f(Score_t1(i), ..., Score_tn(i))
       • The k items with the highest scores are returned as results
  4. Motivations
     • Why collaborative tagging?
       • Flexibility of folksonomy
       • Searching photos, videos, ...
     • Why personalization?
       • Variety of user preferences
  5. How to personalize?
     • Network-aware search
       • Interest-based personal network
       • Network(u) = {v | link(u, v)}
       • link(u, v) iff Strength_link(u, v) = |{i | Tagged(u, i, t) & Tagged(v, i, t)}| > threshold
     • Personalized score
       • Score_tj(i, u) = |Network(u) ∩ {v | Tagged(v, i, tj)}|
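The two set definitions above translate directly into code. A minimal sketch, assuming a toy tagging relation and a prebuilt personal network (all identifiers and sample data are illustrative, not from the paper's trace):

```python
# Personalized score: Score_tj(i, u) = |Network(u) ∩ {v | Tagged(v, i, tj)}|
# All names and sample data below are illustrative assumptions.

# Tagging relation as a set of (user, item, tag) triples.
tagged = {
    ("alice", "photo1", "sunset"),
    ("bob",   "photo1", "sunset"),
    ("carol", "photo1", "sunset"),
    ("bob",   "photo2", "sunset"),
}

# Personal network of each user (e.g. as built by the gossip protocol).
network = {"alice": {"bob", "carol"}}

def personalized_score(item, tag, user):
    """Count members of user's personal network who tagged `item` with `tag`."""
    taggers = {v for (v, i, t) in tagged if i == item and t == tag}
    return len(network.get(user, set()) & taggers)
```

For alice, photo1 scores 2 on "sunset" (both bob and carol tagged it) while photo2 scores only 1, so the same query ranks items differently per user.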
  6. Why peer-to-peer?
     • Exact
       • One inverted list per (user, tag)
       • Space intensive
     • Global Upper-Bound
       • One inverted list per tag, storing for each item i its upper bound UB(i, tj) and users(i)
       • Time consuming: Score_tj(i) must be computed at query time
     • User Clustering
       • One inverted list per (cluster, tag)
       • A trade-off between Exact and Global Upper-Bound
  7. Scalability & efficiency call for decentralization
  8. Peer-to-Peer Solution
     • Goals
       • Same top-k results as centralized network-aware search
       • Query time comparable to Exact
       • Limited storage per user
  9. Personal network discovery
     • Two-layer gossip protocol
       • Top layer: personal network
         • Measures the similarities between users
       • Bottom layer: random peer sampling
         • Keeps the overlay connected
         • Feeds the top layer with new related peers
     • Gossip framework
       • Peer selection
       • Data exchange
       • Data processing
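One top-layer round of the gossip framework at a single node could be sketched as follows. The similarity function and data layout are assumptions for illustration; the slide fixes only the three steps (peer selection, data exchange, data processing):

```python
def similarity(p1, p2):
    """Link strength sketch: number of common (item, tag) annotations."""
    return len(p1 & p2)

def gossip_round(me, profiles, my_net, peer, peer_net, net_size=3):
    """One top-layer round: merge views with `peer`, keep the closest peers.

    `profiles` maps a peer id to its set of (item, tag) pairs; `peer` is
    assumed to come from the bottom (random peer sampling) layer.
    """
    # Data exchange: merge my personal network with the peer's view.
    candidates = (my_net | peer_net | {peer}) - {me}
    # Data processing: keep the net_size most similar peers (the top layer).
    return set(sorted(candidates,
                      key=lambda v: similarity(profiles[me], profiles[v]),
                      reverse=True)[:net_size])
```

The bottom layer guarantees that `peer` is eventually drawn from the whole overlay, so over repeated rounds each node's top layer converges toward its most similar users.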
  10. Inverted lists & query processing
     • One inverted list per (user, tag)
     • Inverted lists built and stored by the concerned user
     • Query Q = (u, t1, ..., tn)
     • Query processed locally with NRA
  11. NRA (No Random Access)
     • Score(i) = Σ_{tj ∈ Q} Score_tj(i)
     • Inverted lists are scanned sequentially, in parallel
     • Information maintained for each seen item:
       • score lower bound
       • score upper bound
     • Seen items are sorted by score lower bound
     • Termination condition: lower-bound(k-th seen item) ≥ max{upper-bound(i) | i ∈ Seen \ top-k}
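A minimal NRA sketch over descending-sorted inverted lists. The sum aggregation matches the formula above; the list layout and the assumption of non-negative scores are mine:

```python
def nra(lists, k):
    """No Random Access top-k; each list holds (item, score) pairs sorted by
    descending score. Returns up to k (item, lower_bound) pairs."""
    seen = {}                      # item -> {list index: known score}
    pos = [0] * len(lists)         # scan position in each list
    last = [0.0] * len(lists)      # last score seen in each list

    def lower(item):               # sum of known per-list scores
        return sum(seen[item].values())

    def upper(item):               # fill unseen lists with their last score
        return sum(seen[item].get(j, last[j]) for j in range(len(lists)))

    while True:
        advanced = False
        for j, lst in enumerate(lists):          # sequential, parallel scan
            if pos[j] < len(lst):
                item, score = lst[pos[j]]
                pos[j] += 1
                last[j] = score
                seen.setdefault(item, {})[j] = score
                advanced = True
        ranked = sorted(seen, key=lower, reverse=True)
        topk, rest = ranked[:k], ranked[k:]
        if not advanced:                         # all lists exhausted
            return [(i, lower(i)) for i in topk]
        threshold = sum(last)                    # upper bound for unseen items
        if len(topk) == k and lower(topk[-1]) >= threshold and \
           all(lower(topk[-1]) >= upper(i) for i in rest):
            return [(i, lower(i)) for i in topk]
```

On two short lists such as [("a", 3), ("b", 2), ("c", 1)] and [("b", 3), ("a", 2), ("d", 1)], the termination test fires after two rounds, without reading the list tails.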
  12. Personal network optimization
     • At most n users per personal network
       • Derived from all users meeting the threshold
     • Strategies
       • Random
       • Biased Random: p_link(u, v) ∝ Strength_link(u, v)
       • Nearest: highest Strength_link(u, v)
       • Nearest with Enhanced Link Strength:
         Strength_link(u, v) = |{(i, tj) | Tagged(u, i, tj) & Tagged(v, i, tj)}|
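The four strategies amount to different ways of sampling at most n neighbors by link strength. A sketch with a hypothetical `strength` map; Nearest with Enhanced Link Strength only changes how strength is computed (per (item, tag) pair rather than per item), not the selection itself:

```python
import random

def select_network(u, candidates, strength, n, strategy="nearest"):
    """Pick at most n neighbors for u among candidates above the threshold.

    `strength` maps (u, v) to link strength; all names are illustrative.
    """
    candidates = sorted(candidates)
    if strategy == "random":
        return set(random.sample(candidates, min(n, len(candidates))))
    if strategy == "biased_random":   # p(v) proportional to Strength_link(u, v)
        chosen = set()
        while candidates and len(chosen) < n:
            v = random.choices(candidates,
                               weights=[strength[(u, w)] for w in candidates])[0]
            chosen.add(v)
            candidates.remove(v)
        return chosen
    # "nearest": keep the n strongest links; the "enhanced" variant only
    # plugs in the (item, tag)-based strength.
    return set(sorted(candidates, key=lambda v: strength[(u, v)],
                      reverse=True)[:n])
```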
  13. Evaluation
     • Implementation: PeerSim
     • Data: real trace from del.icio.us
       • 10,000 users
       • 101,144 items
       • 31,899 tags
       • 9,536,635 tagging actions
  14. Evaluation metrics
     • Convergence speed
       Speed = (1/|U|) Σ_{u ∈ U} |Current Network(u)| / |Network(u) in the centralized setting|
     • Top-k quality: recall
       R_k = (number of retrieved relevant items) / (total number of relevant items)
     • Storage space per user
       • Length of inverted lists
       • Length of neighbors' profiles
     • Processing time
       • Number of sequential accesses
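Both metrics are plain set-size ratios; a minimal sketch (variable names are illustrative):

```python
def recall_at_k(retrieved, relevant):
    """R_k = |retrieved ∩ relevant| / |relevant|."""
    return len(set(retrieved) & set(relevant)) / len(set(relevant))

def convergence_speed(current, centralized):
    """Average, over all users, of |current network(u)| / |centralized network(u)|."""
    return sum(len(current[u]) / len(centralized[u])
               for u in centralized) / len(centralized)
```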
  15. Convergence speed (results figure)
  16. Recall (results figure)
  17. Storage space (results figures: inverted lists, profiles)
  18. Processing time (results figure)
  19. Comparison of optimization strategies (results figure)
  20. Scalability (results figure)
  21. Conclusion
     • Decentralization is the right way to provide scalable personalized top-k processing
     • Future work
       • Churn
       • Privacy
  22. Thank you!
