Peer to Peer Information Retrieval


P2PIR is one application of peer-to-peer networks. It combines key elements of file sharing and federated information retrieval. No single technique solves every P2PIR problem. Recall and precision are used to evaluate P2PIR.
Information retrieval is the field that deals with the structure, analysis, organization, storage, searching and retrieval of information. Searching in peer-to-peer networks is called peer-to-peer information retrieval.

Published in: Education


  1. Peer to Peer Information Retrieval, by Chetan K. Sundarde (@CHETANSUNDARDE). 29-Oct-15, P2PIR
  2. Outline
     • Peer-to-peer network
     • Information retrieval
     • Peer-to-peer information retrieval (P2PIR)
     • Peer-to-peer IR system architectures
     • Techniques used for IR in P2P networks
     • Basic algorithms used in P2PIR
     • Evaluation techniques used in P2PIR
     • Challenges
     • Conclusion
     • References
  3. Peer-to-Peer Network
     • A collection of distributed systems
     • Computers join and leave the network frequently
     • Each computer acts as a server and a client simultaneously
     • Every peer-to-peer network performs three tasks:
       o Searching: issuing a query and getting back a list of document references
       o Locating: resolving a document reference to the concrete location of the full document
       o Transferring: downloading the document
  4. Applications of P2P
     • Information retrieval
     • File sharing: Gnutella, Napster, BitTorrent, etc.
  5. Information Retrieval
     • Information retrieval is the field that deals with the structure, analysis, organization, storage, searching and retrieval of information
     • It searches for relevant documents on the basis of user input
     [Figure: document collection, information need, IR, retrieval]
  6. Comparison between File Sharing and Information Retrieval

     Aspect               File Sharing        Information Retrieval
     Application          Locating            Searching
     Index content        File identifiers    Document content
     Index size           Small               Large
     Data exchange unit   File                Search result
     Data exchange size   Megabytes or more   Kilobytes (small)

     P2PIR: file sharing networks and federated information retrieval
  7. Peer-to-Peer Information Retrieval (P2PIR)
     • Searching in peer-to-peer networks
     • Each peer shares its information with other peers
     • A peer searches for information by sending queries to its peers
     • Queries are routed to one or many other peers
     • The query result is provided in the form of an index
  8. Peer-to-Peer IR System Architectures
     • Based on the relationship between peers
       o Cooperative system
       o Uncooperative system
     • Based on the network structure
       o Centralized network
       o Structured architecture
       o Unstructured architecture
     • Based on the task performed in the P2P network
       o Centralized global index
       o Distributed global index
       o Strict local indices
       o Aggregated local indices
  9. Peer-to-Peer Architectures Used in IR
     [Figure: four panels titled Central Global Index, Distributed Global Index, Aggregated Local Index and Strict Local Index, with nodes labelled G and L]
  10. Algorithms Used in P2PIR
     • Statistical IR algorithms
     • Vector Space Model (VSM)
       o Document A: "books on computer networks"
       o Document B: "network routing in P2P networks"
       o Query Q: "computer network"
     • Each element of a vector corresponds to the importance of a term in the document
     • Retrieved documents are ranked by the similarity between the document vector and the query vector

     Vocabulary   book   computer   network   routing
     VA           0.5    0.5        0.8       0
     VB           0      0          0.9       0.6
     VQ           0      0.5        0.8       0

     Similarity to the query (inner product with VQ): 0.89 for document A, 0.72 for document B
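The ranking on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the author's implementation: the weights are copied from the slide, while the function names `dot`, `cosine` and `rank` are illustrative. The slide's scores 0.89 and 0.72 match the plain inner product; cosine similarity is included only as a commonly used alternative and is an assumption, not something the slide states.

```python
import math

# Term order follows the slide's vocabulary: book, computer, network, routing.
V_A = [0.5, 0.5, 0.8, 0.0]  # Document A: "books on computer networks"
V_B = [0.0, 0.0, 0.9, 0.6]  # Document B: "network routing in P2P networks"
V_Q = [0.0, 0.5, 0.8, 0.0]  # Query Q: "computer network"


def dot(u, v):
    """Inner product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Inner product normalised by vector lengths (alternative score, assumed)."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def rank(query, documents, score=dot):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, score(vec, query)) for name, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank(V_Q, {"A": V_A, "B": V_B})
for name, value in ranking:
    print(f"Document {name}: {value:.2f}")
```

Passing `score=cosine` to `rank` changes the numbers but, for these vectors, not the order: document A still ranks above document B.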
  11. Algorithms Used in P2PIR
     • Statistical IR algorithms
     • Latent Semantic Indexing (LSI)
       o Singular value decomposition (SVD) maps the document-term vectors Va, Vb to semantic vectors V'a, V'b
       o Reduces dimensionality
       o Discovers word semantics, e.g. cat <-> pet, bus <-> travel
  12. Algorithms Used in P2PIR (continued)
     • Distributed Hash Table (DHT)
       o A method of hash table lookup over a decentralized distributed network
       o Key-value pairs are stored in the DHT at a parent node (structured architecture), for example:
         Kd = hash("books on computer networks")
         Kq = hash("computer network")
       o Any node in the DHT can then efficiently retrieve the value by providing its key
       o File-sharing examples: BitTorrent uses a DHT; Napster relied on a central index
       o DHT designs include CAN, Chord, etc.
     • Extending DHTs with content-based search
       o Full-text retrieval
       o Content-based image retrieval
       o Content-based music retrieval, etc.
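The key-to-node mapping described on this slide can be sketched as a toy, single-process hash ring. Everything here is an assumption made for illustration: the class `ToyDHT`, the peer names, the 16-bit identifier space and the "first node at or after the key" placement rule (loosely in the spirit of Chord) are not taken from the slides, and real DHTs add routing tables, replication and churn handling.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # small identifier space, chosen only for readability


def ring_id(text):
    """Hash a string onto the identifier ring [0, 2**RING_BITS)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class ToyDHT:
    """Single-process stand-in for a DHT: no networking, no churn handling."""

    def __init__(self, node_names):
        # Nodes are placed on the ring by hashing their (hypothetical) names.
        self.nodes = sorted((ring_id(name), name) for name in node_names)
        self.storage = {name: {} for name in node_names}

    def responsible_node(self, key):
        """Return the first node at or after hash(key), wrapping around."""
        ids = [node_id for node_id, _ in self.nodes]
        index = bisect_left(ids, ring_id(key)) % len(self.nodes)
        return self.nodes[index][1]

    def put(self, key, value):
        self.storage[self.responsible_node(key)][key] = value

    def get(self, key):
        return self.storage[self.responsible_node(key)].get(key)


dht = ToyDHT(["peer-1", "peer-2", "peer-3", "peer-4"])
dht.put("books on computer networks", "location of document A")
found = dht.get("books on computer networks")  # exact key: value is returned
missed = dht.get("computer network")           # different key: nothing found
```

The second lookup returns `None` because Kq and Kd are hashes of different strings. A DHT on its own only supports exact-key lookup, which is the gap the content-based search extensions listed on the slide are meant to close.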
  13. P2P Information Retrieval Techniques

     Network        Technique                        Category
     Unstructured   BFS, RBFS (e.g. Gnutella)        Blind search
     Unstructured   Random walk                      Blind search
     Unstructured   Routing indices                  Indexing
     Unstructured   Semantic searching (e.g. SON)    Clustering
     Structured     pSearch                          Clustering
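The two blind-search techniques named for unstructured networks can be contrasted in a short sketch. The five-peer overlay, the document sets and the TTL and step limits below are all hypothetical; `flood` is a plain TTL-limited BFS and `random_walk` forwards the query to one random neighbour per step. RBFS, routing indices and the clustering approaches are not covered.

```python
import random
from collections import deque

# Hypothetical overlay: peer -> neighbours, plus the terms each peer can answer.
NEIGHBOURS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
DOCUMENTS = {"A": set(), "B": {"chord"}, "C": set(), "D": set(), "E": {"gnutella"}}


def flood(start, term, ttl):
    """BFS flooding: forward the query to all neighbours until the TTL is spent.

    Returns the peers that hold the term and the number of peers contacted.
    """
    hits, visited, queue = [], {start}, deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if term in DOCUMENTS[peer]:
            hits.append(peer)
        if remaining == 0:
            continue
        for neighbour in NEIGHBOURS[peer]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return hits, len(visited)


def random_walk(start, term, max_steps, rng):
    """Random walk: forward the query to one randomly chosen neighbour per step.

    Returns (peer, steps taken) on success or (None, max_steps) on failure.
    """
    peer = start
    for step in range(max_steps):
        if term in DOCUMENTS[peer]:
            return peer, step
        peer = rng.choice(NEIGHBOURS[peer])
    return (peer, max_steps) if term in DOCUMENTS[peer] else (None, max_steps)


print(flood("A", "gnutella", ttl=2))
print(flood("A", "gnutella", ttl=3))
print(random_walk("A", "gnutella", max_steps=10, rng=random.Random(0)))
```

Flooding contacts every peer within the TTL radius, so it finds the document once the TTL is large enough but at the cost of many messages. The random walk sends one message per step and may miss the document entirely.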
  14. Evaluation in P2P IR
     • Recall (are all the relevant documents retrieved?)
       o The fraction of the documents relevant to the query that are successfully retrieved
       o Recall = number of relevant documents retrieved / total number of relevant documents in the collection
     • Precision (are the retrieved documents relevant?)
       o The fraction of the retrieved documents that are relevant to the search query
       o Precision = number of relevant documents retrieved / number of documents retrieved
     [Figure: overlapping sets Relevant and Retrieved; the overlap is the retrieved relevant documents]
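The two ratios above translate directly into set operations. This is a minimal sketch; the function name `precision_recall` and the document identifiers are made up for illustration, and returning 0.0 for an empty set is a convention chosen here, not something the slides specify.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    retrieved_relevant = retrieved & relevant
    # Convention assumed here: an empty denominator yields 0.0.
    precision = len(retrieved_relevant) / len(retrieved) if retrieved else 0.0
    recall = len(retrieved_relevant) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and relevance judgements for one query.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5", "d6", "d7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4).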
  15. Evaluation Techniques in P2P IR (continued)
     • F-score / F-measure
       o The harmonic mean of precision and recall
     • Hits per query
       o The average number of distinct relevant documents discovered per search query
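Both measures on this slide are short computations. The sketch below assumes the balanced F-measure (often written F1), which is the plain harmonic mean the slide describes; the function names and the sample queries are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def hits_per_query(results_by_query, relevant_by_query):
    """Average number of distinct relevant documents found per query."""
    if not results_by_query:
        return 0.0
    total_hits = 0
    for query, results in results_by_query.items():
        relevant = set(relevant_by_query.get(query, ()))
        # set() removes duplicates, so each relevant document counts once.
        total_hits += len(set(results) & relevant)
    return total_hits / len(results_by_query)


# Hypothetical results: q1 returns "d1" twice, q2 returns nothing relevant.
results = {"q1": ["d1", "d1", "d2"], "q2": ["d9"]}
judgements = {"q1": {"d1", "d2", "d3"}, "q2": {"d8"}}

print(f"F = {f_measure(0.5, 0.4):.3f}")
print(f"hits per query = {hits_per_query(results, judgements):.1f}")
```

Because the harmonic mean is pulled towards the smaller value, a system cannot reach a high F-score by maximising only precision or only recall.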
  16. Applications of P2P Information Retrieval in the Real World
     • YaCy
       o Local index entries are injected into a distributed global index
       o YaCy uses no centralized servers
       o As of Feb 2015, the resulting decentralized web search has about 1.4 billion documents in its index, more than 600 peer operators contribute each month, and about 130,000 search queries are performed on the network each day
     • Faroo
       o A proprietary peer-to-peer search engine that uses a distributed global index
       o Performs distributed crawling and ranking
       o Encrypts queries and results for privacy protection
       o 2 million peers
     • Other P2PIR systems: Sixearch, ODISSEA, MINERVA, Seeks, etc.
  17. Challenges
     • Cross-language information retrieval
     • Maintaining index freshness
     • Security features
     • Quality of service
     • Efficient use of resources
     • Increasing the range of peer-to-peer networks
  18. Conclusion
     • P2PIR is one application of peer-to-peer networks
     • P2PIR combines key elements of file sharing and federated information retrieval
     • No single technique solves every P2PIR problem
     • Recall and precision are used to evaluate P2PIR
  19. References
     • Almer S. Tigelaar, Djoerd Hiemstra and Dolf Trieschnigg. "Peer-to-Peer Information Retrieval." University of Twente, IEEE paper, Sept 2012.
     • Rasanjalee Dissanayaka Mudiyanselage. "Ontology-based Search Algorithms over Large-Scale Unstructured Peer-to-Peer Networks." Georgia State University, IEEE, Oct 2014.
     • Demetrios Zeinalipour-Yazti. "Information Retrieval in Peer-to-Peer Systems." University of California, Riverside, IEEE, June 2003.
     • Chengye Lu. "Peer to Peer English/Chinese Cross-Language Information Retrieval." Queensland University of Technology, Sept 2008.
  20. References
     • Xiuqi Li and Jie Wu. "Searching Techniques in Peer-to-Peer Networks." Florida Atlantic University, Boca Raton, FL 33431, 2007.
     • Christos Gkantsidis, Milena Mihail and Amin Saberi. "Random Walks in Peer-to-Peer Networks." Georgia Institute of Technology, Atlanta, GA, 2002.
     • Taoufik Yeferny, Amel Bouzeghoub and Khedija Arour. "A Query Learning Routing Approach Based on Semantic Clusters." International Journal of Advanced Information Technology (IJAIT), Vol. 1, No. 6, December 2011.
     • Yulian Yang. "Semantic Information Retrieval over P2P Networks." Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, F-69621, France, 2009.