Research Issues in P2P Networks

Published in: Technology

  1. 1. Research Issues in P2P Computing
     How to Run Applications Faster?
     • There are 3 ways to improve performance:
       – Work harder
       – Work smarter
       – Get help
     • Computer analogy:
       – Faster hardware: high-performance processors or peripheral devices
       – Optimized algorithms and techniques used to solve computational tasks
       – Multiple computers to solve a particular task
     OUTLINE
     • Centralized vs. Distributed
     • What is P2P?
     • P2P Architectures
     • P2P and Applications
     • Search and Replication Techniques
     • P2P Security
     • Emerging P2P Applications
     • Conclusion
     Centralized?
     • Computation in networks of processing nodes can be classified into centralized or distributed computations.
     • A centralized solution relies on one node being designated as the computer node that processes the entire application locally.
     • The central system is shared by all the users all the time.
     • There is a single point of control and a single point of failure.
     Distributed….
     • When a handful of powerful computers are linked together and communicate with each other:
       – The overall computing power available can be amazingly vast.
       – Such a system can have a higher performance than a single supercomputer.
       – The objective of such systems is to minimize communication and computation cost.
     • A distributed system is an application that executes a collection of protocols to coordinate the actions of multiple processes on a communication network, such that all components cooperate to perform a single task or a small set of related tasks.
  2. 2. Distributed…. (continued)
     • The collaborating computers can access remote resources as well as local resources in the distributed system via the communication network.
     • The existence of multiple autonomous computers is transparent to the user in a distributed system.
       – The user is not aware that the jobs are executed by multiple computers that reside in remote locations.
       – A centralized algorithm is at the heart of a single computer.
       – A distributed algorithm is at the heart of a society of computers.
     Examples of Distributed Systems
     • The Internet
       – Heterogeneous network of computers and applications
       – Implemented through the Internet Protocol Stack
     Computer Networks vs. Distributed Systems
     • Computer network: the autonomous computers are explicitly visible.
     • Distributed system: the existence of multiple autonomous computers is transparent.
     • Many problems in common.
     • Normally, every distributed system relies on services provided by a computer network.
     Distributed….
     • Distributed systems are built on top of existing networking and operating systems software.
     • The middleware enables computers to coordinate their activities and to share the resources of the system.
       – Middleware is the bridge that connects distributed applications across dissimilar physical locations, with dissimilar hardware platforms, network technologies, operating systems, and programming languages.
     • Middleware provides standard services such as naming, concurrency control, event distribution, authorization to specify access rights to resources, security, etc.
  3. 3. Computing Platforms Evolution: Breaking Administrative Barriers
     [Figure: performance plotted against administrative barriers (Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Universe); platforms shown: Desktop (Single Processor), SMPs or SuperComputers, Local Cluster, Enterprise Cluster/Grid, Global Cluster/Grid, Inter Planet Cluster/Grid ??]
     Foster-Kesselman
     • The Foster-Kesselman duo organized in 1997, at Argonne National Laboratory, a workshop entitled “Building a Computational Grid”.
     • At this moment the term “Grid” was born.
     • The workshop was followed in 1998 by the publication of the book “The Grid: Blueprint for a New Computing Infrastructure” by Foster and Kesselman themselves.
     • For these reasons they are not only considered the fathers of the Grid, but their book, which in the meantime was almost entirely rewritten and re-published in 2003, is also considered the “Grid bible”.
     • Ian Foster: Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439
     • Carl Kesselman: Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292
     The Need for Collaboration?
     • The worldwide business demands intense problem-solving capabilities for incredibly complex problems.
       – This creates the need for dynamic collaboration of many computing resources that are able to work together.
     • Achieving this level of resource collaboration within the bounds of the necessary quality requirements of the end user is a difficult challenge across all the technical communities.
     Electric Grid and Grid Computing
     • Computing grids are conceptually not unlike electrical grids.
     • Electric power grid: a variety of resources contribute power into a shared "pool" for many consumers to access on an as-needed basis.
       – In an electrical grid, wall outlets allow us to link to an infrastructure of resources that generate, distribute, and bill for electricity.
       – When you connect to the electrical grid, you don't need to know where the power plant is or how the current gets to you.
     • Grid computing uses middleware to coordinate disparate IT resources across a network, allowing them to function as a virtual whole.
       – The goal of a computing grid, like that of the electrical grid, is to provide users with access to the resources they need, when they need them.
     Why Grids? Large Scale Exploration needs them
     • Solving technology problems using computer modeling, simulation and analysis
     • Application areas: Geographic Information Systems, Life Sciences, Aerospace, CAD/CAM, Military Applications
  4. 4. CERN’s Large Hadron Collider
     • 1800 physicists, 150 institutes, 32 countries
     • 100 PB of data by 2010; 50,000 CPUs?
     • The Large Hadron Collider (LHC) is a gigantic scientific instrument near Geneva.
     • It is a particle accelerator used by physicists to study the smallest known particles – the fundamental building blocks of all things.
     Client-Server Model
     • The most widely used model
     [Figure: clients send invocations to servers and receive results; key: process, computer]
     Client-Server
     [Figure: a source, routers and “interested” end-hosts]
     Why P2P?
     [Figure: a source, routers and “interested” end-hosts]
  5. 5. Client-Server
     • The source is overloaded!
     • Hot spots become hotter.
     [Figure: a source, routers and “interested” end-hosts]
     Why P2P?
     • Personal computers: 80% idle CPU time
     • Laptops: 90% idle CPU time
     • Computers in our lab: 99% idle CPU time!
     What is driving P2P?
     • Clients are not so dumb.
     • Billions of MHz of CPU, tons of terabytes of disk, millions of gigabits of network bandwidth, …
       – Unused resources.
     Problem with Client-Server Model
     – Scalability
       • As the number of users increases, there is a higher demand for computing power, storage space, and bandwidth on the server side.
     – Reliability
       • The whole network depends on the highly loaded server to function properly.
     Computer System Taxonomy
     • Computer Systems
       – Centralized Systems (mainframes, SMPs)
       – Distributed Systems
         • Client-server
         • Peer-to-Peer
     P2P – An overlay network
     • P2P overlay network
       – The connected nodes construct a virtual overlay network on top of the underlying network infrastructure.
       – The peer-to-peer network topology is a virtual overlay at the application layer.
     [Figure: overlay nodes A to G mapped onto the underlying network]
  6. 6. Typical Characteristics
     • Large scale: lots of nodes (up to millions)
     • Dynamicity: frequent joins, leaves, failures
     • Little or no infrastructure
       – No central server
     • Symmetry: all nodes are “peers” and have the same role
     [Figure: client/server model (clients, cache, proxy, servers, congestion zone around the servers) compared with the peer-to-peer model (every node is a client/server)]
     What is it...
     • P2P computing is the sharing of computer resources and services by direct exchange between systems.
     • These resources and services include the exchange of information, processing cycles, cache storage, and disk storage for files.
     • P2P computing takes advantage of existing desktop computing power and networking connectivity,
       – allowing economical clients to leverage their collective power to benefit the entire enterprise.
     • In a P2P architecture, computers that have traditionally been used solely as clients communicate directly among themselves and can act as both clients and servers, assuming whatever role is most efficient for the network.
     • Each node (peer), called a servent, acts as both a SERVer and a cliENT.
     P2P Dominates Internet Traffic
     • P2P has dominated Internet traffic.
     • In 2006, it accounted for more than 60% of Internet traffic.
     [Figure: peers, each with a shared folder and neighbors, acting as client and server; a search is propagated between peers and the file is then retrieved directly]
     Some Statistics about P2P Systems
     • More than 200 million users registered with Skype, around 10 million online users. (2007)
     • Around 4.7M hosts participate in SETI@Home. (2006)
     • BT accounts for 1/3 of Internet traffic. (2007)
     • More than 200,000 simultaneous online users on PPLive (streaming video network). (2007)
     • More than 3,000,000 users downloaded PPStream. (2008)
  7. 7. P2P Applications
     • In Peer-to-Peer (P2P) computing, applications are segregated into three main categories:
       – distributed computing,
       – file sharing, and
       – collaborative applications.
     • The three categories of P2P serve different purposes:
       – Distributed computing applications typically require the decomposition of a larger problem into smaller parallel problems.
       – File sharing applications require efficient search across wide area networks.
       – Collaborative applications require update mechanisms to provide consistency in a multi-user environment.
     P2P Network Architectures
     • Centralized (Napster)
     • Decentralized
       – Unstructured (Gnutella)
       – Structured (Chord)
     • Hierarchical (MBone)
     • Hybrid (EDonkey)
     P2P Computing
     • File sharing (e.g., Gnutella, Freenet, Limewire, KaZaA)
     • Collaboration (e.g., Magi, Groove, Jabber)
     • Distributed computing (e.g., SETI@home, the Search for Extraterrestrial Intelligence)
     [Figure: communication and collaboration (Groove, Skype); file sharing (Napster, Gnutella, Kazaa, Freenet, Overnet); distributed computing (SETI@Home, folding@Home)]
     [Figure: taxonomy. Computer Systems: Centralized Systems (mainframes, SMPs, workstations) and Distributed Systems; Distributed Systems: Client-server and Peer-to-Peer; Peer-to-Peer: Centralized and Decentralized; Decentralized: Structured and Unstructured]
     P2P FILE SHARING APPLICATIONS
  8. 8. P2P Applications
     • File sharing (music, movies, …)
       – Utilises the idle disk space for storage and the existing network bandwidth for search and download.
       – The cost of operation is very low.
         • The majority of peers collect only objects that they are interested in anyway.
       – E.g. Napster, KaZaA and Gnutella
     Napster: Example
     [Figure: peers m1 to m6 holding resources A to F respectively; a query “E?” is resolved to m5, which holds E]
     File Sharing Services
     • Publish – insert a new file into the network
     • Lookup – given a file name X, find the host that stores the file
     • Retrieval – get a copy of the file
     • Join – join the network
     • Leave – leave the network
     Unstructured P2P
     [Figure: two variants. Queries flooded to connected peers (neighbors), and queries sent from a peer node to its supernode (1. query) and flooded between supernodes (2. query); search and transfer paths are marked]
     Centralized P2P
     • Utilizes a central directory for object location.
     • For file-sharing P2P, the location is queried from central servers and the file is then downloaded directly from peers.
     • Benefits
       – Simplicity
       – Limited bandwidth usage
     • Drawbacks
       – Unreliable (single point of failure), performance bottleneck, and scalability limits
       – Vulnerable to DoS attacks
       – Copyright infringement
     [Figure: peers upload indexes to a centralized server; 1. query, 2. response, 3. transfer between peers]
     File Sharing: Gnutella
     • Gnutella is a file sharing protocol.
     • Gnutella was originally designed by Nullsoft, a subsidiary of America Online.
     • Its architecture is completely decentralised and distributed.
     • When a client wishes to connect to the network, it runs through a list of nodes that are most likely to be up, or takes a list from a website, and then connects to however many nodes it wants.
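The centralized model on this page (peers upload indexes, a query goes to the central server, the transfer happens between peers) can be illustrated with a minimal sketch. The class name `CentralIndex` and its methods are hypothetical, chosen only to mirror the publish, lookup and leave services listed above; this is not Napster's actual protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central directory: maps a file name to the peers that hold it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def publish(self, peer, filename):
        # A peer uploads one index entry; the file itself stays on the peer.
        self._holders[filename].add(peer)

    def leave(self, peer):
        # Remove every entry of a departing peer.
        for peers in self._holders.values():
            peers.discard(peer)

    def lookup(self, filename):
        # Return the peers to contact; the download is then peer-to-peer.
        return sorted(self._holders.get(filename, ()))


index = CentralIndex()
index.publish("m5", "E")
index.publish("m4", "E")  # assumption: a second copy of E, for illustration
index.publish("m6", "F")
print(index.lookup("E"))
```

Every lookup passes through the one `CentralIndex` object, which is why the slide lists a single point of failure and a performance bottleneck among the drawbacks.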
  9. 9. Gnutella Search Mechanism
     • Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5; …
     • A, B, C, D, E, F are resources.
     • Queries carry a TTL.
     [Figure: peers m1 to m6 holding resources A to F; the query “E?” is flooded from neighbor to neighbor until it reaches m5, which holds E]
     • Advantages
       – Fast lookup
       – Low join and leave overhead
       – Popular files are replicated many times, so a lookup with a small TTL will usually find the file
         • Can choose to retrieve from a number of sources
     • Disadvantages
       – Not 100% success rate, since the TTL is limited
       – Very high communication overhead
       – Uneven load distribution
     KaZaA
     • “Peer-to-Peer File Sharing is all about the trading of copyrighted music and videos without paying anything to the authors”
     • 3 million users online sharing 4 petabytes of data
     [Figure: KaZaA screenshot of a native Windows application, showing a query, the music category and a banner ad]
     Kazaa
     • Sharman Networks
     • Kazaa is a file sharing program that allows you to download audio, video, images, documents and software files.
     Searching
     Search in Unstructured P2P
     • Two general types of search in unstructured P2P:
       – Blind: try to propagate the query to a sufficient number of nodes (example: Gnutella)
       – Informed: utilize information about document locations
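The TTL-limited flooding described on this page can be sketched as a breadth-first traversal. This is a simplified illustration, not the Gnutella wire protocol: the function name `flood_search`, the link between m5 and m6, and the rule that a peer never echoes a query to the peer it came from are assumptions made for the example.

```python
from collections import deque


def flood_search(neighbors, holdings, start, wanted, ttl):
    """Flood a query from `start`; return (peers holding `wanted`, messages sent)."""
    hits = set()
    seen = {start}
    messages = 0
    queue = deque([(start, ttl, None)])  # (peer, hops left, peer it came from)
    while queue:
        node, remaining, sender = queue.popleft()
        if wanted in holdings.get(node, ()):
            hits.add(node)
        if remaining == 0:
            continue  # TTL exhausted: do not forward any further
        for nxt in neighbors.get(node, ()):
            if nxt == sender:
                continue  # do not send the query straight back
            messages += 1
            if nxt not in seen:  # duplicates are received but not re-forwarded
                seen.add(nxt)
                queue.append((nxt, remaining - 1, node))
    return hits, messages


# Topology from the slide; the m5-m6 link is an assumption.
neighbors = {
    "m1": ["m2", "m3"], "m2": ["m1"], "m3": ["m1", "m4", "m5"],
    "m4": ["m3"], "m5": ["m3", "m6"], "m6": ["m5"],
}
holdings = {"m1": {"A"}, "m2": {"B"}, "m3": {"C"},
            "m4": {"D"}, "m5": {"E"}, "m6": {"F"}}

print(flood_search(neighbors, holdings, "m1", "E", ttl=2))
print(flood_search(neighbors, holdings, "m1", "E", ttl=1))
```

With `ttl=2` the query reaches m5, while with `ttl=1` it stops at m2 and m3 and finds nothing, which is the "not 100% success rate" disadvantage in miniature.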
  10. 10. Blind Search Methods
     • BFS and random walk
       – BFS
       – Random walks
     • In unstructured networks, flooding would exhaust the bandwidth of the network.
     Informed search
     • Informed: utilize information about document locations.
     • APS
     Adaptive Probabilistic Search
     • Each node keeps a local index consisting of one entry for each object it has requested per neighbor.
     • Index values represent the probability of finding that object through that neighbor.
     • Searching is based on the simultaneous deployment of k walkers and probabilistic forwarding.
     • If a hit occurs, the walker terminates successfully.
     • On a miss, the query is forwarded to one of the node's neighbors.
     • Example (indices at node A):
       – A chooses B with Pr=0.3
       – A chooses C with Pr=0.5
       – A chooses D with Pr=0.2
     APS – an example
     • Node J holds the requested object.
     • Nodes deploy 2 walkers.
     • Initially, all index values are 20.
     • TTL=3
     Collaborative Community
     • Rapidly changing work environment
       – Out-sourcing, in-sourcing, home-sourcing
       – Tight integration and team work with customers, partners, vendors
     • P2P allows management of documents at the level of closed working groups.
     • The collaboration software is designed to improve the productivity of individuals with common goals or interests.
     • Groove is a collaborative P2P system (http://www.groove.net)
       – Part of the Microsoft Office system
       – Document sharing and collaboration
         • vital for a business.
       – Office Groove 2007 is a collaboration software program
         • helps teams work together dynamically and effectively, even if team members work for different organizations, or work remotely.
     Microsoft Office Groove 2007
     • Work Together: Anyone, Anytime, Anyplace
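The probabilistic forwarding step of Adaptive Probabilistic Search can be sketched as a weighted random choice over the local index. The index values 30, 50 and 20 are an assumption chosen to reproduce the probabilities 0.3, 0.5 and 0.2 of the example at node A; the update rule (add or subtract a fixed `step`, never dropping below `floor`) is likewise a hypothetical simplification, not necessarily the rule used by APS itself.

```python
import random


def forwarding_probabilities(index):
    """Normalise index values into per-neighbor forwarding probabilities."""
    total = sum(index.values())
    return {neighbor: value / total for neighbor, value in index.items()}


def choose_next_hop(index, rng):
    """Pick one neighbor for a walker, weighted by its index value."""
    names = list(index)
    weights = [index[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def update_index(index, neighbor, hit, step=10, floor=1):
    """Assumed update rule: reward a hit, penalise a miss."""
    change = step if hit else -step
    index[neighbor] = max(floor, index[neighbor] + change)


index_at_a = {"B": 30, "C": 50, "D": 20}  # assumed values for node A
print(forwarding_probabilities(index_at_a))

rng = random.Random(0)
print(choose_next_hop(index_at_a, rng))
```

Deploying k walkers would amount to calling `choose_next_hop` k times at the requesting node and once per hop afterwards, with `update_index` applied along each walker's path once the outcome is known.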
  11. 11. Distributed Computing: SETI@home
     • Search for Extraterrestrial Intelligence: whether we are alone in the universe or whether there is intelligent life somewhere else in the Universe.
     • Over two million computers crunching away and downloading data gathered from the Arecibo radio telescope in Puerto Rico, USA.
     • The SETI@Home project is widely regarded as the fastest computer in the world.
     • Sharing of resources such as computation power, network bandwidth and storage.
     • Achieves computing power more cheaply than a supercomputer can provide.
     • Developed by the Space Sciences Laboratory at the University of California, Berkeley, in the United States. http://setiathome.ssl.berkeley.edu
     • Launched in 1999
     How SETI@home works?
     • Collect the data source
       – A telescope at Arecibo collects the data source from outer space.
       – SETI@home uses a data recorder to record the data source on removable tape.
     • Distribution of the data source
       – SETI@home divides the data into fixed-size work units.
       – SETI@home distributes these work units via the Internet from the servers to a client program.
       – The client program computes a result, returns it to the server, and gets another work unit.
     How SETI@home works? …
     • Scientific experiment that uses Internet-connected computers
     • Distributes a screen saver–based application to users
     • Applies signal analysis algorithms to different data sets to process radio-telescope data
     • Has more than 3 million users
     [Figure: radio-telescope data goes to the main server; 2. the SETI client (screen saver) starts; 3. the SETI client gets data from the server and runs; 4. the client sends results back to the server]
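The work-unit cycle described above (split the recording into fixed-size units, hand one to a client, collect the result, hand out the next) can be sketched as follows. The function names are hypothetical, the "analysis" is a stand-in that simply reports the largest sample value, and the whole exchange runs in one process instead of over the Internet.

```python
from collections import deque


def split_into_work_units(data, unit_size):
    """Cut recorded data into fixed-size work units (the last may be shorter)."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def analyse(unit):
    """Stand-in for the real signal analysis: report the strongest sample."""
    return max(unit)


def run_client(pending):
    """Fetch a unit, compute, return the result, repeat until none are left."""
    results = {}
    while pending:
        unit_id, unit = pending.popleft()   # "get a work unit from the server"
        results[unit_id] = analyse(unit)    # "send the result back"
    return results


recording = bytes(range(10))                # toy data, not telescope output
units = split_into_work_units(recording, unit_size=4)
pending = deque(enumerate(units))
print(run_client(pending))
```

Because each unit is analysed independently, any number of clients could drain the same queue in parallel, which is what lets the project scale to millions of volunteer machines.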
  12. 12. Skype
     • “… a free program that uses the latest P2P…technology to bring affordable and high quality voice communications to people all over the world…”
     • Skype offers voice, video, chat and data transfer services over IP.
     • The first stable version of Skype was released in July 2004; since then the number of users has kept growing.
     • Nowadays Skype claims to have more than 20 million accounts and between 4 and 6 million users simultaneously connected.
     Super nodes
     • Super nodes (SN) are Skype clients run by users that have a “good” Internet connection and a “good” computer.
     • Having a good Internet connection means having a public IP address, without firewall restrictions.
     • A good computer is a machine that can forward other users' communications and handle many connections.
     • SNs act as relays in the network.
       – Hence, they need better connectivity and better performance.
     • SNs are used to connect Skype clients (SC) together.
     Skype Software features
     • VoIP from computer to computer
       – The most used feature.
     • VoIP from computer to regular phone (Skype Out)
       – By registering on Skype's website it is possible to buy credit and then call all over the world at very attractive rates compared to those applied by phone companies.
     • Video conferencing
       – Introduced in Skype 2.0 in 2006.
     • Instant Messaging
       – This feature is comparable to many other instant messaging clients like MSN Messenger, Yahoo! Messenger, Google Talk, etc.
       – The main difference is that Skype does not tell the user whether the person he is chatting with is typing or not. This is due to the P2P design of the Skype network.
     • File Transfer
       – The Skype network design has a big influence on the quality of file transfers. It can make them very fast (1 Mbps) or very slow (3 kbps).
     Skype – login
     • Skype clients directly connect to login servers, whose IP addresses are hard coded within the software.
       – In this connection the login name and the version are sent in clear text format.
     • The login server stores all user names and passwords and ensures that names are unique across the Skype name space.
     Internet Telephony - Skype
     • The participants form a self-organizing P2P overlay network to locate and communicate with other participants.
     • The bandwidth is shared, and real-time sound or video is shared as a resource.
     • Skype has an architecture similar to that of its predecessor KaZaA.
     • There are three types of nodes in the Skype network:
       – Ordinary peers
       – Super-nodes
       – Central login server
     • Communications are encrypted (RSA).
     • Connection to a bootstrap node
       – When the SC (Skype Client) is installed for the first time, it comes with a list of SNs to connect to.
       – First, the Skype Client tries to connect to 5 SNs by sending a UDP packet to the IP addresses of super nodes randomly chosen from the host cache.
       – When the client finds a super node to connect to, it refreshes its list of active and available super nodes in the host cache.
       – The SC connects to an SN.
  13. 13. Traffic volume content type (Germany, BitTorrent)
     [Figure: traffic volume by content type]
     Skype - user search
     • Similar to KaZaA (searching for the callee).
     • The client sends a user name to its SN and receives a few IP addresses and port numbers in answer.
     • Subsequently the client contacts these nodes.
     • If it cannot find the user, it sends the request to its SN once again and receives another few IP addresses and port numbers.
     • The process continues until the user is found.
     Skype - call establishment
     • Routing in the Skype overlay network is done by the SN.
     • When an SC tries to establish a call, it first asks its SN (if it is not an SN itself) where the callee is and tries to connect directly to it.
       – If the SC is restricted because of a firewall, it connects to the callee using an SN as a relay.
       – If both the caller and the callee have public IP addresses, the caller sends signaling information over TCP to the callee.
     What is PPLive?
     – An online video broadcasting and advertising network
       • Provides an online viewing experience comparable to that of traditional TV broadcasting
       • 75 million global installed base and 20 million monthly active users
       • 600+ channels on PPLive with content ranging from news, music, sports, movies, games, live video and other interactive services to a global audience
     – An efficient P2P technique platform and test bench
     History of PPLive:
     • Bill’s story
       – Inventor of PPLive core technology
       – Dropped out of a post-graduate program to start PPLive
     P2P VIDEO STREAMING: PPLIVE
     • Streaming video is content sent in compressed form over the Internet and displayed by the viewer in real time.
     • With streaming video or streaming media, a Web user does not have to wait to download a file to play it: the media is sent in a continuous stream of data and is played as it arrives.
     • The user needs a player, which is a special program that uncompresses the data and sends video data to the display and audio data to the speakers.
     • A player can be either an integral part of a browser or downloaded from the software maker's Web site.
     • P2P streaming – P2P TV
       • PPLive, PPStream, Joost (by Skype founders), …
  14. 14. Industry Trends
     • PPLive is well positioned to exploit the next explosive growth.
     [Figure: timeline from basic to advanced applications. File sharing: Napster (2001); downloading: BitTorrent (2003); VoIP: Skype (2004); video streaming: PPLive (2005)]
     PPLive
     • Media server (channel management server)
       – Retrieve the list of channels via HTTP
     • Membership server
       – Retrieve a small list of member nodes of interest via UDP
     Single-tree Streaming
     • A common approach to P2P streaming is to organize participating peers into a single tree-structured overlay.
       – The content is pushed from the source towards all peers.
       – This way of organizing peers is called single-tree streaming.
     • In these systems, peers are hierarchically organized in a tree structure where the root is the stream source.
     • The content is spread as a continuous flow of information from the source down the tree.
     [Figure: a snapshot of a tree-based overlay with 231 nodes]
     Streaming Tree Reconstruction after a Peer Departure
     [Figure: the streaming tree before and after a peer leaves]
     Multi-tree Streaming
     • Since all peers are involved in the data distribution, the load is spread among all nodes.
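A small sketch may help show why single-tree streaming loads peers unevenly, which is the motivation the page gives for multi-tree streaming. The tree below and the function name `push_down_tree` are hypothetical; the code only counts how many copies of one chunk each peer must upload when content is pushed from the root.

```python
def push_down_tree(children, root):
    """Push one chunk from the root; return (delivery order, uploads per peer)."""
    order = []
    uploads = {}
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        kids = children.get(node, [])
        uploads[node] = len(kids)      # one upload per child in the tree
        stack.extend(reversed(kids))   # keep left-to-right delivery order
    return order, uploads


# Hypothetical overlay: the source feeds p1 and p2, and p1 feeds p3 and p4.
children = {"source": ["p1", "p2"], "p1": ["p3", "p4"]}
order, uploads = push_down_tree(children, "source")
print(order)
print(uploads)
```

In this toy tree the leaves p2, p3 and p4 upload nothing while p1 carries two uploads; a multi-tree design places each peer as an interior node in some tree so that every peer contributes upload bandwidth.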
  15. 15. BitTorrent
     • Created by Bram Cohen in 2001
     What is BitTorrent?
     • A peer-to-peer file transfer protocol
     • Extremely popular today
     • “Pull-based” “swarming” approach
     • Each file is split into smaller pieces
     • Nodes request desired pieces from neighbors
       – As opposed to parents pushing data that they receive
     • Pieces are not downloaded in sequential order
     • Encourages contribution by all nodes
     Overall Architecture
     [Figure, repeated across several animation steps: a web server, a tracker, peer A (seed), peer B (leech), peer C (leech) and the downloader “US”]
  16. 16. Overall Architecture
     [Figure, continued: a web server, a tracker, peer A (seed), peer B (leech), peer C (leech) and the downloader “US”]
     BitTorrent Lingo
     • Seeder = a peer that provides the complete file.
     • Initial seeder = a peer that provides the initial copy.
     • Leecher = one who is downloading.
     [Figure: an initial seeder, seeders and leechers]
     BitTorrent Basics
     • Files are broken into pieces.
       – Users each download different pieces from the original uploader (seed).
       – Users exchange the pieces with their peers to obtain the ones they are missing.
     • This process is organized by a centralized server called the Tracker.
     Critical Elements
     • A web server
       – Stores and serves the .torrent file.
       – For example:
         • http://bt.btchina.net
         • http://bt.ydy.com/
     [Figure: a web server serving “The Lord of Ring.torrent” and “Troy.torrent”]
  17. 17. BitTorrent Swarm
     • Swarm
       – Set of peers all downloading the same file
       – Organized as a random mesh
         • Each node knows the list of pieces downloaded by its neighbors
         • A node requests pieces it does not own from its neighbors
     • Swarm
       – The group of machines that are collectively connected for a particular file.
         • For example, if you start a BitTorrent client and it tells you that you're connected to 10 peers and 3 seeds, then the swarm consists of you and those 13 other people.
     Critical Elements
     • The .torrent file (e.g. Matrix.torrent)
       – Static 'metainfo' file that contains the necessary information:
         • URL of the tracker
         • Piece length, usually 256 KB
         • SHA-1 hashes of each piece in the file
         • IP address of the tracker
     Critical Elements
     • A BitTorrent tracker
       – The tracker maintains information about all BitTorrent clients utilizing each torrent.
       – The tracker identifies the network location of each client either uploading or downloading the P2P file associated with a torrent.
       – It also tracks which fragment(s) of that file each client possesses, to assist in efficient data sharing between clients.
         • I.e. the tracker keeps track of all peers downloading the file.
       – For example:
         • http://bt.cnxp.com:8080/announce
         • http://btfans.3322.org:6969/announce
     Critical Elements
     • An end user (peer)
       – Users who want to use BitTorrent must install the corresponding software or a plug-in for their web browser.
       – Downloader (leecher): a peer that has only a part (or none) of the file.
       – Seeder: a peer that has the complete file and chooses to stay in the system to allow other peers to download.
       – BitTorrent clients connect to a tracker when attempting to work with torrent files.
         • The tracker notifies the client of the P2P file location (that is normally on a different, remote server).
     How a node enters a swarm for file “popeye.mp4”
     • The file popeye.mp4.torrent is hosted at a (well-known) webserver.
     • The .torrent has the address of the tracker for the file.
     • The tracker, which runs on a webserver as well, keeps track of all peers downloading the file.
     [Figure, step 1: the peer contacts the webserver www.bittorrent.com]
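The role of the per-piece SHA-1 hashes can be illustrated with Python's standard `hashlib`. This is a minimal sketch and not a .torrent parser: it works on an in-memory byte string, the helper names are hypothetical, and the 256 KB piece length is simply the "usual" value quoted on the slide.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # the slide's "usually 256 KB"


def split_pieces(data, piece_length=PIECE_LENGTH):
    """Cut a file's bytes into pieces; the last piece may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(data, piece_length=PIECE_LENGTH):
    """Compute the SHA-1 digest of every piece, as a publisher would."""
    return [hashlib.sha1(piece).digest() for piece in split_pieces(data, piece_length)]


def verify_piece(index, piece, expected_hashes):
    """A downloader keeps a received piece only if its digest matches."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]


content = b"x" * (2 * PIECE_LENGTH + 5)   # toy file: two full pieces plus 5 bytes
hashes = piece_hashes(content)

print(len(hashes))
print(verify_piece(0, content[:PIECE_LENGTH], hashes))
print(verify_piece(2, b"bogus", hashes))
```

Because every piece is checked independently, a peer can accept pieces from untrusted neighbors in any order and discard only the ones that fail the check.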
  18. 18. How a node enters a swarm for file “popeye.mp4”
     • The file popeye.mp4.torrent is hosted at a (well-known) webserver.
     • The .torrent has the address of the tracker for the file.
     • The tracker, which runs on a webserver as well, keeps track of all peers downloading the file.
     [Figure, step 2: the peer contacts the tracker; step 3: the peer joins the swarm]
     Three elements necessary to sharing a file with BitTorrent
     • The tracker: coordinates connections among the peers.
       – The tracker doesn't know anything of the actual contents of a file.
     • The web server: stores and serves the .torrent file.
     • At least one seeder
       – Holds the file's actual contents.
       – The seeder is almost always an end-user's desktop machine (peer), rather than a dedicated server machine.
       – Seeding is monitored by the tracker.
       – Seed your file for a long time to prevent peers from being left with incomplete files.
       – Generally, it's considered good manners to continue seeding a file after you have finished downloading, to help out others.
     • When you finish a download in BitTorrent and you are only uploading, you're seeding!
     File sharing
     • Large files are broken into pieces of size between 64 KB and 1 MB.
     [Figure: a file divided into pieces 1 to 8]
     BT: publishing a file
     [Figure: Harry Potter.torrent, a web server, a tracker, the user Bob, the seeder John and the downloaders Joe, Fan and Bin]
     A trivial example
     [Figure: a seeder holding pieces {1,2,3,4,5,6,7,8,9,10}; downloaders A, B and C start with {} and their piece sets grow ({1,2,3}, {1,2,3,4}, {1,2,3,5}, {1,2,3,4,5}) as they exchange pieces]
  19. 19. P2P Technical Challenges
     • Routing protocols
     • Network topologies
     • Peer discovery
     • Communication/coordination protocols
     • Quality of service
     • Security
     P2P SECURITY
     • Security is the condition of being protected against danger or loss.
     P2P Security
     • P2P file sharing networks are constantly under attack.
     • P2P is potentially more vulnerable than client-server.
       – Decentralized
       – More difficult to manage and control
     • Need to understand the security issues for architecting future P2P apps.
     Types of P2P Attacks
     • Poisoning: a client can provide content that doesn't match the description.
       – A client A can broadcast a message saying it needs file 'X'. A malicious client can send a message back to A saying it has file X, then send it file Y.
     • Denial of Service attacks, which reduce or halt overall network activity.
     • Defection attacks, which allow a client to participate in the network with a very low upload-to-download ratio.
     Types of P2P Attacks….
     • Virus attacks, where a malicious client can add viruses into files shared on the network.
     • Malware attacks, where the P2P software contains spyware.
     • Filtering attacks, where network operators may attempt to prevent P2P network data from being carried.
     Attacks On & From
     • Attacks on P2P systems
     • Attacks from P2P systems
  20. 20. Attacks on P2P sharing
     • Two types:
       – Pollution: file corruption (targets the file content)
       – Index poisoning (targets the file index)
     File Pollution
     [Figure: a pollution company turns original content into polluted content; pollution servers inject it into the file sharing network]
     File Pollution
     • Unsuspecting users (Alice, Bob) spread pollution!
     [Figure: polluted copies pass from user to user inside the file sharing network]
     INDEX POISONING
     • The aim of the attacker is to make several peers believe that some popular file is present at the victim.
     • The attacker sends a location publish message to every crawled peer.
     • In this message, the attacker includes the victim's IP address and port number.
     • The attacker puts the file hash of a popular file in the message.
     • Peer B adds this file hash to its index along with the location of the victim.
     • When a peer C searches for that file, it may be told by some poisoned peer that the victim has the file.
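The index poisoning steps above work because a peer records whatever a location publish message claims. The following sketch shows that weakness in a toy index; the class `PeerIndex`, the port number and the addresses are hypothetical values chosen for illustration only.

```python
from collections import defaultdict


class PeerIndex:
    """Toy index kept by a peer: file hash -> set of (ip, port) locations."""

    def __init__(self):
        self.locations = defaultdict(set)

    def handle_publish(self, file_hash, ip, port):
        # Weakness: nothing checks that (ip, port) sent this message
        # or that it really holds the file.
        self.locations[file_hash].add((ip, port))

    def search(self, file_hash):
        return sorted(self.locations.get(file_hash, ()))


peer_b = PeerIndex()

# An honest publish: the sender advertises its own address.
peer_b.handle_publish("hash-of-bigparty", "123.12.7.98", 6346)

# The attacker publishes a popular file hash with the victim's address.
peer_b.handle_publish("hash-of-bighit", "111.22.22.22", 6346)

# Peer C's search is answered with the victim's location.
print(peer_b.search("hash-of-bighit"))
```

Every peer that receives this answer will try to connect to the victim, so the victim is flooded with requests for a file it never offered.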
  21. 21. Index Poisoning
     [Figure: a file sharing network with hosts 23.123.78.6, 123.12.7.98 and 234.8.89.20; the index reads:]
       title      location
       bigparty   123.12.7.98
       smallfun   23.123.78.6
       heyhey     234.8.89.20
     Index Poisoning
     [Figure: the same network after poisoning, with an added entry pointing at the victim 111.22.22.22; the index reads:]
       title      location
       bigparty   123.12.7.98
       smallfun   23.123.78.6
       heyhey     234.8.89.20
       bighit     111.22.22.22
     Free Riding
     • Peers share little or no data in P2P file-sharing systems.
     • Measurement
       – Nearly 70% of Gnutella users share no files.
       – Nearly 50% of all responses are returned by the top 1% of sharing hosts.
     • Incentive mechanisms are needed to encourage user cooperation.
     P2P Worms
     • Topological scan worms
     • Passive worms
     • A computer worm is a self-replicating malware computer program.
     • It uses a computer network to send copies of itself to other nodes.
     • It may do so without any user intervention.
     ROUTING TABLE POISONING
     • The aim of the attacker is to make the peers add the victim as their neighbor.
     • The attacker sends node announcement messages to every crawled peer.
     • The attacker includes the victim's IP address and port number in these messages.
     • The peers add the victim as their neighbor.
     • Query messages are forwarded to the victim.
     TOPOLOGICAL WORM ATTACK
     [Figure: worm spreading along overlay neighbor links]
  22. 22. TOPOLOGICAL WORM ATTACK
     [Figure: worm spreading along overlay neighbor links]
     PASSIVE P2P WORMS
     • Vulnerability in the protocol
     • Wait for the vulnerable targets to contact them
     • Case 1
       – The worm can create infected copies of itself with attractive filenames and place them in the shared folder of the P2P client, or it can replace the files present in the shared folder with itself.
       – E.g. VBS.Gnutella, Benjamin Worm, etc.
     • Case 2
       – The worm answers positively to a proportion of search queries by changing the name of the corrupted file to match the search query.
       – E.g. Gnuman
     P2P-Worm.Win32.Benjamin.a
     • P2P-Worm.Win32.Benjamin.a (Kaspersky Lab) is also known as: Worm.P2P.Benjamin.a (Kaspersky Lab), W32/Benjamin.worm (McAfee), W32.Benjamin.Worm (Symantec), Win32.HLLW.Benjamin (Doctor Web).
     • This worm uses the Kazaa file exchange P2P network to spread itself.
     • Benjamin is written in Borland Delphi and is approximately 216 Kb in size; it is compressed by the AsPack utility.
     Effects
     • Eating up free disk space
     • Benjamin opens a Web page called benjamin.xww.de to display banner ads.
       – One morning the Benjamin.xww.de Web site had a message saying: "Domain closed due to massive abuse."
     How vulnerable is BitTorrent?
     Pollution Attack
     • 1. The peers receive the peer list from the tracker.
  23. 23. Pollution Attack (cont'd)
• 2. One peer contacts the attacker for a chunk of the file.
• 3. The attacker sends back a false chunk.
  – This false chunk will fail its hash check and will be discarded.
• 4. The attacker requests all chunks from the swarm and wastes the peers' upload bandwidth.

DDoS Attack
• DDoS = distributed denial of service
• Based on the fact that the BitTorrent tracker has no mechanism for validating peers
• Uses modified client software

DDoS Attack (cont'd)
• 1. The attacker downloads a large number of torrent files from a web server.
• 2. The attacker parses the torrent files with a modified BitTorrent client and, when announcing that he is joining the swarm, spoofs his IP address and port number with the victim's.
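The hash check that causes a false chunk to be discarded in the pollution attack can be sketched as follows. This is a minimal illustration, assuming the expected hash is a SHA-1 digest as in BitTorrent v1 metainfo files; the function names and sample data are hypothetical.

```python
import hashlib


def piece_hash(data: bytes) -> bytes:
    """SHA-1 digest of a chunk (assumed hash function for this sketch)."""
    return hashlib.sha1(data).digest()


def accept_chunk(chunk: bytes, expected_hash: bytes) -> bool:
    """Keep a received chunk only if its digest matches the expected one."""
    return piece_hash(chunk) == expected_hash


# Hypothetical data: one genuine chunk and one false chunk from an attacker.
genuine = b"genuine chunk data"
expected = piece_hash(genuine)

received = [genuine, b"false chunk from attacker"]
kept = [c for c in received if accept_chunk(c, expected)]
```

The false chunk is rejected, but the downloader has already spent time and bandwidth fetching it, which is the cost the attack imposes.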
  24. 24. DDoS Attack (cont'd)
• 3. As the tracker receives requests for a list of participating peers from other clients, it sends the victim's IP address and port number.
• 4. The peers then attempt to connect to the victim to try to download a chunk of the file.

Attack illustration
[Figure: the attacker collects .torrent files from a discussion forum and announces the victim to the tracker. Clients ask the tracker "Who has the files?", the tracker answers "Victim has the files!", and the clients contact the victim.]

Current Solutions: Pollution Attacks
• Blacklisting
  – Achieved using software such as Peer Guardian or moBlock.
  – Blocks connections from blacklisted IPs, which are downloaded from an online database.

Solutions – TRUST and REPUTATION
• Most of the solutions proposed to solve the problem of attacks are based on building trust (and/or reputation) between the peers
• Some of the popular approaches are:
  – DCRS (BitTorrent)
  – EigenTrust
  – XRep
• These approaches do slow down the attack

What is Trust? What is reputation?
• Trust – a peer's belief in another peer's capabilities, honesty and reliability based on its own experiences.
• Reputation – a peer's belief in another peer's capabilities, honesty and reliability based on recommendations received from other peers.
  – Reputation can be centralized, computed by a third party, or decentralized, computed independently by each peer after asking other peers for recommendations.
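The blacklisting approach described above amounts to checking each connecting peer's address against a list of blocked addresses or ranges. The sketch below uses Python's standard `ipaddress` module. The one-CIDR-entry-per-line list format and the addresses are assumptions for illustration, not the actual format used by Peer Guardian or moBlock.

```python
import ipaddress


def load_blacklist(lines):
    """Parse one CIDR entry per line (hypothetical list format)."""
    return [ipaddress.ip_network(line.strip()) for line in lines if line.strip()]


def is_blocked(peer_ip: str, blacklist) -> bool:
    """Return True if the peer's address falls inside any blacklisted range."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in blacklist)


# Illustrative entries drawn from documentation-only address ranges.
blacklist = load_blacklist(["192.0.2.0/24", "198.51.100.7/32"])

for ip in ["192.0.2.55", "203.0.113.9"]:
    action = "drop" if is_blocked(ip, blacklist) else "accept"
    print(ip, action)
```

A list like this only helps against attackers whose addresses are already known, which is one reason the slides move on to trust and reputation schemes.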
  25. 25. What is Trust? (cont'd)
• Both trust and reputation are used to evaluate a peer's trustworthiness.
• Trust and reputation increase or decrease with further experience.
• Trust and reputation both depend on some context.
• For example:
  – Mike trusts John as his doctor, but he doesn't trust John as a mechanic who can fix his car.
    • In the context of seeing a doctor, John is trustworthy
    • In the context of fixing a car, John is untrustworthy

What is Trust Management?
• The term "Trust Management" was coined by Blaze et al. (1996)
  – a coherent framework for the study of security policies, security credentials and trust relationships.

Reputation Management
• Need for trust mechanisms
  – To assess trustworthiness of peers and the content
    • Malicious peers generate an unlimited number of inauthentic files
  – To deter malicious behavior
• Reputation is an assumption that past behavior is indicative of future behavior
• Use of reputation to build trust

An Example Trust Management System (BitTorrent)
• Debit-Credit Reputation System (DCRS)
• Each client calculates a local trust score for its peers, based on valid pieces uploaded/downloaded
• Tracker combines these individual scores to make a global score

DCRS… …(cont'd): Local Trust Score Computation
• $F_{ij} = U_{ij} - D_{ij}$, where $U_{ij}$ is the number of chunks that $i$ uploaded to $j$ and $D_{ij}$ is the number of chunks that $i$ downloaded from $j$
• Using $F_{ij}$, the local trust score $LT_{ij}$ is computed as
$$LT_{ij} = \begin{cases} -1 & \text{if a bogus chunk is uploaded by peer } j \\ 0 & \text{if } F_{ij} > t \\ 1 & \text{if } F_{ij} \le t \end{cases}$$
  where $t$ is the fairness threshold

DCRS… …(cont'd): Global Trust Score Computation
• A global trust score represents the opinion that the rest of the swarm holds of a peer.
• At regular intervals the tracker receives the local trust scores of the peers in the swarm.
• The tracker chooses $k$ random local trust scores for peer $j$, where $k$ is less than the number of peers in the swarm.
• The tracker sets the average of these $k$ local trust scores as the global trust score for $j$.
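The DCRS local and global trust score rules above can be written out as a short sketch. It follows the slide's definitions of F_ij, LT_ij and the k-sample average. The function names, the `bogus` flag, the injectable random generator, and the sample numbers are assumptions added for illustration.

```python
import random
from statistics import mean


def local_trust(uploaded: int, downloaded: int, threshold: int, bogus: bool = False) -> int:
    """LT_ij per the slide: -1 for a bogus chunk, 0 if F_ij > t, otherwise 1."""
    if bogus:
        return -1
    fairness = uploaded - downloaded  # F_ij = U_ij - D_ij
    return 0 if fairness > threshold else 1


def global_trust(local_scores_for_j, k: int, rng=random):
    """Average k randomly chosen local trust scores reported for peer j."""
    sample = rng.sample(local_scores_for_j, k)
    return mean(sample)


# Hypothetical reports about peer j from three other peers (t = 5).
scores = [
    local_trust(uploaded=10, downloaded=8, threshold=5),             # F = 2
    local_trust(uploaded=20, downloaded=3, threshold=5),             # F = 17
    local_trust(uploaded=0, downloaded=0, threshold=5, bogus=True),  # bogus chunk
]
score_j = global_trust(scores, k=3, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the sampling reproducible; a tracker would presumably draw a fresh random sample at each reporting interval.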
