Research Issues in P2P Networks

Published in: Technology

0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
762
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
0
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Research Issues in P2P Computing

    OUTLINE
    • Centralized vs. Distributed
    • What is P2P?
    • P2P Architectures
    • P2P and Applications
    • Search and Replication Techniques
    • P2P Security
    • Emerging P2P Applications
    • Conclusion

    How to Run Applications Faster?
    • There are 3 ways to improve performance:
      – Work Harder
      – Work Smarter
      – Get Help
    • Computer Analogy
      – Work harder: faster hardware, such as high-performance processors or peripheral devices
      – Work smarter: optimized algorithms and techniques used to solve computational tasks
      – Get help: multiple computers to solve a particular task

    Distributed…
    • When a handful of powerful computers are linked together and communicate with each other
      – the overall computing power available can be amazingly vast.
      – Such a system can have higher performance than a single supercomputer.
      – The objective of such systems is to minimize communication and computation cost.
    • A distributed system is an application that executes a collection of protocols to coordinate the actions of multiple processes on a communication network, such that all components cooperate to perform a single task or a small set of related tasks.

    Centralized?
    • Computation in networks of processing nodes can be classified into centralized or distributed computations.
    • A centralized solution relies on one node being designated as the computer node that processes the entire application locally.
    • The central system is shared by all the users all the time.
    • There is a single point of control and a single point of failure.
  • 2. Examples of Distributed Systems
    • The Internet
      – Heterogeneous network of computers and applications
      – Implemented through the Internet Protocol Stack

    Distributed…
    • The collaborating computers can access remote resources as well as local resources in the distributed system via the communication network.
    • The existence of multiple autonomous computers is transparent to the user in a distributed system.
      – The user is not aware that the jobs are executed by multiple computers located in remote places.
      – A centralized algorithm is at the heart of a single computer.
      – A distributed algorithm is at the heart of a society of computers.

    Computer Networks vs. Distributed Systems
    • Computer Network: the autonomous computers are explicitly visible
    • Distributed System: the existence of multiple autonomous computers is transparent
    • Many problems in common
    • Normally, every distributed system relies on services provided by a computer network.

    Distributed…
    • Distributed systems are built on top of existing networking and operating systems software.
    • The middleware enables computers to coordinate their activities and to share the resources of the system.
      – Middleware is the bridge that connects distributed applications across dissimilar physical locations, with dissimilar hardware platforms, network technologies, operating systems, and programming languages.
    • Middleware provides standard services such as naming, concurrency control, event distribution, authorization to specify access rights to resources, security, etc.
  • 3. Computing Platforms Evolution: Breaking Administrative Barriers
    [Figure: performance plotted against administrative barriers (Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Universe); platforms range from Desktop (Single Processor), SMPs or SuperComputers, Local Cluster, Enterprise Cluster/Grid and Global Cluster/Grid to Inter Planet Cluster/Grid ??]

    Foster-Kesselman
    • In 1997 Foster and Kesselman organized a workshop at Argonne National Laboratory entitled “Building a Computational Grid”.
    • At that moment the term “Grid” was born.
    • The workshop was followed in 1998 by the publication of the book “The Grid: Blueprint for a New Computing Infrastructure” by Foster and Kesselman themselves.
    • For these reasons they are considered the fathers of the Grid, and their book, which was almost entirely rewritten and re-published in 2003, is considered the “Grid bible”.
    • Ian Foster: Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439
    • Carl Kesselman: Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292

    The Need for Collaboration?
    • Business worldwide demands intense problem-solving capabilities for incredibly complex problems
      – hence the need for dynamic collaboration, with many computing resources able to work together.
    • Achieving this level of resource collaboration within the bounds of the end user's quality requirements is a difficult challenge across all the technical communities.

    Electric Grid and Grid Computing
    • Computing grids are conceptually not unlike electrical grids.
    • Electric power grid: a variety of resources contribute power into a shared "pool" for many consumers to access on an as-needed basis.
      – In an electrical grid, wall outlets allow us to link to an infrastructure of resources that generate, distribute, and bill for electricity.
      – When you connect to the electrical grid, you don't need to know where the power plant is or how the current gets to you.
    • Grid computing uses middleware to coordinate disparate IT resources across a network, allowing them to function as a virtual whole.
      – The goal of a computing grid, like that of the electrical grid, is to provide users with access to the resources they need, when they need them.

    Why Grids? Large Scale Exploration needs them
    • Solving technology problems using computer modeling, simulation and analysis: Geographic Information Systems, Life Sciences, Aerospace, CAD/CAM, Military Applications
  • 4. CERN’s Large Hadron Collider
    • 1800 Physicists, 150 Institutes, 32 Countries
    • 100 PB of data by 2010; 50,000 CPUs?
    • The Large Hadron Collider (LHC) is a gigantic scientific instrument near Geneva.
    • It is a particle accelerator used by physicists to study the smallest known particles, the fundamental building blocks of all things.

    Client-Server Model
    • The most widely used
    [Figure: clients send invocations to servers and receive results; key: process, computer]

    Client-Server / Why P2P?
    [Figure: a source reaching “interested” end-hosts through routers]
  • 5. Client-Server
    [Figure: source, router and “interested” end-hosts; the source is overloaded]

    Why P2P?
    • Personal computers: 80% idle CPU time
    • Laptop: 90% idle CPU time
    • Computers in our lab: 99% idle CPU time!
    • Hot spots become hotter

    What is driving P2P?
    • Clients are not so dumb.
    • Billions of MHz of CPU, tons of terabytes of disk, millions of gigabits of network bandwidth, …
      – Unused resources.

    Problem with Client-Server Model
    – Scalability
      • As the number of users increases, there is a higher demand for computing power, storage space, and bandwidth on the server side
    – Reliability
      • The whole network depends on the highly loaded server to function properly

    Computer System Taxonomy
    • Computer Systems
      – Centralized Systems (mainframes, SMPs)
      – Distributed Systems
        • Client-server
        • Peer-to-Peer

    P2P – An overlay network
    • P2P overlay network
      – The connected nodes construct a virtual overlay network on top of the underlying network infrastructure
      – The peer-to-peer network topology is a virtual overlay at the application layer
    [Figure: overlay of nodes A to G on top of the physical network]
  • 6. Typical Characteristics
    • Large scale: lots of nodes (up to millions)
    • Dynamicity: frequent joins, leaves, failures
    • Little or no infrastructure
      – No central server
    • Symmetry: all nodes are “peers” and have the same role
    [Figure: client/server model versus peer-to-peer model, with the congestion zone marked in each]

    What is it...
    • P2P computing is the sharing of computer resources and services by direct exchange between systems.
    • These resources and services include the exchange of information, processing cycles, cache storage, and disk storage for files.
    • P2P computing takes advantage of existing desktop computing power and networking connectivity,
      – allowing economical clients to leverage their collective power to benefit the entire enterprise.
    • In a P2P architecture, computers that have traditionally been used solely as clients communicate directly among themselves and can act as both clients and servers, assuming whatever role is most efficient for the network.
    • Each node (peer), called a servent, acts as both a SERVer and a cliENT
    [Figure: peers with shared folders and neighbors; each peer is both client and server; a peer searches other peers, then retrieves the file]

    P2P Dominates Internet Traffic
    • P2P has dominated Internet traffic: in 2006, more than 60% of Internet traffic

    Some Statistics about P2P Systems
    • More than 200 million users registered with Skype, around 10 million online users. (2007)
    • Around 4.7M hosts participate in SETI@Home (2006)
    • BT accounts for 1/3 of Internet traffic (2007)
    • More than 200,000 simultaneous online users on PPLive (streaming video network). (2007)
    • More than 3,000,000 users downloaded PPStream. (2008)
  • 7. P2P Applications
    • In Peer-to-Peer (P2P) computing, applications are segregated into three main categories:
      – distributed computing,
      – file sharing, and
      – collaborative applications
    • The three categories of P2P serve different purposes
      – Distributed computing applications typically require the decomposition of a larger problem into smaller parallel problems
      – File sharing applications require efficient search across wide area networks
      – Collaborative applications require update mechanisms to provide consistency in a multi-user environment

    P2P Network Architectures
    • Centralized (Napster)
    • Decentralized
      – Unstructured (Gnutella)
      – Structured (Chord)
    • Hierarchical (MBone)
    • Hybrid (EDonkey)

    P2P Computing
    • File sharing (e.g., Gnutella, Freenet, Limewire, KaZaA)
    • Collaboration (e.g., Magi, Groove, Jabber)
    • Distributed computing (e.g., SETI@home, Search for Extraterrestrial Intelligence)
    [Figure: communication and collaboration (Groove, Skype); file sharing (Napster, Gnutella, Kazaa, Freenet, Overnet); distributed computing (SETI@Home, folding@Home)]

    Computer Systems
    • Centralized Systems (mainframes, SMPs, workstations)
    • Distributed Systems
      – Client-server
      – Peer-to-Peer
        • Centralized
        • Decentralized: Structured, Unstructured

    P2P FILE SHARING APPLICATIONS
  • 8. P2P Applications
    • File sharing (music, movies, …)
      – utilises the idle disk space for storage and the existing network bandwidth for search and download.
      – The cost of operation is very low
        • the majority of peers collect only objects that they are interested in anyway.
      – E.g. Napster, KaZaA and Gnutella

    Napster: Example
    [Figure: peers m1 to m6 holding resources A to F; a query “E?” is resolved to the peer holding E]

    File Sharing Services
    • Publish: insert a new file into the network
    • Lookup: given a file name X, find the host that stores the file
    • Retrieval: get a copy of the file
    • Join: join the network
    • Leave: leave the network

    Unstructured P2P
    [Figure: queries flooded to connected peers, or flooded between supernodes; legend: search, transfer, supernode, peer node, neighbors]

    Centralized P2P
    • Utilizes a central directory for object location
    • For file-sharing P2P, the location is requested from central servers and the file is then downloaded directly from peers
    • Benefits
      – Simplicity
      – Limited bandwidth usage
    • Drawbacks
      – Unreliable (single point of failure), performance bottleneck, and scalability limits
      – Vulnerable to DoS attacks
      – Copyright infringement
    [Figure: peers upload indexes to the centralized server; 1. query, 2. response, 3. transfer]

    File Sharing: Gnutella
    • Gnutella is a file sharing protocol
    • Gnutella was originally designed by Nullsoft, a subsidiary of America Online.
    • Its architecture is completely decentralised and distributed
    • When a client wishes to connect to the network, it runs through a list of nodes that are most likely to be up, or takes a list from a website, and then connects to however many nodes it wants
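The centralized (Napster-style) directory and the publish/lookup/leave services listed in slide 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not Napster's actual protocol: the class name `CentralDirectory`, its method names, and the peer and file names are hypothetical.

```python
from collections import defaultdict


class CentralDirectory:
    """Toy central index: maps each file name to the peers that hold it."""

    def __init__(self):
        self.index = defaultdict(set)  # file name -> set of peer ids

    def publish(self, peer, filename):
        # A peer uploads an index entry; the file itself stays on the peer.
        self.index[filename].add(peer)

    def lookup(self, filename):
        # Return the peers to contact; the transfer then happens peer to peer.
        return sorted(self.index.get(filename, set()))

    def leave(self, peer):
        # Remove a departing peer from every index entry.
        for holders in self.index.values():
            holders.discard(peer)


directory = CentralDirectory()
directory.publish("m5", "E")
directory.publish("m4", "D")
directory.publish("m4", "E")
hosts = directory.lookup("E")  # peers that could serve file "E" directly
```

Every lookup goes through the one directory object, which is why the slide lists a single point of failure and a performance bottleneck among the drawbacks.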
  • 9. Gnutella Search Mechanism
    • Assume: m1's neighbors are m2 and m3; m3's neighbors are m4 and m5; …
    • A, B, C, D, E, F are resources
    • Queries carry a TTL
    [Figure: the query “E?” is flooded from m1 through its neighbors until it reaches the peer holding E]
    • Advantages
      – Fast lookup
      – Low join and leave overhead
      – Popular files are replicated many times, so a lookup with a small TTL will usually find the file
        • Can choose to retrieve from a number of sources
    • Disadvantages
      – Not a 100% success rate, since the TTL is limited
      – Very high communication overhead
      – Uneven load distribution

    Peer-to-Peer File Sharing
    • It is all about the trading of copyrighted music and videos without paying anything to the authors

    KaZaA
    • Native Windows application
    • 3 million users online sharing 4 petabytes of data
    [Figure: KaZaA client window showing a query, a music category and a banner ad]

    Kazaa
    • Sharman Networks
    • Kazaa is a file sharing program that allows you to download audio, video, images, documents and software files.

    Searching

    Search in Unstructured P2P
    • Two general types of search in unstructured P2P:
      – Blind: try to propagate the query to a sufficient number of nodes (example: Gnutella)
      – Informed: utilize information about document locations
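The TTL-limited flooding described in slide 9 can be sketched as a breadth-first search over the overlay. The topology follows the slide's assumption (m1's neighbors are m2 and m3; m3's neighbors are m4 and m5); the placement of one resource per peer and the function name `flood_search` are assumptions made for illustration, and the message count is deliberately naive (it also counts copies sent to peers that have already seen the query).

```python
from collections import deque


def flood_search(graph, resources, start, wanted, ttl):
    """Flood a query breadth-first; return (peers holding `wanted`, messages sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        node, remaining = queue.popleft()
        if wanted in resources.get(node, ()):
            hits.append(node)
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the query
        for neighbor in graph.get(node, ()):
            messages += 1  # every forwarded copy costs one message
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Overlay from the slide; links are treated as bidirectional.
graph = {
    "m1": ["m2", "m3"],
    "m2": ["m1"],
    "m3": ["m1", "m4", "m5"],
    "m4": ["m3"],
    "m5": ["m3"],
}
# Assumed placement: peer m1 holds A, m2 holds B, and so on.
resources = {"m1": {"A"}, "m2": {"B"}, "m3": {"C"}, "m4": {"D"}, "m5": {"E"}}

found_far = flood_search(graph, resources, "m1", "E", ttl=2)
found_near = flood_search(graph, resources, "m1", "E", ttl=1)
```

With a TTL of 2 the query reaches m5 and finds E; with a TTL of 1 it stops at m2 and m3 and fails, which illustrates why the success rate is not 100% when the TTL is limited.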
  • 10. Blind Search Methods
    • BFS and Random Walk
      – BFS
      – Random walks
    • In unstructured networks, flooding would exhaust the bandwidth of the network.

    Informed search
    • Informed: utilize information about document locations.
    • APS

    Adaptive Probabilistic Search
    • Each node keeps a local index consisting of one entry, per neighbor, for each object it has requested.
    • Index values represent the probability of finding that object through that neighbor
    • Searching is based on the simultaneous deployment of k walkers and probabilistic forwarding.
    • If a hit occurs, the walker terminates successfully.
    • On a miss, the query is forwarded to one of the node's neighbors.
    • Example (indices at node A):
      – A chooses B with Pr = 0.3
      – A chooses C with Pr = 0.5
      – A chooses D with Pr = 0.2

    APS – an example
    • Node J holds the requested object
    • Nodes deploy 2 walkers
    • Initially, all index values are 20
    • TTL = 3

    Collaborative Community
    • Rapidly changing work environment
      – Out-sourcing, in-sourcing, home-sourcing
      – Tight integration and team work with customers, partners, vendors
    • P2P allows management of documents at the level of closed working groups.
    • The collaboration software is designed to improve the productivity of individuals with common goals or interests.

    Microsoft Office Groove 2007
    • Groove is a collaborative P2P system (http://www.groove.net)
      – Part of the Microsoft Office system
      – Document sharing and collaboration
        • vital for a business.
      – Office Groove 2007 is a collaboration software program
        • helps teams work together dynamically and effectively, even if team members work for different organizations or work remotely.
    • Work Together: Anyone, Anytime, Anyplace
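The probabilistic forwarding step of Adaptive Probabilistic Search in slide 10 can be sketched like this. The sketch assumes that a node turns its per-neighbor index values into forwarding probabilities by normalizing them; the index values 30, 50 and 20 are chosen so that they reproduce the slide's example probabilities, and the function names are hypothetical. How the index values are updated after a hit or a miss is not covered here.

```python
import random


def forwarding_probabilities(index):
    """Normalize per-neighbor index values into forwarding probabilities."""
    total = sum(index.values())
    return {neighbor: value / total for neighbor, value in index.items()}


def choose_next_hop(index, rng=random):
    """Pick the neighbor a walker is forwarded to, weighted by index value."""
    neighbors = list(index)
    weights = [index[n] for n in neighbors]
    return rng.choices(neighbors, weights=weights, k=1)[0]


# Assumed index at node A for one object; normalizing gives 0.3, 0.5 and 0.2.
index_at_A = {"B": 30, "C": 50, "D": 20}
probs = forwarding_probabilities(index_at_A)
next_hop = choose_next_hop(index_at_A, random.Random(0))
```

Deploying k walkers would amount to calling `choose_next_hop` k times at the requesting node and once per hop afterwards, until a hit occurs or the TTL runs out.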
  • 11. Distributed Computing: SETI@home
    • Search for Extraterrestrial Intelligence: asks whether we are alone in the universe or whether there is intelligent life somewhere else.
    • Over two million computers crunching away and downloading data gathered from the Arecibo radio telescope in Puerto Rico, USA
    • The SETI@Home project is widely regarded as the fastest computer in the world
    • Sharing of resources such as computation power, network bandwidth and storage
    • Achieves computing power more cheaply than a supercomputer can provide.
    • Developed by the Space Sciences Laboratory at the University of California, Berkeley, in the United States. http://setiathome.ssl.berkeley.edu
    • Released to the public in 1999

    How SETI@home works?
    • Collect the source data
      – A telescope at Arecibo collects the source data from outer space.
      – SETI@home uses a data recorder to record the source data on removable tape.
    • Distribute the source data
      – SETI@home divides the data into fixed-size work units.
      – SETI@home distributes these work units via the Internet from the servers to a client program.
      – The client program computes a result, returns it to the server, and gets another work unit.

    How SETI@home works? …
    • Scientific experiment that uses Internet-connected computers
    • Distributes a screen saver–based application to users
    • Applies signal analysis algorithms to different data sets to process radio-telescope data.
    • Has more than 3 million users
    [Figure: radio-telescope data is stored on the main server; 2. SETI client (screen saver) starts; 3. SETI client gets data from the server and runs; 4. client sends results back to the server]
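The work-unit cycle described in slide 11 (split the recorded data into fixed-size units, hand a unit to a client, collect the result, hand out the next unit) can be sketched as below. The `analyze` function is a placeholder and not SETI@home's signal analysis; the unit size, the function names and the sample data are assumptions for illustration.

```python
def split_into_work_units(data: bytes, unit_size: int):
    """Cut the recorded data into fixed-size work units (the last may be shorter)."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def analyze(unit: bytes) -> int:
    """Placeholder for the client's computation: report the peak sample value."""
    return max(unit)


def run_clients(units):
    """Server loop: hand out one unit at a time and store each returned result."""
    results = {}
    pending = list(enumerate(units))
    while pending:
        unit_id, unit = pending.pop(0)     # server sends the next work unit
        results[unit_id] = analyze(unit)   # client computes and returns a result
    return results


recorded = bytes(range(10))                # stand-in for radio-telescope data
units = split_into_work_units(recorded, unit_size=4)
results = run_clients(units)
```

Because the units are independent of each other, the loop body could run on any number of volunteer machines at once, which is the decomposition into smaller parallel problems that distributed computing applications rely on.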
  • 12. Skype
    • “… a free program that uses the latest P2P…technology to bring affordable and high quality voice communications to people all over the world…”
    • Skype offers voice, video, chat and data transfer services over IP
    • The first stable version of Skype was released in July 2004; since then the number of users has kept growing.
    • Skype now claims more than 20 million accounts and between 4 and 6 million users connected simultaneously.

    Skype Software features
    • VoIP from computer to computer
      – The most used feature.
    • VoIP from computer to regular phone (Skype Out)
      – By registering on Skype's website it is possible to buy credit and then call all over the world at rates that compare very favourably with those applied by phone companies.
    • Video conferencing
      – Introduced in Skype 2.0 in 2006.
    • Instant Messaging
      – This feature is comparable to many other instant messaging clients like MSN Messenger, Yahoo! Messenger, Google Talk, etc.
      – The main difference is that Skype does not tell the user whether the person he is chatting with is typing or not. This is due to the P2P design of the Skype network.
    • File Transfer
      – The Skype network design has a big influence on the quality of file transfers. It can make them very fast (1 Mbps) or very slow (3 kbps).

    Internet Telephony - Skype
    • The participants form a self-organizing P2P overlay network to locate and communicate with other participants.
    • The bandwidth is shared, and the sound or video in real time is shared as a resource
    • Skype has an architecture similar to that of its predecessor KaZaA
    • There are three types of nodes in the Skype network:
      – Ordinary peers
      – Super nodes
      – Central login server
    • Communications are encrypted (RSA)

    Super nodes
    • Super nodes (SN) are Skype clients run by users that have a “good” Internet connection and a “good” computer.
    • Having a good Internet connection means having a public IP address, without firewall restrictions.
    • A good computer is a machine that can forward other users' communications and handle many connections.
    • SNs act as relays in the network
      – Hence, they need better connectivity and better performance.
    • SNs are used to connect Skype clients (SC) together.

    Skype – login
    • Skype clients connect directly to login servers, whose IP addresses are hard coded within the software.
      – In this connection the login name and the version are sent in clear text.
    • The login server stores all user names and passwords and ensures that names are unique across the Skype name space

    Connection to a bootstrap node
    – When the SC (Skype Client) is installed for the first time, it comes with a list of SNs to connect to.
    – First, the Skype Client tries to connect to 5 SNs by sending a UDP packet to the IP addresses of super nodes randomly chosen from the host cache.
    – When the client finds a super node to connect to, it refreshes its list of active and available super nodes in the host cache.
    – The SC connects to an SN
  • 13. Traffic volume by content type (Germany, BitTorrent)
    [Figure: traffic volume by content type]

    Skype - user search
    • Similar to KaZaA (searching for the callee)
    • The client sends a user name to its SN and receives a few IP addresses and port numbers in reply
    • The client then contacts these nodes
    • If it cannot find the user, it sends the request to its SN once again and receives another few IP addresses and port numbers
    • The process continues until the user is found

    Skype - call establishment
    • Routing in the Skype overlay network is done by the SNs.
    • When an SC tries to establish a call, it first asks its SN (if it is not an SN itself) where the callee is, and tries to connect to it directly.
      – If the SC is restricted by a firewall, it connects to the callee using an SN as a relay.
      – If both the caller and the callee have public IP addresses, the caller sends signaling information over TCP to the callee

    What is PPLive?
    – An online video broadcasting and advertising network
      • Provides an online viewing experience comparable to that of traditional TV broadcasting
      • 75 million global installed base and 20 million monthly active users
      • 600+ channels on PPLive, with content ranging from news, music, sports, movies, games, live video and other interactive services to a global audience
    – An efficient P2P technique platform and test bench

    History of PPLive:
    • Bill’s story
      – Inventor of PPLive core technology
      – Dropped out of a post-graduate program to start PPLive

    P2P VIDEO STREAMING: PPLIVE
    • Streaming video is content sent in compressed form over the Internet and displayed by the viewer in real time.
    • With streaming video or streaming media, a Web user does not have to wait to download a file to play it: the media is sent in a continuous stream of data and is played as it arrives.
    • The user needs a player, which is a special program that uncompresses the data and sends video data to the display and audio data to the speakers.
    • A player can be either an integral part of a browser or downloaded from the software maker's Web site.
    • P2P streaming – P2P TV
      • PPLive, PPStream, Joost (by Skype founders), …
  • 14. Industry Trends
    • PPLive is well positioned to exploit the next explosive growth
    [Figure: from basic to advanced applications: file sharing (Napster, 2001), downloading (BitTorrent, 2003), VoIP (Skype, 2004), video streaming (PPLive, 2005)]

    PPLive
    • Media Server (channel management server)
      – Retrieve the list of channels via HTTP
    • Membership Server
      – Retrieve a small list of member nodes of interest via UDP

    Single-tree Streaming
    • A common approach to P2P streaming is to organize participating peers into a single tree-structured overlay
      – The content is pushed from the source towards all peers.
      – This way of organizing peers is called single-tree streaming.
    • In these systems, peers are hierarchically organized in a tree structure where the root is the stream source.
    • The content is spread as a continuous flow of information from the source down the tree.
    [Figure: a snapshot of a tree-based overlay with 231 nodes]

    Streaming Tree Reconstruction after a Peer Departure
    [Figure: streaming tree reconstruction after a peer departure]

    Multi-tree Streaming
    • Since all peers are involved in the data distribution, the load is spread among all nodes.
  • 15. BitTorrent
    • Created by Bram Cohen in 2001

    What is BitTorrent?
    • A peer-to-peer file transfer protocol
    • Extremely popular today
    • “Pull-based” “swarming” approach
    • Each file is split into smaller pieces
    • Nodes request desired pieces from neighbors
    • As opposed to parents pushing the data that they receive
    • Pieces are not downloaded in sequential order
    • Encourages contribution by all nodes

    Overall Architecture
    [Figure: a web server, a tracker, and peers A [Seed], B [Leech] (the downloader, “US”) and C [Leech]]
  • 16. BitTorrent Lingo
    • Seeder = a peer that provides the complete file.
    • Initial seeder = a peer that provides the initial copy.
    • Leecher = one who is downloading
    [Figure: an initial seeder, seeders and leechers]

    BitTorrent Basics
    • Files are broken into pieces.
      – Users each download different pieces from the original uploader (seed).
      – Users exchange the pieces with their peers to obtain the ones they are missing.
    • This process is organized by a centralized server called the Tracker.

    Critical Elements
    • A web server
      – stores and serves the .torrent file.
      – For example:
        • http://bt.btchina.net
        • http://bt.ydy.com/
    [Figure: a web server hosting “The Lord of Ring.torrent” and “Troy.torrent”]
  • 17. BitTorrent Swarm
    • Swarm
      – Set of peers all downloading the same file
      – Organized as a random mesh
        • Each node knows the list of pieces downloaded by its neighbors
        • A node requests the pieces it does not own from its neighbors
    • Swarm
      – The group of machines that are collectively connected for a particular file.
        • For example, if you start a BitTorrent client and it tells you that you're connected to 10 peers and 3 seeds, then the swarm consists of you and those 13 other people.

    Critical Elements
    • The .torrent file
      – A static 'metainfo' file that contains the necessary information:
        • URL of the tracker
        • Piece length, usually 256 KB
        • SHA-1 hashes of each piece in the file
        • IP address of the tracker
    [Figure: Matrix.torrent]

    Critical Elements
    • A BitTorrent tracker
      – The tracker maintains information about all BitTorrent clients utilizing each torrent.
      – The tracker identifies the network location of each client either uploading or downloading the P2P file associated with a torrent.
      – It also tracks which fragment(s) of that file each client possesses, to assist in efficient data sharing between clients.
        • i.e. the tracker keeps track of all peers downloading the file
      – For example:
        • http://bt.cnxp.com:8080/announce
        • http://btfans.3322.org:6969/announce

    Critical Elements
    • An end user (peer)
      – Users who want to use BitTorrent must install the corresponding software or a plug-in for their web browser.
      – Downloader (leecher): a peer that has only a part (or none) of the file.
      – Seeder: a peer that has the complete file and chooses to stay in the system to allow other peers to download
      – BitTorrent clients connect to a tracker when attempting to work with torrent files.
        • The tracker notifies the client of the P2P file location (which is normally on a different, remote server).

    How a node enters a swarm for file “popeye.mp4”
    • File popeye.mp4.torrent is hosted at a (well-known) webserver
    • The .torrent has the address of the tracker for the file
    • The tracker, which runs on a webserver as well, keeps track of all peers downloading the file
    [Figure: step 1, the peer fetches the .torrent file from www.bittorrent.com]
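The contents of the .torrent metainfo file listed in slide 17 (tracker URL, piece length, one SHA-1 hash per piece) can be sketched as follows. This is only an illustration of the idea: real .torrent files are bencoded and laid out differently, and the dictionary keys, the function name `build_metainfo` and the tracker URL used here are hypothetical.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # 256 KB, the usual piece length named in the slide


def build_metainfo(data: bytes, tracker_url: str, piece_length: int = PIECE_LENGTH):
    """Split the file into pieces and record the SHA-1 hash of each piece."""
    pieces = [data[i:i + piece_length] for i in range(0, len(data), piece_length)]
    return {
        "tracker": tracker_url,           # where peers announce themselves
        "piece_length": piece_length,     # size of every piece except the last
        "piece_hashes": [hashlib.sha1(p).hexdigest() for p in pieces],
    }


# Stand-in file content: two full pieces plus one extra byte.
file_data = b"x" * (2 * PIECE_LENGTH + 1)
metainfo = build_metainfo(file_data, "http://tracker.example/announce")
```

Because the metainfo is static and published by the web server, every peer that joins the swarm later can check each downloaded piece against the same list of hashes.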
  • 18. How a node enters a swarm for file “popeye.mp4”
    • File popeye.mp4.torrent is hosted at a (well-known) webserver
    • The .torrent has the address of the tracker for the file
    • The tracker, which runs on a webserver as well, keeps track of all peers downloading the file
    [Figure: step 2, the peer contacts the tracker; step 3, the peer joins the swarm]

    Three elements necessary to share a file with BitTorrent
    • The tracker: coordinates connections among the peers.
      – The tracker doesn't know anything about the actual contents of a file
    • The web server: stores and serves the .torrent file.
    • At least one seeder
      – Holds the file's actual contents.
      – The seeder is almost always an end user's desktop machine (peer), rather than a dedicated server machine.
      – Seeding is monitored by the tracker
      – Seed your file for a long time to prevent peers from being left with incomplete files.
      – Generally, it's considered good manners to continue seeding a file after you have finished downloading, to help out others.
        • When you finish a download in BitTorrent and you are only uploading, you're seeding!

    File sharing
    • Large files are broken into pieces of size between 64 KB and 1 MB
    [Figure: a file split into pieces 1 to 8]

    BT: publishing a file
    [Figure: Bob publishes “Harry Potter.torrent” on the web server; the tracker connects the seeder (John) with the downloaders (Joe and others)]

    A trivial example
    [Figure: seeder A holds pieces {1,2,3,4,5,6,7,8,9,10}; downloaders B and C start with {} and accumulate pieces over time, e.g. {1,2,3}, {1,2,3,4}, {1,2,3,5}, {1,2,3,4,5}]
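The piece exchange behind the trivial example in slide 18 can be sketched with plain sets: a downloader compares its own pieces with those of its neighbors and notes who can supply each missing piece. Which downloader holds which set at a given moment is an assumption here, and the function name `missing_pieces` is hypothetical; actual BitTorrent clients add piece-selection and choking policies that are not shown.

```python
def missing_pieces(mine, neighbors):
    """Map each piece we lack to the neighbors that can supply it."""
    wanted = {}
    for name, pieces in neighbors.items():
        for piece in pieces - mine:
            wanted.setdefault(piece, []).append(name)
    return {piece: sorted(names) for piece, names in sorted(wanted.items())}


# Assumed snapshot: seeder A has all 10 pieces, C has {1,2,3,5}, we (B) have {1,2,3}.
pieces_of_B = {1, 2, 3}
neighbors_of_B = {"A": set(range(1, 11)), "C": {1, 2, 3, 5}}
sources = missing_pieces(pieces_of_B, neighbors_of_B)
```

In this snapshot piece 5 can come from either A or C, so B does not have to rely on the seeder alone, which is how the load gets spread across the swarm.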
  • 19. P2P Technical Challenges
    • Routing protocols
    • Network topologies
    • Peer discovery
    • Communication/coordination protocols
    • Quality of service
    • Security

    P2P SECURITY
    • Security is the condition of being protected against danger or loss.

    P2P Security
    • P2P file sharing networks are constantly under attack.
    • P2P is potentially more vulnerable than client-server.
      – Decentralized
      – More difficult to manage and control
    • Need to understand the security issues when architecting future P2P apps

    Attacks On & From
    • Attacks on P2P systems
    • Attacks from P2P systems

    Types of P2P Attacks
    • Poisoning: a client can provide content that doesn't match the description.
      – A client A can broadcast a message saying it needs file 'X'. A malicious client can send a message back to A saying it has file X, then send it file Y.
    • Denial of Service attacks, which decrease or halt network activity altogether.
    • Defection attacks, which allow a client to participate in the network with a very low upload-to-download ratio.

    Types of P2P Attacks…
    • Virus attacks, where a malicious client adds viruses to files shared on the network.
    • Malware attacks, where the P2P software contains spyware.
    • Filtering attacks, where network operators may attempt to prevent P2P network data from being carried.
  • 20. Attacks on P2P sharing
    • Two types:
      – Pollution: file corruption (targets the file content)
      – Index poisoning (targets the file index)

    File Pollution
    [Figure: a pollution company turns original content into polluted content and publishes it through pollution servers into the file sharing network]
    [Figure: unsuspecting users (Alice, Bob) download the polluted files and spread the pollution further]

    INDEX POISONING
    • The aim of the attacker is to make several peers believe that some popular file is present at the victim.
    • The attacker sends a location publish message to every crawled peer.
    • In this message, the attacker includes the victim's IP address and port number.
    • The attacker puts the file hash of a popular file in the message.
    • Peer B adds this file hash to its index, along with the location of the victim.
    • When a peer C searches for that file, it may be told by some poisoned peer that the victim has the file.
  • 21. Index Poisoning
    [Figure: peers 23.123.78.6, 123.12.7.98 and 234.8.89.20 in a file sharing network, with the index below]
      title      location
      bigparty   123.12.7.98
      smallfun   23.123.78.6
      heyhey     234.8.89.20
    [Figure: the same index after poisoning, with the added entry below pointing at 111.22.22.22]
      bighit     111.22.22.22

    Free Riding
    • Peers share little or no data in P2P file-sharing systems
    • Measurement
      – Nearly 70% of Gnutella users share no files
      – Nearly 50% of all responses are returned by the top 1% of sharing hosts
    • Incentive mechanisms are needed to encourage user cooperation

    P2P Worms
    • Topological scan worms
    • Passive worms
    • A computer worm is a self-replicating malware computer program.
    • It uses a computer network to send copies of itself to other nodes
    • It may do so without any user intervention.

    ROUTING TABLE POISONING
    • The aim of the attacker is to make the peers add the victim as their neighbor
    • The attacker sends node announcement messages to every crawled peer.
    • The attacker includes the victim's IP address and port number in these messages
    • The peers add the victim as their neighbor
    • Query messages are forwarded to the victim

    TOPOLOGICAL WORM ATTACK
    [Figure: topological worm attack]
  • 22. Topological Worm Attack
    [Figure only.]
    Passive P2P Worms
    • Exploit a vulnerability in the protocol.
    • Wait for vulnerable targets to contact them.
    • Case 1
      – The worm creates infected copies of itself with attractive filenames and places them in the shared folder of the P2P client, or replaces the files in the shared folder with itself.
      – e.g. VBS.Gnutella, Benjamin worm.
    • Case 2
      – The worm answers positively to a proportion of search queries by renaming the corrupted file to match the search query.
      – e.g. Gnuman.
    P2P-Worm.Win32.Benjamin.a
    • P2P-Worm.Win32.Benjamin.a (Kaspersky Lab) is also known as Worm.P2P.Benjamin.a (Kaspersky Lab), W32/Benjamin.worm (McAfee), W32.Benjamin.Worm (Symantec) and Win32.HLLW.Benjamin (Doctor Web).
    • The worm uses the Kazaa file-exchange P2P network to spread itself.
    • Benjamin is written in Borland Delphi and is approximately 216 KB in size. It is compressed with the AsPack utility.
    Effects
    • Eats up free disk space.
    • Benjamin opens a web page, benjamin.xww.de, to display banner ads.
      – One morning the benjamin.xww.de website showed the message: "Domain closed due to massive abuse."
    How vulnerable is BitTorrent?
    Pollution Attack
    • 1. The peers receive the peer list from the tracker.
  • 23. Pollution Attack (cont.)
    • 2. One peer contacts the attacker for a chunk of the file.
    • 3. The attacker sends back a false chunk. The false chunk fails its hash check and is discarded.
    • 4. The attacker requests all chunks from the swarm and wastes the peers' upload bandwidth.
    DDoS Attack
    • DDoS = distributed denial of service.
    • Based on the fact that the BitTorrent tracker has no mechanism for validating peers.
    • Uses modified client software.
    • 1. The attacker downloads a large number of torrent files from a web server.
    • 2. The attacker parses the torrent files with a modified BitTorrent client and, when announcing that he is joining the swarm, replaces his own IP address and port number with the victim's.
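    The statement that a false chunk "fails its hash" can be illustrated with a minimal sketch. It assumes the original BitTorrent scheme, in which the .torrent file carries a SHA-1 digest for each piece. The function name piece_is_valid and the sample byte strings are made up for this example.

```python
import hashlib


def piece_is_valid(piece: bytes, expected_sha1: bytes) -> bool:
    """Return True if the received piece matches the digest from the .torrent."""
    return hashlib.sha1(piece).digest() == expected_sha1


# Stand-in for the digest a client would read from the .torrent metadata.
genuine_piece = b"original chunk data"
expected = hashlib.sha1(genuine_piece).digest()

# A polluting peer returns different bytes for the same piece index.
bogus_piece = b"polluted chunk data"

received = [genuine_piece, bogus_piece]
accepted = [p for p in received if piece_is_valid(p, expected)]
print(len(accepted))
```

    The bogus piece is rejected, but only after it has been downloaded, which is why the attack still costs the victim bandwidth and time.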
  • 24. DDoS Attack (cont.)
    • 3. When the tracker receives requests for a list of participating peers from other clients, it sends the victim's IP address and port number.
    • 4. The peers then attempt to connect to the victim to download a chunk of the file.
    Attack Illustration
    [Figure: the attacker collects .torrent files from a discussion forum and announces the victim to the tracker. Clients ask the tracker "Who has the files?" and the tracker answers "Victim has the files!", so the clients connect to the victim.]
    Current Solutions: Pollution Attacks
    • Blacklisting
      – Achieved using software such as PeerGuardian or MoBlock.
      – Blocks connections from blacklisted IPs, which are downloaded from an online database.
    Solutions: Trust and Reputation
    • Most of the solutions proposed against these attacks are based on building trust (and/or reputation) between the peers.
    • Some of the popular approaches are:
      – DCRS (BitTorrent)
      – EigenTrust
      – XRep
    • These approaches do slow down the attack.
    What is Trust? What is Reputation?
    • Trust: a peer's belief in another peer's capabilities, honesty and reliability, based on its own experiences.
    • Reputation: a peer's belief in another peer's capabilities, honesty and reliability, based on recommendations received from other peers.
      – Reputation can be centralized (computed by a third party) or decentralized (computed independently by each peer after asking other peers for recommendations).
  • 25. What is Trust? (cont.)
    • Both trust and reputation are used to evaluate a peer's trustworthiness.
    • Trust and reputation increase or decrease with further experience.
    • Trust and reputation both depend on some context.
    • For example:
      – Mike trusts John as his doctor, but he does not trust John as a mechanic who can fix his car.
      – In the context of seeing a doctor, John is trustworthy.
      – In the context of fixing a car, John is untrustworthy.
    What is Trust Management?
    • The term "trust management" was coined by Blaze et al. (1996):
      – a coherent framework for the study of security policies, security credentials and trust relationships.
    Reputation Management
    • Need for trust mechanisms
      – To assess the trustworthiness of peers and of the content (malicious peers can generate an unlimited number of inauthentic files).
      – To deter malicious behavior.
    • Reputation rests on the assumption that past behavior is indicative of future behavior.
    • Reputation is used to build trust.
    An Example Trust Management System (BitTorrent)
    • Debit-Credit Reputation System (DCRS).
    • Each client calculates a local trust score for its peers, based on the valid pieces uploaded and downloaded.
    • The tracker combines these individual scores into a global score.
    DCRS (cont'd): Local Trust Score Computation
    • $F_{ij} = U_{ij} - D_{ij}$, where $U_{ij}$ is the number of chunks that $i$ uploaded to $j$ and $D_{ij}$ is the number of chunks that $i$ downloaded from $j$.
    • Using $F_{ij}$, the local trust score $LT_{ij}$ is computed as
      $LT_{ij} = \begin{cases} -1 & \text{if a bogus chunk is uploaded by peer } j \\ 0 & \text{if } F_{ij} > t \\ 1 & \text{if } F_{ij} \le t \end{cases}$
      where $t$ is the fairness threshold.
    DCRS (cont'd): Global Trust Score Computation
    • A global trust score represents the opinion that the rest of the swarm holds of a peer.
    • At regular intervals the tracker receives the local trust scores of the peers in the swarm.
    • The tracker chooses $k$ random local trust scores for peer $j$, where $k$ is less than the number of peers in the swarm.
    • The tracker sets the average of these $k$ local trust scores as the global trust score for $j$.
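    The two DCRS computations above can be sketched in a few lines. This is a hedged illustration of the rules exactly as the slides state them, not the reference implementation. The function names, argument names and sample numbers are assumptions made for the example.

```python
import random
from statistics import mean


def local_trust(uploaded, downloaded, bogus_from_j, t):
    """LT_ij as seen by peer i.

    uploaded   -- U_ij, chunks that i uploaded to j
    downloaded -- D_ij, chunks that i downloaded from j
    bogus_from_j -- True if j sent i a bogus chunk
    t          -- fairness threshold
    """
    if bogus_from_j:
        return -1
    f_ij = uploaded - downloaded
    return 0 if f_ij > t else 1


def global_trust(local_scores_for_j, k, rng=random):
    """Tracker side: average k randomly chosen local scores for peer j."""
    sample = rng.sample(local_scores_for_j, k)
    return mean(sample)


# Peer i gave j 10 chunks and got 2 back; with t = 5, j looks unfair.
print(local_trust(10, 2, False, 5))

# Tracker holds five reports about j and averages three of them.
reports = [1, 1, 0, -1, 1]
print(global_trust(reports, 3))
```

    Because the tracker samples the reports at random, the printed global score can differ between runs. A seeded random.Random instance can be passed as rng when a repeatable result is wanted.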
  • 26. DCRS (cont'd)
    • Global trust managed by the tracker prevents clients from being dishonest.
    • Solves the issue of pollution attacks by ignoring untrustworthy peers.
      – Trust systems are more flexible than blacklisting because peers can earn back their trust through good behavior.
    • Prevents DDoS attacks because the victim will earn a low trust score and be ignored.
    Other Emerging P2P Applications
    Distributed Data Mining in P2P Networks
    • Data mining is the extraction of hidden predictive information from large databases.
    • Most off-the-shelf data mining systems are designed to work as a monolithic centralized application.
    • Distributed data mining (DDM) deals with the problem of data analysis in environments with distributed data, computing nodes, and users.
    • P2P networks are well suited to distributed data mining.
    • A primary goal of P2P data mining is to achieve the same (or nearly the same) result as a centralized approach, without moving any data from its original location.
    Source: Souptik Datta, Kanishka Bhaduri, Chris Giannella, Ran Wolff, Hillol Kargupta, "Distributed Data Mining in Peer-to-Peer Networks"
    Distributed Data Mining in P2P Networks (cont.)
    • P2P systems already store a huge amount of widely varying data collected from different sources.
    • If this data, distributed over a large number of peers, can be integrated,
      – it represents a very valuable data repository that, upon mining, may give very exciting and useful results.
    • Peer-to-peer K-means algorithm
      – K-means clustering partitions a collection of data tuples into K disjoint, exhaustive groups (clusters), where K is a user-specified parameter.
    • Example: topic-wise document clustering in a P2P document repository
      – Documents stored on different peers are clustered by three subjects (movies, baseball, hurricane) by exchanging information with other peers.
    • In P2P clustering,
      – some peers may not be present in the network all the time, and may join or leave the network while the clustering is in progress.
    Cloud Computing
    • Cloud computing is a computing paradigm shift in which computing is moved away from personal computers or an individual server to a "cloud" of computers.
    • Users of the cloud only need to be concerned with the computing service being asked for, as the underlying details of how it is achieved are hidden.
    • This is done by pooling all computer resources together and having them managed by software rather than by a human.
    • Prominent players include Google (AppEngine), Microsoft (Azure), Amazon (EC2), Yahoo-Apache (Hadoop) and Cisco-EMC (Acadia).
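    The idea of clustering "without moving any data from its original location" can be sketched as follows. This is a simplified illustration, not the published P2P K-means algorithm: every peer computes per-cluster sums and counts over its own data, and only those small summaries are combined. For brevity the sketch combines the summaries of all peers in one step, whereas a real P2P version would exchange them with neighbors only and would have to cope with peers joining and leaving. All names and the one-dimensional sample data are assumptions.

```python
def nearest(x, centroids):
    """Index of the centroid closest to the 1-D point x."""
    return min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))


def local_stats(points, centroids):
    """Per-cluster sums and counts computed from one peer's own data."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x in points:
        c = nearest(x, centroids)
        sums[c] += x
        counts[c] += 1
    return sums, counts


def p2p_kmeans(peers, centroids, rounds=10):
    """peers is a list of local datasets; raw points never leave a peer."""
    k = len(centroids)
    for _ in range(rounds):
        total_sums = [0.0] * k
        total_counts = [0] * k
        for points in peers:
            sums, counts = local_stats(points, centroids)
            for c in range(k):
                total_sums[c] += sums[c]
                total_counts[c] += counts[c]
        # Keep the old centroid if no point was assigned to a cluster.
        centroids = [
            total_sums[c] / total_counts[c] if total_counts[c] else centroids[c]
            for c in range(k)
        ]
    return centroids


peers = [[1.0, 2.0], [3.0, 10.0], [11.0, 12.0]]
final = p2p_kmeans(peers, centroids=[0.0, 15.0])
print(final)
```

    Each peer shares only K sums and K counts per round, so the communication cost does not grow with the amount of data a peer stores.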
  • 27. Cloud Architecture
    [Figure: layered architecture. Users: individuals, corporations, non-commercial. Cloud middleware: storage provisioning, OS provisioning, network provisioning, service (apps) provisioning, SLA (monitor), security, billing, payment. Resources and services: storage, network, OS.]
    The Scale of the Cloud
    [Figure: Google's million-server warehouse (Oregon, USA). Each building is approximately the size of two football fields. Source: IEEE Spectrum, Feb. 2009.]
    The Scale of the Cloud (cont.)
    [Figure: schematic of the million-server warehouse, the largest and most complex data center in the world, with cooling towers. Each container houses 2500 servers and has built-in networking, cooling and storage bundled together. Source: IEEE Spectrum, Feb. 2009.]
    So What's the Issue?
    • These super server-warehouses are expected to consume around 300 megawatts (MW) of electricity.
    • Existing large data centers consume anywhere from 20 to 50 MW of electricity (enough to power 40,000 homes).
      – Hence, the energy consumption of the million-server warehouse raises serious concerns about its environmental sustainability.
    • The environmental impact of cloud computing has not received the desired attention from the research community and needs to be addressed.
      – It is estimated that Google's data centers alone consume over 1.5% of the electricity produced worldwide.
      – The larger data centers in the US are estimated to consume 25.6 GWh of electricity per year and produce 17,006 tonnes of CO2 emissions.
    The Peer Enterprises Framework
    PE vs Cloud
    Parameter | Cloud Computing | Peer Enterprises
    1. Cost | Expensive to provision. Involves creation of internet-scale data centers costing hundreds of millions of dollars. | Uses already provisioned compute infrastructure. Hence, no new costs need to be incurred.
    2. Energy Consumption | Around 300 MW. | Already provisioned and in use. No additional energy consumption.
    3. Environmental Impact | Yes. Very high. | Yes, but the PE concept does not place an additional load on the environment.
    4. Service Migration | Tedious, since vendor interoperability is as yet undefined. | Easy. Organizations can enter into new contracts and terminate existing contracts.
    5. Degree of Decentralization | Fairly centralized control. All applications run in a single data center. Centralized elements for load balancing, scalability, etc. | Based on the decentralized P2P concept. No centralized elements or control. Organizations are free to negotiate resource-sharing contracts as required.
    6. Data Lock-In | Yes. Once all data resides with a single vendor, any connectivity fault can render the data irretrievable. | Organizations can enter into multiple contracts to avoid data lock-in by creating the requisite redundancy.
    7. Performance | Expected to be very performant, since dedicated compute infrastructure is involved. | Less performant due to frequent node transience. Performance enhancements and optimizations need to be devised.
    8. Vendor Dependence | High. Compute infrastructure from only one provider can be used. | None. As many service providers as needed can be used by entering into contracts.
  • 28. WWW + Mobile Telephony = Mobile Access to Information
    [Figure: numbers of mobile telephone users and Internet users per year, 1993 to 2001.]
    MANET: Mobile Ad Hoc Networks
    • Ad hoc networks are wireless, self-organizing systems that provide functionality without infrastructure support.
    • Ad hoc means that there are no central servers. Content is distributed to several nodes instead of one server.
    • MANET: a collection of wireless mobile nodes that dynamically form a network without any existing infrastructure. The nodes' relative positions dictate the (dynamically changing) communication links.
    Problems/Challenges for Ad Hoc Networks
    • Problems are due to
      – the lack of a central entity for organization
      – the limited range of wireless communication
      – the mobility of participants
      – battery-operated entities
    Mobile P2P?
    • Transferring data from one mobile phone to another.
    • The mobile phone and the network limit the possibilities of mobile P2P:
      – Low efficiency (CPU and memory)
      – Low bandwidth
      – Power constraints, since the phone runs on battery
      – Billing
    • Much more challenging than traditional P2P.
    Full Mobile P2P in 2/2.5G
    • In 2/2.5G there are limitations that are impossible to overcome:
      – Operators do not expose a mobile phone's IP address.
      – Operators control data traffic.
      – The network does not offer any way to sustain an active connection in all situations.
      – Voice and data cannot be transferred simultaneously.
  • 29. A Solution to 2/2.5G P2P: MMS
    • MMS could be used as a way of sending data from one mobile node to another.
    • However, there are problems:
      – MMS size is limited.
      – MMS costs more than GPRS data.
    A Solution to 2/2.5G P2P: MMS (cont.)
    • We need a server that keeps a record of each MSISDN (IMSI) number and the data that can be found at that number.
    • The Mobile Station International Subscriber Directory Number (MSISDN) is a number used to identify a mobile phone number internationally.
    • MSISDN = CC + NDC + SN
      – CC = Country Code
      – NDC = National Destination Code
      – SN = Subscriber Number
    • Example IMSI: 429 01 1234567890 (429 = Nepal, 01 = Nepal Telecom, 1234567890 = subscriber number). The MSISDN is built from the analogous CC, NDC and SN fields.
    A Better Solution: Computer-Aided P2P
    • All the major limitations could be overcome if the mobile phone were connected to a computer that runs P2P software.
    • We would only need software to communicate between the computer and the mobile phone:
      – Short distance: infrared, Bluetooth, etc.
      – Remotely: over HTTP
    Computer-Aided P2P: Short Distance
    • Within a short distance we would not have true mobile P2P:
      – How do you know who has the information you need?
    • A better solution would be to control a fixed-network peer remotely.
    Computer-Aided Mobile P2P: Remotely
    • For example, over HTTP we could control the fixed-network peer by using a program called mobile eMule.
    • The downloader asks for the data, and the person being downloaded from permits or denies the download.
    eMule
    • eMule is a free peer-to-peer file-sharing application for Microsoft Windows.
    • The name "eMule" comes from the mule, an animal somewhat similar to a donkey.
  • 30. eMule: How Does It Work?
    • Each file that is shared using eMule is hashed as a hash list using the MD4 algorithm.
    • The MD4 hash, file size, filename, and search attributes are stored on eD2k servers.
    • Users can search for filenames on the servers.
    • Users are presented with the filenames and a unique identifier, consisting of the file's MD4 hash and the file's size, that can be added to their downloads.
    • The client then uses that hash to ask the servers where the other clients are.
    • The servers return a set of IP addresses that indicate the locations of the clients that share the file.
    • eMule then asks the peers for the file.
    Computer-Aided Mobile P2P: eMule
    [Figure: 1. login, 2. search, 3. download to peer, 4. download to phone.]
    • eMule is a working solution.
    • eMule has a large user base, currently averaging 3 to 5 million
    3G
    • Delivers speeds of up to 14.4 Mbit/s on the downlink and 5.8 Mbit/s on the uplink.
    • Consumers will be charged for the quantity of data they transmit, not for how much time they are connected to the network.
    • With 3G you are constantly online and basically pay for the information you receive.
    • Because third-generation packet-based networks allow users to be online all the time, the potential for new applications is huge.
    Threats to Mobile P2P
    • In 3G, true mobile P2P is possible thanks to high bandwidth, efficient mobile phones, and simultaneous voice and data capability.
      – But will the operators allow P2P software, since it would lead to a loss of revenue?
      – In the 3G network architecture, every data connection of a mobile terminal is routed through the operator's network. This makes it possible for the operator to fully control the traffic of mobile terminals.
    • For example, a network operator has the power to allow or prevent terminal-to-terminal connections in its network.
    • P2P protocols demand direct connections between the peers, because their key idea is that the peers communicate directly with each other without any central server.
      – A lack of terminal-to-terminal connections would make it impossible for true P2P to exist.
    • Data transfer fees are currently quite high, which reduces the willingness of users to share data in MP2P networks.
    • Use of MP2P applications may reduce the operators' opportunities to sell their own services.
    • Viruses, spyware, etc.
    P2P-Based Software Engineering
    • With the rapid development of network technologies, software development is becoming more and more complicated.
    • Traditional SE management methods based on the client/server (C/S) structure are not well suited to large-scale software development.
    • The cited work proposes an SE management method based on P2P that
      – overcomes the server bottlenecks of the C/S structure
      – takes full advantage of computation resources
    Source: Lina Zhao, Yin Zhang, Sanyuan Zhang, and Xiuuzi Ye, "P2P-Based Software Engineering Management"
  • 31. Future
    • Semantic P2P
    • Cloud Computing
    • Data Mining
    • P2P-Based Software Engineering
    • Audio/Video Streaming
    • Security – autonomic computing
    • Collaborative learning
    • Mobile P2P
    • Emergency First Response