Common Scenario
- Millions want to download the same popular, huge files (for free)
  - Software
  - Media (the real example!)
- The client-server model fails
  - A single server fails
  - Can’t afford to deploy enough servers
A model of communication where every node in the network acts alike.
As opposed to the Client-Server model, where one node provides services and other nodes use the services.
Advantages of P2P Computing
- No central point of failure
  - E.g., the Internet and the Web as a whole do not have a central point of failure.
  - Most Internet and web services use the client-server model (e.g., HTTP), so a specific service does have a central point of failure.
- Scalability
  - Since every peer is alike, it is possible to add more peers to the system and scale to larger networks.
Disadvantage of P2P Computing
Used by many different people and organisations
The more popular a large video, audio or software file, the faster and cheaper it can be transferred with BitTorrent
“Pull-based” “swarming” approach
- Each file is split into smaller pieces
- Nodes request desired pieces from neighbors
  - As opposed to parents pushing data that they receive
- Pieces are not downloaded in sequential order
- Encourages contribution by all nodes
- Peer-to-peer in nature
  - Even if clients join simultaneously (“flash crowd”)
- The BitTorrent protocol is implemented in applications called BitTorrent clients, such as uTorrent and BitComet.
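To make the piece idea concrete, here is a minimal sketch of splitting a file into fixed-size pieces and reassembling it from pieces that arrive out of order. The function names and the 1000-byte piece length are illustrative assumptions, not part of the protocol; real clients use much larger pieces.

```python
import random
from typing import Dict, List


def split_into_pieces(data: bytes, piece_length: int) -> List[bytes]:
    """Cut data into fixed-size pieces; the last piece may be shorter."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def reassemble(received: Dict[int, bytes], num_pieces: int) -> bytes:
    """Join pieces by index, regardless of the order in which they arrived."""
    return b"".join(received[i] for i in range(num_pieces))


# Hypothetical 2560-byte "file" used only for illustration.
original = bytes(range(256)) * 10
pieces = split_into_pieces(original, piece_length=1000)

# Simulate pieces arriving in a non-sequential order.
arrival_order = list(range(len(pieces)))
random.shuffle(arrival_order)
received = {index: pieces[index] for index in arrival_order}

restored = reassemble(received, len(pieces))
```

Because each piece is identified by its index, the download order does not matter for correctness; it only affects which pieces a node can offer to others early on.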
BitTorrent Terminology
- Peer: a node (computer) that does not yet have the complete file
- Seed (or seeder): a computer with a complete copy of a BitTorrent file
- Swarm: a group of computers simultaneously sending (uploading) or receiving (downloading) the same file
- .torrent: a pointer file that directs your computer to the file you want to download
- Tracker: a server that manages the BitTorrent file-transfer process
A *.torrent guides users to owners of a file
1. User obtains the *.torrent file. The file contains meta info about a target file.
2. User loads the *.torrent file into a BitTorrent client, which then looks up the named tracker.
3. Tracker coordinates peers.
4. Armed with a list of peers holding pieces of the file, the user downloads from many peers.
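The tracker's coordinating role can be sketched as a small in-memory registry: each announcing peer is added to the swarm for a torrent and receives the list of peers already known. This is only a toy model under stated assumptions; the class name `ToyTracker`, its `announce` method, and the string peer identifiers are hypothetical, and a real tracker answers network requests rather than local method calls.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ToyTracker:
    """In-memory stand-in for a tracker: maps a torrent to its known peers."""

    def __init__(self) -> None:
        self._swarms: Dict[str, Set[str]] = defaultdict(set)

    def announce(self, info_hash: str, peer: str) -> List[str]:
        """Register peer in the swarm and return the other peers seen so far."""
        swarm = self._swarms[info_hash]
        others = sorted(swarm - {peer})
        swarm.add(peer)
        return others


tracker = ToyTracker()
first = tracker.announce("hash-of-file", "peer-a")   # nobody else yet
second = tracker.announce("hash-of-file", "peer-b")  # learns about peer-a
third = tracker.announce("hash-of-file", "peer-c")   # learns about a and b
```

Note that the tracker never handles file data in this model; it only introduces peers to one another.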
All peers act as a source
- A machine with a complete copy (the seed) can distribute individual pieces of the file to multiple peers
- Peers exchange different pieces of the file with one another until they assemble a whole
- As soon as a user has a piece of the file on their machine, they can become a source of that piece for other peers, helping speed up the download
The key ingredients of the *.torrent file are the tracker’s address and the unique SHA1 hash
- All data in a metainfo file is bencoded.
- info: a dictionary that describes the file(s) of the torrent; it includes:
  - piece length: number of bytes in each piece (integer)
  - pieces: string consisting of the concatenation of all 20-byte SHA1 hash values, one per piece
- announce: the URL of the “tracker”
- creation date: (optional) the creation time of the torrent
- comment: (optional) comments from the author
- created by: (optional)
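The following sketch shows how such a metainfo structure could be assembled: a small bencoder, the concatenated per-piece SHA1 hashes, and the SHA1 hash of the bencoded info dictionary. The helper names, the tracker URL, and the file name are hypothetical; the `name` and `length` keys come from the single-file form of the BitTorrent specification rather than from the list above.

```python
import hashlib
from typing import Any, Dict


def bencode(value: Any) -> bytes:
    """Encode ints, strings/bytes, lists, and dicts in bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenate the 20-byte SHA1 digest of every piece."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def build_metainfo(name: str, data: bytes, piece_length: int,
                   announce: str) -> Dict[str, Any]:
    """Build a single-file metainfo dictionary (illustrative subset of keys)."""
    return {
        "announce": announce,
        "info": {
            "name": name,
            "length": len(data),
            "piece length": piece_length,
            "pieces": piece_hashes(data, piece_length),
        },
    }


# Hypothetical file contents and tracker URL, for illustration only.
metainfo = build_metainfo("demo.bin", b"x" * 2500, 1000,
                          "http://tracker.example/announce")
torrent_bytes = bencode(metainfo)
info_hash = hashlib.sha1(bencode(metainfo["info"])).hexdigest()
```

A downloader can verify each received piece by hashing it and comparing the result with the matching 20-byte slice of `pieces`.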
Publish the .torrent file on torrent search and index sites such as The Pirate Bay
Peer-to-peer transactions: choosing pieces to request
- Rarest-first: look at all pieces at all peers, and request the piece that is owned by the fewest peers
  - Increases diversity in the pieces downloaded
    - Avoids the case where a node and each of its peers have exactly the same pieces; increases throughput
  - Increases the likelihood that all pieces are still available even if the original seed leaves before any one node has downloaded the entire file
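Rarest-first selection can be sketched as a count of how many neighbors hold each piece we still need, followed by picking the piece with the lowest count. The function name and data layout are assumptions for illustration; ties are broken by lowest piece index here only to keep the result deterministic.

```python
from collections import Counter
from typing import Dict, Optional, Set


def rarest_first(needed: Set[int],
                 peer_pieces: Dict[str, Set[int]]) -> Optional[int]:
    """Return the needed piece held by the fewest peers, or None if unavailable."""
    counts: Counter = Counter()
    for pieces in peer_pieces.values():
        # Only count pieces we still need; pieces nobody has never appear.
        counts.update(pieces & needed)
    if not counts:
        return None
    return min(counts, key=lambda piece: (counts[piece], piece))


# Hypothetical neighborhood: piece 0 is common, pieces 2 and 3 are rare.
neighbors = {
    "peer-a": {0, 1, 2},
    "peer-b": {0, 1},
    "peer-c": {0, 3},
}
choice = rarest_first({0, 1, 2, 3}, neighbors)
```

In this example pieces 2 and 3 are each held by one peer, so piece 2 is requested first under the index tie-break.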
Choosing pieces to request
- Random first piece: when a peer starts to download, it requests a random piece
  - So as to assemble the first complete piece quickly
  - Then participate in uploads
- When the first complete piece is assembled, switch to rarest-first
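The switch between the two policies might look like the sketch below: with no complete piece yet, pick any available piece at random; afterwards, fall back to rarest-first. The function `choose_piece` and its parameters are hypothetical names, and the injected random generator is there only to make the example repeatable.

```python
import random
from collections import Counter
from typing import Dict, Optional, Set


def choose_piece(have: Set[int], num_pieces: int,
                 peer_pieces: Dict[str, Set[int]],
                 rng: random.Random) -> Optional[int]:
    """Random first piece, then rarest-first once one piece is complete."""
    needed = set(range(num_pieces)) - have
    counts: Counter = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces & needed)
    if not counts:
        return None  # nothing we need is available from any neighbor
    if not have:
        # No complete piece yet: any available piece will do.
        return rng.choice(sorted(counts))
    # At least one complete piece: request the rarest available piece.
    return min(counts, key=lambda piece: (counts[piece], piece))


neighbors = {
    "peer-a": {0, 1, 2},
    "peer-b": {0, 1},
    "peer-c": {0, 1, 3},
}
rng = random.Random(0)
first_pick = choose_piece(set(), 4, neighbors, rng)
later_pick = choose_piece({0}, 4, neighbors, rng)
```

Once a node holds one verified piece it has something to upload, which is why the random phase is kept as short as possible.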
Why BitTorrent took off
- Better performance through “pull-based” transfer
  - Slow nodes don’t bog down other nodes
- Allows uploading from hosts that have downloaded only parts of a file
Why BitTorrent took off
- Practical reasons (perhaps more important!)
  - Working implementation (Bram Cohen) with simple, well-defined interfaces for plugging in new content
  - Many recent competitors got sued or shut down
    - Napster, Kazaa
  - Users use well-known, trusted sources to locate content
    - Avoids the pollution problem, where garbage is passed off as authentic content
Pros and cons of BitTorrent
- Pros
  - Proficient in utilizing partially downloaded files
  - Discourages “freeloading”
    - By rewarding the fastest uploaders
  - Encourages diversity through “rarest-first”
    - Extends the lifetime of the swarm
  - Works well for “hot content”
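One way to picture "rewarding the fastest uploaders" is a node that periodically ranks its neighbors by how fast they have been sending it data and serves only the top few. This is a simplified sketch, not the full choking algorithm used by real clients; the function name, the rate units, and the slot count of three are assumptions.

```python
from typing import Dict, List


def select_peers_to_serve(upload_rates: Dict[str, float],
                          slots: int = 3) -> List[str]:
    """Pick the peers that uploaded to us fastest; the rest get no upload slot."""
    ranked = sorted(upload_rates, key=lambda peer: (-upload_rates[peer], peer))
    return ranked[:slots]


# Hypothetical observed rates in KB/s; peer-c has contributed nothing.
observed = {"peer-a": 10.0, "peer-b": 50.0, "peer-c": 0.0,
            "peer-d": 30.0, "peer-e": 20.0}
served = select_peers_to_serve(observed)
```

A peer that uploads nothing, like `peer-c` here, stays at the bottom of the ranking and receives little in return.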
Pros and cons of BitTorrent
- Cons
  - Assumes all interested peers are active at the same time; performance deteriorates if the swarm “cools off”
  - Even worse: no trackers for obscure content
Pros and cons of BitTorrent
- Dependence on a centralized tracker: pro or con?
  - Single point of failure: new nodes can’t enter the swarm if the tracker goes down
  - Lack of a search feature
    - Prevents pollution attacks
    - Users need to resort to out-of-band search: well-known torrent-hosting sites or plain old web search
“Trackerless” BitTorrent
- To be more precise, “BitTorrent without a centralized tracker”
- E.g., Azureus
- Uses a Distributed Hash Table (Kademlia DHT)
- The tracker is run by a normal end host (not a web server anymore)
  - The original seeder could itself be the tracker
  - Or a node in the DHT could be randomly picked to act as the tracker
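Kademlia measures closeness between identifiers with the XOR metric, so the nodes responsible for a torrent are those whose IDs are closest to the torrent's hash. The sketch below illustrates that lookup rule only; the helper names and the tiny 4-bit IDs in the usage example are assumptions, and no routing or networking is modeled.

```python
import hashlib
from typing import List


def make_id(name: str) -> int:
    """Derive a 160-bit identifier from a name using SHA1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two identifiers."""
    return a ^ b


def closest_nodes(key: int, node_ids: List[int], k: int = 2) -> List[int]:
    """Return the k node IDs with the smallest XOR distance to key."""
    return sorted(node_ids, key=lambda node: xor_distance(node, key))[:k]


# Tiny 4-bit IDs keep the arithmetic easy to follow by hand.
key = 0b1010
nodes = [0b1000, 0b0010, 0b1011, 0b0101]
responsible = closest_nodes(key, nodes)
```

Here the distances are 2, 8, 1, and 15 respectively, so nodes `0b1011` and `0b1000` would take on the tracker role for this key.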
Why is (studying) BitTorrent important?
- BitTorrent consumes a significant amount of Internet traffic today
  - In 2004, BitTorrent accounted for 35 to 60% of all Internet traffic (according to CacheLogic)
- BT is also used for legal software (Linux ISO) distribution