Transcript of "Bit torrent protocol seminar by Sanjay R"
The BitTorrent Protocol<br />
Common Scenario<br />Millions of users want to download the same large, popular files (for free)<br />Software<br />Media (the real example!)<br />The client-server model fails<br />A single server fails under the load<br />Providers can’t afford to deploy enough servers<br />
Peer-to-Peer<br />A model of communication where every node in the network acts alike.<br />As opposed to the client-server model, where one node provides services and other nodes use the services.<br />
Advantages of P2P Computing<br />No central point of failure<br />E.g., the Internet and the Web do not have a central point of failure.<br />Most Internet and Web services use the client-server model (e.g., HTTP), so a specific service does have a central point of failure.<br />Scalability<br />Since every peer is alike, more peers can be added to the system to scale to larger networks.<br />Disadvantages of P2P Computing<br />Coordination must be decentralized, which is harder<br />Not all nodes are created equal (peers differ in capacity and availability)<br />
BitTorrent<br />Written by Bram Cohen in 2001<br />Used by many different people and organisations<br />The more popular a large video, audio or software file, the faster and cheaper it can be transferred with BitTorrent<br />
“Pull-based” “swarming” approach<br />Each file is split into smaller pieces<br />Nodes request desired pieces from their neighbors<br />As opposed to parents pushing the data they receive<br />Pieces are not downloaded in sequential order<br />Encourages contribution by all nodes<br />Peer-to-peer in nature<br />Works even if many clients join simultaneously (a “flash crowd”)<br />The BitTorrent protocol is implemented in applications called BitTorrent clients, such as uTorrent and BitComet.<br />
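Because the file is split into pieces that can arrive in any order, each piece has to be checkable on its own. The minimal Python sketch below assumes fixed-size pieces (the last one may be shorter), each identified by its 20-byte SHA-1 digest; the function names `split_into_pieces`, `piece_hashes` and `verify_piece` are hypothetical, not part of any client's API.

```python
import hashlib


def split_into_pieces(data: bytes, piece_length: int) -> list[bytes]:
    """Cut the content into fixed-size pieces; the last piece may be shorter."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(pieces: list[bytes]) -> list[bytes]:
    """One 20-byte SHA-1 digest per piece, in piece order."""
    return [hashlib.sha1(p).digest() for p in pieces]


def verify_piece(index: int, received: bytes, expected: list[bytes]) -> bool:
    """Keep a piece received from a neighbor only if its digest matches."""
    return hashlib.sha1(received).digest() == expected[index]


# Usage: a 10-byte "file" cut into 4-byte pieces -> b"0123", b"4567", b"89".
content = b"0123456789"
pieces = split_into_pieces(content, 4)
hashes = piece_hashes(pieces)
# Pieces may arrive out of order; each is checked independently.
for i in (2, 0, 1):
    print(i, verify_piece(i, pieces[i], hashes))   # True for each genuine piece
print(verify_piece(1, b"XXXX", hashes))            # False: bad data is rejected
```

Checking per piece, rather than per file, is what lets a node accept pieces from many untrusted neighbors and re-request only the pieces that fail.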
BitTorrent Terminology<br />Peer – a node (computer) that does not yet have the complete file (also called a leecher)<br />Seed or seeder – a computer with a complete copy of the file being shared<br />Swarm – the group of computers simultaneously sending (uploading) or receiving (downloading) the same file<br />.torrent – a small metainfo file that points your BitTorrent client to the file you want to download<br />Tracker – a server that helps the peers in a swarm find each other; it does not hold the file data itself<br />
A *.torrent guides users to owners of a file<br />1. The user obtains a *.torrent file, which contains metainfo about the target file<br />2. The user loads the *.torrent file into a BitTorrent client, which then contacts the tracker named in the file<br />3. The tracker coordinates peers by telling the client which other peers are in the swarm<br />4. Armed with a list of peers holding pieces of the file, the user downloads from many peers at once<br />
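The client-to-tracker step can be sketched as an HTTP GET "announce" request. This is a minimal Python sketch, not a full client: the query parameter names (`info_hash`, `peer_id`, `port`, `uploaded`, `downloaded`, `left`, `event`) follow the BitTorrent specification (BEP 3), while `compact=1` and the 6-bytes-per-peer reply format come from a later extension (BEP 23). The tracker URL, the peer ID value and the helper names are hypothetical, and decoding the bencoded tracker response itself is left out.

```python
import struct
from urllib.parse import urlencode


def build_announce_url(announce: str, info_hash: bytes, peer_id: bytes, port: int,
                       uploaded: int, downloaded: int, left: int,
                       event: str = "started") -> str:
    """Build the announce URL; info_hash and peer_id are raw 20-byte strings."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be exactly 20 bytes")
    params = {
        "info_hash": info_hash,   # urlencode percent-encodes bytes values
        "peer_id": peer_id,
        "port": port,             # where other peers can connect to us
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,             # bytes still needed; 0 means we are a seed
        "compact": 1,             # ask for the compact peer-list format
        "event": event,
    }
    sep = "&" if "?" in announce else "?"
    return announce + sep + urlencode(params)


def parse_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode a compact peer list: 4-byte IPv4 address + 2-byte port, big-endian."""
    if len(blob) % 6:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for off in range(0, len(blob), 6):
        a, b, c, d, port = struct.unpack(">4BH", blob[off:off + 6])
        peers.append((f"{a}.{b}.{c}.{d}", port))
    return peers


# Usage with hypothetical values.
url = build_announce_url("http://tracker.example.org/announce", bytes(20),
                         b"-XX0001-" + b"0" * 12, 6881, 0, 0, 1000)
print(url)
print(parse_compact_peers(bytes([10, 0, 0, 1, 0x1A, 0xE1])))  # [('10.0.0.1', 6881)]
```

The tracker only hands out addresses; the file data itself then flows directly between the peers in the returned list.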
All peers act as sources<br />A machine with a complete copy (the seed) can distribute different pieces to multiple peers<br />Peers exchange different pieces of the file with one another until each has assembled the whole file<br />As soon as users have a piece of the file on their machine, they can become a source of that piece for other peers, helping speed up the download<br />
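Peers tell each other which pieces they hold; in the wire protocol this is a "bitfield" message (one bit per piece, with piece 0 as the high bit of the first byte), followed by "have" messages as new pieces complete. Below is a minimal Python sketch of how a peer might work out which pieces a neighbor can usefully send it; the helper names are hypothetical.

```python
def has_piece(bitfield: bytes, index: int) -> bool:
    """True if piece `index` is set; piece 0 is the high bit of the first byte."""
    byte, bit = divmod(index, 8)
    return bool(bitfield[byte] & (0x80 >> bit))


def set_piece(bitfield: bytearray, index: int) -> None:
    """Mark piece `index` as held, e.g. after it has been downloaded and verified."""
    byte, bit = divmod(index, 8)
    bitfield[byte] |= 0x80 >> bit


def useful_pieces(mine: bytes, theirs: bytes, num_pieces: int) -> list[int]:
    """Pieces the neighbor could send us: it has them and we do not."""
    return [i for i in range(num_pieces)
            if has_piece(theirs, i) and not has_piece(mine, i)]


# Usage: 10 pieces need 2 bytes of bitfield.
num_pieces = 10
mine = bytearray((num_pieces + 7) // 8)   # start with nothing
set_piece(mine, 0)
set_piece(mine, 3)
theirs = bytes([0b11110000, 0b01000000])  # neighbor holds pieces 0-3 and 9
print(useful_pieces(mine, theirs, num_pieces))  # [1, 2, 9]
```

The same check runs in the other direction, which is how a node holding even a single piece becomes a useful source for its neighbors.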
The key ingredients of the *.torrent file are the tracker’s address and the unique SHA-1 hash (the info hash, computed over the info dictionary)<br />All data in a metainfo file is bencoded.<br />info: a dictionary that describes the file(s) of the torrent<br />announce: the URL of the tracker<br />creation date (optional)<br />comment: remarks from the author (optional)<br />created by: the program that created the file (optional)<br />piece length (inside info): number of bytes in each piece (integer)<br />pieces (inside info): a string consisting of the concatenation of all 20-byte SHA-1 hash values, one per piece<br />
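The metainfo file uses bencoding: integers look like `i42e`, byte strings like `4:spam` (length, colon, bytes), lists like `l...e`, and dictionaries like `d...e` with keys in sorted order. The info hash that identifies a torrent is the SHA-1 of the bencoded `info` dictionary. Below is a minimal Python sketch for a single-file torrent, assuming the `name` and `length` keys of the single-file layout; the tracker URL and the `created by` value are placeholders, and `make_metainfo` is a hypothetical helper.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, str/bytes, lists and dicts using bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings, emitted in sorted raw-byte order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(data: bytes, name: str, piece_length: int,
                  announce: str) -> tuple[bytes, bytes]:
    """Return (bencoded .torrent contents, 20-byte info hash) for one file."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": pieces,  # concatenated 20-byte SHA-1 digests
    }
    metainfo = {"announce": announce, "info": info, "created by": "example-sketch"}
    info_hash = hashlib.sha1(bencode(info)).digest()
    return bencode(metainfo), info_hash


# Usage with a placeholder tracker URL.
torrent, info_hash = make_metainfo(b"hello world" * 100, "hello.txt", 256,
                                   "http://tracker.example.org/announce")
print(len(torrent), info_hash.hex())
```

Because the info hash covers the piece hashes, changing any byte of the content yields a different torrent identity.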
BitTorrent Download<br />Download and install the BitTorrent client software<br />Check and configure the firewall and/or router for BitTorrent (if applicable)<br />Let BitTorrent give and receive pieces of the file<br />Stay connected after the download completes to keep sharing (seeding) the file with others<br />
Upload and Publish a File<br />Download and install the BitTorrent client software<br />Create a new .torrent file<br />Publish the .torrent file on torrent index sites such as The Pirate Bay<br />
Peer-to-peer transactions: choosing pieces to request<br />Rarest-first: look at all pieces held by all connected peers, and request the piece owned by the fewest peers<br />Increases diversity in the pieces downloaded<br />Avoids the case where a node and each of its peers have exactly the same pieces; increases throughput<br />Increases the likelihood that all pieces remain available even if the original seed leaves before any one node has downloaded the entire file<br />
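Rarest-first can be sketched as: count how many connected peers hold each piece we still need, then request one of the least-replicated pieces. Below is a minimal Python sketch, assuming we already know each peer's piece set (for example from its bitfield); breaking ties at random is an assumption of this sketch, and the function name is hypothetical.

```python
import random
from collections import Counter
from typing import Optional


def rarest_first(needed: set[int], peer_pieces: dict[str, set[int]],
                 rng: random.Random) -> Optional[int]:
    """Pick a needed piece held by the fewest peers; break ties randomly."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces & needed)   # count only pieces we still need
    if not counts:
        return None                      # no connected peer has anything we need
    rarest = min(counts.values())
    candidates = sorted(i for i, c in counts.items() if c == rarest)
    return rng.choice(candidates)


# Usage: piece 3 is held only by peer C, so it is requested first.
peers = {"A": {0, 1, 2}, "B": {0, 1}, "C": {0, 2, 3}}
print(rarest_first({0, 1, 2, 3}, peers, random.Random(1)))  # 3
```

Fetching the scarcest piece first copies it onto one more node, so if peer C leaves, piece 3 is not lost from the swarm.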
Choosing pieces to request<br />Random first piece:<br />When a peer starts to download, it requests a random piece<br />So as to assemble its first complete piece quickly (rare pieces are held by few peers and may download slowly)<br />Then it can participate in uploads<br />Once the first complete piece is assembled, it switches to rarest-first<br />
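The combined policy (random first piece, then rarest-first) can be sketched in Python as follows, assuming `have` contains only complete, verified pieces and that the peer-to-pieces map is already known; `choose_piece` is a hypothetical helper, not taken from any particular client.

```python
import random
from collections import Counter
from typing import Optional


def choose_piece(have: set[int], num_pieces: int,
                 peer_pieces: dict[str, set[int]],
                 rng: random.Random) -> Optional[int]:
    """Random first piece while we hold nothing, rarest-first afterwards."""
    needed = set(range(num_pieces)) - have
    available = set().union(*peer_pieces.values()) & needed
    if not available:
        return None
    if not have:
        # Random first piece: any available piece, so we soon have something to upload.
        return rng.choice(sorted(available))
    # Rarest-first: the needed piece replicated on the fewest peers.
    counts = Counter(i for pieces in peer_pieces.values() for i in pieces & needed)
    rarest = min(counts.values())
    return rng.choice(sorted(i for i, c in counts.items() if c == rarest))


# Usage
peers = {"A": {0, 1, 2}, "B": {0, 1}, "C": {0, 2, 3}}
rng = random.Random()
print(choose_piece(set(), 4, peers, rng))  # any of 0-3, picked at random
print(choose_piece({0}, 4, peers, rng))    # 3: rarest needed piece once we hold 0
```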
Why BitTorrent took off<br />Better performance through “pull-based” transfer<br />Slow nodes don’t bog down other nodes<br />Allows uploading from hosts that have downloaded parts of a file<br />
Why BitTorrent took off<br />Practical reasons (perhaps more important!)<br />A working implementation (by Bram Cohen) with simple, well-defined interfaces for plugging in new content<br />Many contemporary competitors were sued or shut down<br />Napster, Kazaa<br />Users rely on well-known, trusted sources to locate content<br />Avoids the pollution problem, where garbage is passed off as authentic content<br />
Pros and cons of BitTorrent<br />Pros<br />Makes efficient use of partially downloaded files (peers upload the pieces they already have)<br />Discourages “freeloading”<br />By rewarding the fastest uploaders<br />Encourages diversity through “rarest-first”<br />Extends the lifetime of the swarm<br />Works well for “hot content”<br />
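The reward for fast uploaders comes from choking: a peer uploads only to a few "unchoked" neighbors, preferring those that upload to it fastest (tit-for-tat), and also unchokes one extra neighbor at random (an "optimistic unchoke") so newcomers get a chance. Below is a minimal Python sketch of one re-evaluation round. The default of 4 regular slots mirrors what is commonly described for the original client (re-evaluated about every 10 seconds, with the optimistic slot rotated about every 30 seconds), but treat these numbers and the helper name as assumptions.

```python
import random


def choose_unchoked(download_rates: dict[str, float], interested: set[str],
                    rng: random.Random, regular_slots: int = 4) -> set[str]:
    """Tit-for-tat sketch: unchoke the interested peers that send us data
    fastest, plus one extra 'optimistic' peer chosen at random from the rest."""
    # Highest download rate first; ties broken by peer name for a stable order.
    ranked = sorted(interested, key=lambda p: (-download_rates.get(p, 0.0), p))
    unchoked = set(ranked[:regular_slots])
    others = ranked[regular_slots:]
    if others:
        # Optimistic unchoke: lets a newcomer (or a better partner) prove itself.
        unchoked.add(rng.choice(others))
    return unchoked


# Usage: rates are bytes/s we currently receive from each peer (hypothetical).
rates = {"A": 50.0, "B": 10.0, "C": 80.0, "D": 5.0, "E": 30.0, "F": 0.0}
print(sorted(choose_unchoked(rates, set(rates), random.Random(7))))
```

A freeloader that never uploads only ever receives data through the occasional optimistic slot, which is how the scheme discourages freeloading without a central authority.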
Pros and cons of BitTorrent<br />Cons<br />Assumes all interested peers are active at the same time; performance deteriorates if the swarm “cools off”<br />Even worse: no trackers for obscure content<br />
Pros and cons of BitTorrent<br />Dependence on a centralized tracker: pro or con?<br />Single point of failure: new nodes can’t enter the swarm if the tracker goes down<br />Lack of a search feature<br />Prevents pollution attacks<br />Users need to resort to out-of-band search: well-known torrent-hosting sites or plain old web search<br />
“Trackerless” BitTorrent<br />To be more precise, “BitTorrent without a centralized tracker”<br />E.g., Azureus<br />Uses a Distributed Hash Table (Kademlia DHT)<br />The tracker role is run by normal end-hosts (not a web server anymore)<br />The original seeder could itself be the tracker<br />Or a node in the DHT, effectively picked at random by the torrent’s hash, acts as the tracker<br />
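In a Kademlia DHT, the node that "acts as the tracker" is not arbitrary. Node IDs and the info hash share one 160-bit ID space, distance is the XOR of two IDs read as an integer, and peer lists for a torrent are stored on the nodes closest to its info hash. Below is a minimal Python sketch of that distance and of picking the k closest known nodes. The value k = 8 follows the bucket size in the Mainline DHT specification (BEP 5), and the helper names are hypothetical.

```python
import hashlib


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: XOR of two IDs, read as an unsigned big-endian integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_nodes(target: bytes, node_ids: list[bytes], k: int = 8) -> list[bytes]:
    """The k known node IDs closest to `target` under the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


# Usage with hypothetical IDs: hash a placeholder "info dictionary" and 50 node names.
info_hash = hashlib.sha1(b"hypothetical info dictionary").digest()
known = [hashlib.sha1(f"node-{i}".encode()).digest() for i in range(50)]
for node_id in closest_nodes(info_hash, known, k=3):
    print(node_id.hex(), xor_distance(node_id, info_hash).bit_length())
```

Since SHA-1 output looks random, which nodes end up "closest" to a given torrent is effectively random, yet every participant computes the same answer without any coordination.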
Why is (studying) BitTorrent important?<br />BitTorrent consumes a significant amount of Internet traffic today<br />In 2004, BitTorrent accounted for 35 to 60% of all Internet traffic (according to CacheLogic)<br />BitTorrent is also used for legal software distribution (e.g., Linux ISO images)<br />
Companies using BitTorrent Technology<br />With help from BitTorrent, Facebook can now push hundreds of megabytes of new code to all servers worldwide in just a minute.<br />Twitter uses BitTorrent to deploy files across its many servers more efficiently. The project, dubbed ‘Murder’, is based on the open-source BitTornado BitTorrent client.<br />
Conclusion<br />BitTorrent is a well-thought-out protocol that embraces cooperation and self-optimizing mechanisms.<br />BitTorrent proposes solutions to current optimization and scalability problems<br />