Word Doc Download

  1. Andrew Brampton, Peer-to-Peer Media Streaming, B.Sc. Computer Science, March 2004
  2. I certify that the material contained in this dissertation is my own work and does not contain significant portions of unreferenced or unacknowledged material. I also warrant that the above statement applies to the implementation of the project and all associated documentation.

     Signed: Andrew Brampton
     Date: 19th March 2004
  3. Abstract

     Peer-to-peer (P2P) networks are quickly becoming a new foundation for future internet applications; however, no one has applied the P2P paradigm to streaming continuous media. One of the key aspects of the future internet will be a multimedia-rich environment in which video and audio streaming between many different people is commonplace. These services have not yet appeared because of many technical problems. This report researches and designs a new concept that adapts existing P2P techniques and applies them to a streaming context, to provide a faster and more reliable transport medium for streaming media. If this system works as expected, anyone, regardless of bandwidth, could stream video to thousands of hosts without loss of performance, by using the receiving peers' bandwidth to help transmit the stream.

     Working document URL: http://www.lancs.ac.uk/ug/brampton/fyp/
     Contact Email: a.brampton@lancs.ac.uk
  4. Table of Contents

     Abstract
     Table of Contents
     List of Figures
     List of Tables
     1 Introduction
       1.1 Overview of Streaming
       1.2 Overview of Peer-to-Peer
       1.3 Project Goals
       1.4 Why is this system needed?
       1.5 Report Structure
     2 Background Reading
       2.1 History of Peer-to-Peer
         2.1.1 ARPANET and the early Internet
         2.1.2 Domain Name System (DNS)
         2.1.3 Usenet
       2.2 Recent P2P
         2.2.1 Napster
         2.2.2 Gnutella
         2.2.3 Fasttrack
         2.2.4 Gnutella2
         2.2.5 FreeNet
         2.2.6 Distributed.net
         2.2.7 SkyPe
         2.2.8 Bittorrent
       2.3 Streaming Technologies
         2.3.1 Multicast
         2.3.2 Batch Chaining
         2.3.3 NICE
         2.3.4 ZIGZAG
       2.4 Recent Research
         2.4.1 Pastry
         2.4.2 SplitStream
         2.4.3 Chord
       2.5 Summary
     3 Design
       3.1 Requirements
         3.1.1 Provide a robust network
         3.1.2 Allow quick re-join after peer failure
         3.1.3 Stream data with low control overhead
         3.1.4 Move the stream distribution load away from the source
         3.1.5 Be scalable
         3.1.6 Media agnostic
         3.1.7 Be secure
       3.2 Peer-to-Peer Network
       3.3 Stream Representation
       3.4 Tracker
  5. Table of Contents (continued)

       3.5 Tracker-less Network
       3.6 Peer
       3.7 Source Peer
       3.8 Peer and Tracker Overview
       3.9 Tracker Protocol
         3.9.1 &peer-id=
         3.9.2 &peer-ip=
         3.9.3 &peer-port=
         3.9.4 /?action=join
         3.9.5 /?action=part
         3.9.6 /?action=list
         3.9.7 HTTP Headers
         3.9.8 X-BitStream-PartSize
         3.9.9 X-BitStream-ContentType
         3.9.10 X-BitStream-Title
       3.10 Peer Protocol
         3.10.1 Packets
         3.10.2 Packet Header
         3.10.3 Keep Alive
         3.10.4 Handshake
         3.10.5 Announcement
         3.10.6 Request
         3.10.7 Data
       3.11 Program Design
         3.11.1 PeerClient
         3.11.2 StreamBufferInterface
         3.11.3 PlaybackInterface
         3.11.4 PeerConnection
         3.11.5 PeerManager
         3.11.6 PeerPackets
       3.12 Algorithms
         3.12.1 Piece Picking Quality of Service
         3.12.2 Source Saturation Problem
         3.12.3 Pre-emptive Sending
       3.13 Code Testing Strategies
       3.14 System Evaluation Strategies
         3.14.1 Predicted Results
       3.15 Summary
     4 Implementation
       4.1 Changes
         4.1.1 Tracker
         4.1.2 PeerManager
         4.1.3 Vorbis Ogg Playback Library
         4.1.4 Bitmap Class
       4.2 Problems Encountered
         4.2.1 StreamBuffer changing without notification
         4.2.2 Concurrency Issues
         4.2.3 Self Connecting Peer & Peers Connecting Both Ways
       4.3 Algorithms Used
         4.3.1 FindNextPiece
  6. Table of Contents (continued)

         4.3.2 PeerConnection Thread
       4.4 Summary
     5 System in Operation
       5.1 Tracker
       5.2 PeerSource
       5.3 PeerClient
       5.4 Summary
     6 Testing
       6.1 Unit Testing
         6.1.1 Bitmap Class
         6.1.2 StreamBuffer Class
       6.2 Integration Testing
         6.2.1 PeerConnection Class
       6.3 Performance Testing
       6.4 Summary
     7 Evaluation
       7.1 Efficiency
       7.2 Overheads
       7.3 Summary
     8 Conclusion
       8.1 Project Goals
       8.2 Future Work
       8.3 Summary
     9 References
     10 Appendix
       10.1 Bitmap Test Cases
       10.2 StreamBuffer Test Cases
       10.3 PeerConnection Test Cases
       10.4 Project Proposal
  7. List of Figures

     Figure 2.1 A simple Napster network
     Figure 2.2 A search on a small Gnutella network
     Figure 2.3 A simple query via super-nodes on Fasttrack
     Figure 2.4 A BitTorrent network
     Figure 2.5 Batch Chaining Technique
     Figure 2.6 A NICE tree network
     Figure 3.1 Diagram of Peers, Source Peer and Tracker
     Figure 3.2 Representation of a stream
     Figure 3.3 A tracker-less network
     Figure 3.4 UML Sequence diagram of Peer and Tracker interactions
     Figure 3.5 Packet Diagram
     Figure 3.6 UML Diagram of different classes within the system
     Figure 3.7 UML of different PeerPackets
     Figure 4.1 UML Diagram of tracker design
     Figure 4.2 UML Sequence diagram on how the tracker works internally
     Figure 4.3 UML Sequence diagram of PeerManager connecting to a peer
     Figure 4.4 UML Class Diagram of OggPlayback
     Figure 4.5 UML Class Diagram of bitmap
     Figure 4.6 Flowchart of FindNextPiece
     Figure 5.1 Log generated by a tracker
     Figure 5.2 Log generated by a PeerSource
     Figure 5.3 Log generated by a PeerClient
     Figure 6.1 UML Class Diagram of a PeerConnection
     Figure 7.1 Graph of Percentage of the stream forwarded by non-source peers
     Figure 7.2 Graph of protocol overheads depending on number of connected peers

     List of Tables

     Table 3.1 Sequence of events for acquiring a new piece
     Table 6.1 List of tests carried out on the system
     Table 6.2 Summarised results from 11 test cases
     Table 7.1 Predicted overheads compared to observed overheads
  8. 1 Introduction

     The aim of this project is to research, create and develop a new method of sending media data in a peer-to-peer (P2P) fashion by adapting existing P2P techniques to a streaming context. P2P is currently an extremely popular area, but little research has been carried out into distributing media that changes over time. The majority of P2P usage is for static data, for example images, documents, or pre-recorded videos. These types of media do not change and are therefore more easily sent around a P2P network. This project will investigate current P2P and streaming media research and go on to design and implement a multi-source streaming technology.

     1.1 Overview of Streaming

     Media streaming is the sending of continuous media over a network in real time; the media may come from a data store or be created on the fly. A simple analogy is a radio station broadcasting audio over the airwaves. Each moment of audio is in the air for a fraction of a second, and after that time it is irretrievable. The same is true of network streaming, and it is even more critical when the media is compressed or encoded in a way that will not tolerate loss of any kind.

     Radio and television broadcasts have been running for many years, whereas streaming technologies are comparatively new. Factors such as low-bandwidth hosts and high costs have limited streaming over the internet. Technical factors also play a large role in the limited success of streaming. Conventional radio waves are broadcast from a source and sent out in all directions. The internet, however, is made up of many single point-to-point links, which makes this kind of broadcast nearly impossible. To add broadcast functionality to the internet, either its physical structure throughout the world would have to change, for example by adopting multicast, or virtual overlay networks would have to be constructed. An overlay network logically acts like a normal local network (i.e. it allows connections between hosts and offers services such as multicast/broadcast), with the difference that it may exist on top of many different physical networks. Implementing this presents many technical problems, such as scalability and reliability, and the task becomes increasingly difficult when designing for a network as diverse as the internet.

     1.2 Overview of Peer-to-Peer

     Peer-to-Peer (P2P) is the technology that allows many networked hosts to connect together on an equal basis to share a given resource. This resource may be a file, processing power, or a hardware resource such as a printer, but in this case it will be a media stream. In recent years P2P has been used to help distribute content around a network, but until very recently it has only been used for trivial tasks. Inside a P2P network a virtual network is created which allows broadcast-style messages to be sent. This medium will hopefully be reliable, timely and scalable enough for streaming media to be transmitted to many thousands of hosts.
  9. 1.3 Project Goals

     This project aims to investigate current P2P and streaming research topics and highlight any flaws in these systems. It will also integrate the previously unrelated topics of P2P and streaming into a single solution. This solution will be developed by improving existing techniques whilst solving any flaws they may have. The developed solution must satisfy a list of requirements which will be derived and discussed in Chapter 3. Once a suitable solution has been found, it will be subjected to numerous tests to establish its usefulness and to demonstrate how much more efficient or effective it is than current streaming solutions.

     1.4 Why is this system needed?

     The need for such a system becomes clear when looking forward to the future of the internet. More and more people are looking to use large-scale video conferencing, and companies such as the BBC are looking to offer their entire video archive online¹. Neither of these scenarios is possible until the technology has improved. Once such a technology has been developed, many more as yet unknown uses will be devised by the general public. One possible use would be to enable anyone on the internet to set up their own radio/TV station with very low bandwidth and feasibly stream to many thousands of hosts simultaneously. Regardless of how such a system is used, future research can be built on top of this solution which could, in theory, provide large-scale distribution of any kind of future media.

     1.5 Report Structure

     The report will be split into seven chapters. After the introduction, the report begins with the background reading chapter. This will investigate and evaluate past and current research in the fields of P2P and streaming. It will explain how current implementations function and highlight their strengths and weaknesses. Using this knowledge, Chapter 3 will start by deriving a list of requirements and continue by designing a new solution. Implementation details will be the focus of Chapter 4, which will be written once the design has been implemented in code. This chapter discusses any changes made or problems encountered during the implementation phase. Chapter 5 will cover testing and will be split into two distinct sections. The first will demonstrate the correctness of the implementation with black-box testing and similar strategies. The second will present and discuss results from the testing conducted on the system to show its effectiveness. Evaluation will be the focus of Chapter 6, which will discuss the test results gained in the second half of the testing chapter. This chapter will also try to explain why any results were better or worse than those predicted. The final chapter will be the overall project conclusion, discussing how well the project met its goals and any future research which can continue from it.

     ¹ http://news.bbc.co.uk/1/hi/entertainment/tv_and_radio/3177479.stm
  10. 2 Background Reading

      Peer-to-Peer is the networking concept in which each device on a network can share its own resources on an equal basis with other devices, acting as either a server or a client. This network can be a physical one such as Ethernet, or a virtual overlay network such as Gnutella. The concept was originally designed as a way to distribute computing resources across many machines. Now the approach is used to help locate machines on the internet (DNS), or to download files from other internet users (Kazaa). This chapter aims to discover how current streaming and Peer-to-Peer technologies work and to learn about future developments in these fields. It will then discuss the pros and cons of current implementations, in preparation for a design that builds on their strengths and fixes their weaknesses.

      2.1 History of Peer-to-Peer

      In the past few years Peer-to-Peer (P2P) has been a new and actively researched topic; however, the concept of P2P is much older and was fundamental to the creation of the ARPANET and the Internet [1]. In this concept each device on a network is considered a peer and shares its own resources on an equal basis with other peers. Every peer has access to any other peer's resources, and may access them at will. This is the opposite of the client/server model, where all peers use the resources of one dedicated, more powerful server.

      2.1.1 ARPANET and the early Internet

      In 1969 the universities UCLA, UCSB and Utah, together with the Stanford Research Institute, formed the ARPANET [2]. This was the first network between different sites, with the goal of sharing the computing resources of each institution. There was no master/slave or client/server concept; each machine had equal power on the network. Later, client/server applications such as Telnet and FTP became more popular. However, the P2P analogy still existed.
      The computers running the Telnet servers were also the computers that ran the Telnet clients. The P2P aspects slowly decreased as the ARPANET became larger and concerns such as security and resource management became more important. The original network was very open, with any machine allowed access to any other machine. This caused security problems, and in the late 1980s firewalls became commonplace, dividing the Internet into many smaller private networks with only a few computing resources exposed at each site. The P2P aspect between sites had mostly disappeared; however, a few services still ran in a distributed but slightly more restrictive way. Instead of peers being able to connect to anything, trust networks were implemented in which dedicated servers were allowed access to other servers that provided the same resource. Examples of this were Usenet and DNS.

      2.1.2 Domain Name System (DNS)

      The DNS [3] maps human-readable addresses to machine-readable addresses, much as a phone book maps the name John Smith to the phone number 123-456. The system currently works by having 13 main DNS root servers [4] with many smaller DNS servers underneath them. These smaller servers are usually operated by Internet Service Providers (ISPs), which provide the DNS service to all their users. When a request is made, a user may ask their local DNS server; if the local DNS server doesn't know the answer, it will ask the DNS server above it.

  11. The root server may not know the answer, but it might know the server with authority over that domain, and tell the local server to query that authority. This is a primitive form of peer-to-peer communication with partially distributed control; however, central control is still present and can override the mapping for any domain. An example of central control being used abusively to override domain names appeared recently [5].

      2.1.3 Usenet

      Usenet implements a decentralized model of control and is considered the grandfather of true peer-to-peer applications [1]. Because control is fully decentralized, no one person can govern what happens on the application's network. Usenet was originally created to exchange files and messages between computers at the University of North Carolina and Duke University. The idea was that students could post messages, and students at either school could then read and reply to these messages. This task was originally automated using UUCP (Unix-to-Unix Copy Protocol [6]), but later NNTP (Network News Transfer Protocol [7]) was designed as a dedicated service for such traffic.

      The system works by letting NNTP clients subscribe to certain channels/groups to which messages on similar topics are sent. When a new message is posted to one of these groups, the local server keeps a copy. Later, other NNTP servers may ask the local server whether any new messages have been posted and, if so, transfer the new postings. In this way messages slowly make their way around the NNTP network as more servers check for updates.

      The control mechanisms on the network are interesting. The server a message originated from has permission to delete this message from the network by sending out a recall.
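The pull-based propagation of postings described above can be sketched in a few lines of Python. This is a minimal illustration only, not the NNTP protocol itself: the class `NewsServer` and its methods `post`, `new_since` and `pull_from` are hypothetical names introduced for this example, and message ids are assumed to be unique strings.

```python
class NewsServer:
    """Hypothetical stand-in for a news server; not the real NNTP protocol."""

    def __init__(self, name):
        self.name = name
        self.articles = {}  # message id -> message text

    def post(self, message_id, text):
        # A client posts to its local server, which keeps a copy.
        self.articles[message_id] = text

    def new_since(self, known_ids):
        # Answer a neighbour's "has anything new been posted?" request.
        return {mid: text for mid, text in self.articles.items()
                if mid not in known_ids}

    def pull_from(self, neighbour):
        # Ask a neighbour for postings not yet seen and copy them locally.
        fresh = neighbour.new_since(set(self.articles))
        self.articles.update(fresh)
        return len(fresh)


unc = NewsServer("unc")
duke = NewsServer("duke")
third = NewsServer("third")

unc.post("<1@unc>", "Hello from UNC")
duke.pull_from(unc)    # duke now holds a copy of the message
third.pull_from(duke)  # third receives it second-hand, without contacting unc
```

The point of the sketch is that no server needs a global view: each one only compares its own store with a neighbour's, and repeated pulls are enough for a posting to reach servers that never contact the originating one.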
      Additionally, the network allows new groups to be created by a global election. A new-group creation event is sent to a well-known control group and is listed there for a certain length of time. During this time any user of the NNTP network can vote on its acceptance. If more positive votes than negative votes are recorded, the group is created. This demonstrates a fully automated democracy.

      2.2 Recent P2P

      Until recent years no major user-oriented applications made heavy use of P2P ideas, but this changed in 1999 when the program Napster [8] was released, starting a new wave of P2P protocols and applications. Since then the Napster idea has been improved, leading to more advanced styles of P2P network and to P2P being introduced to areas of computing which Napster did not originally address.

      2.2.1 Napster

      Napster was the first popular P2P network of recent years. Unfortunately it was not popular because of its technical abilities or because it addressed an important problem. It was popular because it provided millions of users with free music without the permission of the music's owners. This attracted great attention from the media and created a new consumer base for this kind of application.
Napster was a closed system so any information on its inner workings had to be reverse engineered [9]. It would use a central server run by Napster which all clients would have to use for all control tasks. When a peer joined the network it would authenticate with the central server and upload a list of shared music on the local machine. Other peers could then send search requests to the central server, which would return the names of any music that matched and the addresses of the peers sharing that music. The peer would then need to make one more request to the server asking for permission to transfer the file from the other peer. If this request is accepted then the true P2P aspect of the system begins. The peer makes a direct TCP connection to the other peer and begins the file transfer in a simple sequential way. This is illustrated in Figure 2.1.

Figure 2.1 A simple Napster network

There were many technical problems encountered with this solution, namely scalability and reliability. All users on the network would have to connect directly to the central server, and this caused problems with bandwidth and with machine power. It was also a difficult challenge indexing a million users' files and carrying out searches on this huge database. The second problem was the single point of failure of the central server. This ultimately was the reason that Napster stopped working in 2001, when legal issues forced the service to be terminated.

2.2.2 Gnutella

To overcome the central organization, in late 1999 a company named Nullsoft decided to develop a truly P2P application named Gnutella [10]. It would require no central authority of any kind, yet allow all the users to share and search for files on the entire network.

Figure 2.2 A search on a small Gnutella network

It worked by having peers relay all messages sent to them to all the peers they are connected to. The network messages would have a simple TTL (Time to Live) value to limit the distance a message would travel.

To join the network a peer would need to know at least one peer on the network and connect to it. Once connected, the peer can make a query for more peers, and a list of peers will be returned, which in turn can be used to form a more strongly connected network.
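The relay-with-TTL behaviour described above can be sketched in a few lines. This is only an illustrative model, not the Gnutella wire protocol: the `flood_query` function, the adjacency-list `graph`, and the rule that a peer ignores a query it has already seen are all assumptions made for the example.

```python
from collections import deque


def flood_query(graph, start, ttl):
    """Flood a query from `start` and return (peers reached, messages sent).

    `graph` maps each peer to the list of peers it is connected to
    (a hypothetical representation). Every relay decrements the TTL,
    and a peer with TTL 0 does not relay any further.
    """
    reached = {start}
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: the query stops here
        for neighbour in graph[peer]:
            messages += 1  # every relay costs one message on the wire
            if neighbour not in reached:  # assumed duplicate suppression
                reached.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return reached, messages


if __name__ == "__main__":
    # A small chain of peers: A - B - C - D
    chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    print(flood_query(chain, "A", 2))
```

With a TTL of 2 the query from A never reaches D, which shows how a low TTL limits traffic at the cost of reach.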
If a peer searched for a file on the network, it would send a message to all its connected peers, which in turn re-send this message with a decremented TTL value to their peers, and so on until the TTL is zero.

This system was good in that a peer could quickly join a huge network by only knowing one peer, and in that the network was completely decentralised. It did however have a few scalability problems of its own. The main problem was that the control traffic (the messages allowing a new peer to join, and for searches to be carried out) would become very significant as the number of peers on the network increased. Research carried out by a former employee of Napster, Jordan Ritter [11], explains this problem in more detail. His paper explains that on a simple network where each peer is connected to 8 other peers, a simple search of 18 bytes would incur 1.8 MB of traffic after 5 hops, 13 MB after 6 hops, and a huge 91 MB after 7. This is an exponential increase and affects the network drastically after a few thousand users join. The problem is made worse by the number of queries carried out at any one time on the network. With just 1,600 users online an estimated one query a second would be carried out, with each peer required to handle up to 1MBps. Figure 2.2 illustrates a typical Gnutella network, and how a query would pass through the network.

These problems could be reduced if the network was made smaller or if the messages were restricted to only a few hops; however, the network would then be disjointed, with one side of the network not knowing about the other side. This reveals another problem of Gnutella: content may be available on the network but it is not reachable by all.

The final aspect of this protocol which hasn't been addressed is the way that peers could bias or exploit the network unfairly, giving them more resources than other peers.
Since the network is completely decentralised and no clear network standards were enforced, users started to exploit the network [12] to make their searches more important, or to take far more than they ever gave back to the network.

2.2.3 Fasttrack

The most popular file sharing application is Kazaa, which reaches more members online than any other P2P network. Kazaa, as well as a few other programs such as iMesh, Grokster and the original Morpheus, used the Fasttrack protocol [13]. The system was very similar to the Gnutella protocol, yet the network was closed and encrypted, hence only a small amount of information is known about the protocol.

Figure 2.3 A simple query via super-nodes on Fasttrack

The difference with Fasttrack was how it would organise its peers into a star style structure, as shown in Figure 2.3. Instead of having all peers on an equal basis there would be two types of peers, normal peers and super-nodes. Peers would only connect to a super-node, and super-nodes would then connect among themselves. When a peer carried out a search it would be sent to its super-node which would then relay the
query to its connected super-nodes. It would not however relay the query to its own peers, because when a peer first connects the super-node creates a cache of the files stored by that peer, allowing the super-node to answer on behalf of its peers. This difference would reduce the network traffic drastically. With an estimated 100 peers to each super-node the network was able to scale a lot better.

With the addition of a super-node model there was a need for a new type of co-ordination between the peers to decide when a new super-node was required. Information on how this works isn't publicly available, but one possible approach is that when a super-node thinks it is over-crowded it will promote one of its peers to a super-node, and redirect some of the load to the new super-node.

Even with these improvements, Fasttrack still had a few problems, namely privacy and users abusing the network. The privacy problem hasn't been discussed yet, but it affects all the protocols mentioned so far. When a user carries out a search every peer on the network can see it. This may be fine; however, recent social and legal factors have pushed developers to create systems where everything is anonymous.

2.2.4 Gnutella2

The Gnutella2 [14] name is largely a "buzz-word" and the protocol hasn't provided any significant improvements to the topic of P2P file sharing. It implements the Fasttrack super-node concept, but calls the super-nodes hub nodes and the normal peers leaf nodes. There is a slight difference in that each peer may connect to more than one hub node to improve reachability. The only major improvement is in the routing of search messages. Each hub node will keep a cache of search requests in the form of a QHT (Query Hash Table). This is a table of queries carried out on the network with their results. This hash table is then transferred among the hub nodes, allowing for new routing features such as filtering and forwarding.
If a hub node knows that sending a search query to a neighbouring hub will return zero results it will filter this request and not send it. If it already knows the result of the query it will send the replies on behalf of its neighbours.

The problem of users abusing the network has not been addressed; however, the Gnutella2 developers believe that when the network is fully operational there will be no need to exploit or abuse the network since it will work quickly and effectively for all users. This kind of social security is poor at best.

2.2.5 FreeNet

FreeNet is an interesting protocol in that it provides full anonymity for your actions on the network. It is described as "an adaptive peer-to-peer network application that permits the publication, replication, and retrieval of data while protecting the anonymity of both authors and readers." [15]

It is very similar to a Gnutella network, but designed to act very much like a file system. No searching of data can be carried out; instead all data must be referenced directly by its name (in this case a 160-bit SHA-1 hash of the file). When a file is added to the network, parts of it are sent around the network without the peers knowing what the data is, or who first placed it there. Later, when someone requests this file, it will be retrieved from any peers with pieces of the data without the requester knowing which peer is sending them the data. This all works by relaying encrypted messages throughout the network with very few direct peer-to-peer connections.
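The idea of naming data by a 160-bit SHA-1 hash rather than searching for it can be shown with a small sketch. It illustrates only the hash-as-name concept; it is not FreeNet's actual key format or storage scheme, and the `ContentStore` class and its method names are hypothetical.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Return the 160-bit SHA-1 digest of `data` as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


class ContentStore:
    """A toy local store in which data can only be fetched by its hash."""

    def __init__(self):
        self._items = {}

    def put(self, data: bytes) -> str:
        key = content_key(data)
        self._items[key] = data
        return key  # the key is the only "name" the data has

    def get(self, key: str):
        # There is no search: an unknown key simply returns None.
        return self._items.get(key)


if __name__ == "__main__":
    store = ContentStore()
    key = store.put(b"some file contents")
    print(key, store.get(key))
```

Because the name is derived from the content, a requester can also check that the data received matches the key it asked for.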
Whilst this provides very effective anonymity it causes the network to function very slowly, with the requested data being sent via many peers before it reaches its destination. This can also place a strain on a peer's bandwidth, since it has to re-transmit data passing by.

2.2.6 Distributed.net

Until now only P2P networks designed for sharing files have been discussed, but files are not the only computing resource that can be shared. Distributed.net was founded in 1997 and currently has 44,000 participants in a global P2P network for sharing their computers' processing power [16]. The current goal is to crack an RC5 72-bit encrypted message, which with a normal computer would take an astronomical length of time, in the order of millions of years. When the computers are connected in a huge Distributed.net network this length of time is cut down to a more reasonable length. Current estimates are 1220 years before it is achieved, but this is still many orders of magnitude better than millions.

Technically the system is very simple. Peers connect to a central server which distributes blocks of data for analysis. The peer then runs analysis against this data and returns the results to the central server. Since the analysis can be very time intensive the peer will only request new data once a day, allowing the system to handle many millions of concurrent clients.

2.2.7 SkyPe

SkyPe is another non-file-sharing P2P network; instead it is a Voice over IP (VoIP) [17] implementation. This is a system that allows you to speak to other users over the internet, similar to speaking over a phone. It uses very few P2P concepts other than a simple Gnutella network and the ability to route an encrypted voice conversation via peers [18].
In this kind of situation clients want the best connection between two users, and this is usually achieved by a direct TCP connection. However SkyPe recognises that some users are firewalled and direct connections aren't always possible, so it allows conversations to be passed across the P2P network in the most efficient way. It also boasts the ability for a conversation to travel over several routes to help improve performance.

2.2.8 Bittorrent

BitTorrent is the most recent P2P application to become popular, and it heavily influences this project. The protocol is very simple: there is no searching or advertising of files on the network, and each BitTorrent network is only for a preset group of files. Therefore everyone on the network is trying to download the same thing.

Figure 2.4 A BitTorrent network

The system is broken into two parts, a Tracker and Peers. For each network there is one tracker whose job it is to keep a list of peers on the network and send this peer list to any peer that may request it. This central
authority makes it very easy to track the users on the network, and avoids the reachability problems experienced in other P2P networks. This is illustrated in Figure 2.4.

Now that the peers can find the addresses of all other peers they can make TCP connections to as many peers as they deem necessary. By default BitTorrent will connect to the majority of known peers because it has been shown that randomly constructed graphs with a large out degree can be very robust and stable for this style of network [19, 20].

There are two types of traffic between connected peers, control traffic and data traffic. The set of files which the network is downloading is split into a predetermined number of pieces, and each peer will keep a record of which pieces it currently has. On each new connection a list of completed pieces is exchanged between the two peers, and a peer may then request a piece it knows its neighbour has. When a new piece is downloaded by a peer it informs all its neighbours of the completion. Eventually the peer will have all the pieces and turn into a seeder which just shares.

Since all the control communication between peers is comparatively small this part of the system can scale well. Also, since a peer can start sharing the pieces it has as soon as it has downloaded at least one, the network can quickly distribute the load among itself. This allows for an extremely quick download.

The problems however are centred on the tracker. The tracker is a single point of failure and a bottleneck. If peers can't connect to a tracker then they can't get information about other peers on the network, and so can never start the download. There are a few solutions to this problem, mostly involving running more than one tracker and DNS load balancing them.

2.3 Streaming Technologies

Streaming is the concept of sending continuous time-dependent media over a network.
This media can be created on the fly, from recording equipment for example, or it could be streamed from a stored medium. The paradigm is most similar to a common radio broadcast; however, over the internet this concept is complicated by the point-to-point link infrastructure of the internet. These point-to-point links make it very hard to conduct any kind of broadcast, and make it expensive for broadcasters to recreate a broadcast concept.

There are two main approaches to adding broadcast functionality to the internet. The most suitable would be to change the structure of the internet; this however is not practical in the real world. The second solution would be to create virtual overlay networks which add the broadcast ability, but at a cost.

2.3.1 Multicast

This is the ideal streaming technology for the internet, which was developed in 1989 at Stanford University and is documented in RFC 1112 [21]. Multicast is the concept of sending IP packets to a group address so that they are delivered to all the hosts which have joined the group. The system can be highly dynamic, with hosts joining and leaving constantly. The obvious advantage is that the source only needs to send one packet to a predefined group address and all hosts in the group will receive it.

This system is possible within a network if the routers are multicast enabled. However network-layer multicast has not been widely deployed by most ISPs [22] due to commercial reasons, thus only a very small number of internet hosts are multicast enabled.
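As a rough illustration of the join-and-send model described above, the following sketch uses the standard socket options for IPv4 multicast. The group address and port are arbitrary placeholders chosen for the example, the helper names are hypothetical, and the code will only deliver packets on a network whose routers and hosts actually support multicast.

```python
import socket


def build_membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte request: group address followed by local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(group: str, port: int) -> socket.socket:
    """Return a UDP socket that has joined `group` and listens on `port`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Ask the host's IP stack to join the group on the default interface.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_membership_request(group))
    return sock


def send_to_group(payload: bytes, group: str, port: int) -> None:
    """Send one datagram to the group address; the network replicates it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                       socket.IPPROTO_UDP) as sock:
        # A TTL of 1 keeps the packet on the local network segment.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(payload, (group, port))
```

A receiver would call `join_group("224.1.1.1", 5007)` and then `recvfrom` on the returned socket, while the source calls `send_to_group` once per packet regardless of how many receivers have joined.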
To overcome this problem research has been carried out into application/overlay layer multicast protocols [23] with varying degrees of success. The main problems are generally keeping the protocol overhead small, and maintaining a high level of service.

The simplest and most widely used streaming technology is the simple client/server model, where a server or cluster of servers sends the stream to many clients using the server's own bandwidth. This can prove problematic if the bandwidth required for the number of clients isn't available. It may be solved by giving some of the bandwidth load to the clients, allowing them to distribute the stream to fellow peers.

2.3.2 Batch Chaining

This concept was developed to support Video on Demand, where any client can request any stream at any time. This kind of activity can place huge demands on a video server. The system proposed in a paper from the University of Central Florida [24] improves the network in two ways.

Figure 2.5 Batch Chaining Technique

Firstly it batches together clients that request the same stream at a similar time. The result is that the server sends to one peer in the batch, which distributes the stream to its siblings in the batch. The problem with this is that the first person to create a batch has to wait until the last person has joined, causing a delay for the first user. A typical waiting period would be in the order of 10 minutes.

The second improvement is to place adjacent batches next to each other in a chain. Once the earliest batch has finished reading the stream, it passes its cached data on to the next batch in the chain. If there aren't any adjacent batches the server will create two separate chains.
This can be seen in Figure 2.5, with each batch receiving a delayed segment of the stream from the previous batch.
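The batching step can be sketched as follows. This is a simplified reading of the idea, not the algorithm from [24]: it assumes a batch opens with its first request and closes a fixed `window` later, and the function names and time units are made up for the example.

```python
def batch_requests(request_times, window):
    """Group request times into batches.

    A batch is opened by its first request and accepts every later
    request that arrives less than `window` time units after it.
    """
    batches = []
    for t in sorted(request_times):
        if batches and t - batches[-1][0] < window:
            batches[-1].append(t)
        else:
            batches.append([t])  # too late for the open batch: start a new one
    return batches


def startup_delays(batch, window):
    """Return how long each client in a batch waits before playback starts,
    assuming the stream starts when the batch window closes."""
    start = batch[0] + window
    return [start - t for t in batch]


if __name__ == "__main__":
    # Request times in minutes, with a 10 minute batching window.
    batches = batch_requests([0, 3, 9, 12, 25], 10)
    print(batches)
    print(startup_delays(batches[0], 10))
```

The first client in each batch always waits the full window, which is the delay cost noted above.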
There are however limitations to this approach, mainly involving reliability and trust. If a batch disconnects from the chain then any batches after it in the chain will have their streams disrupted and will need to reconnect to the source, since this will be the only peer with the correct segment of the stream. Secondly, you will be receiving all your content from another batch; a peer in that batch may be corrupting the stream, which could be devastating further down the chain.

2.3.3 NICE

NICE [22] is a single-source media streaming protocol developed at the University of Maryland. It organises the peers into a tree structure rooted at the source server. It was designed to help distribute live continuous media quickly and effectively with low overheads.

Figure 2.6 A NICE tree network

This solution improves on the chaining technique by organising the peers into a tree instead of a simple chain, thus increasing reliability and scalability. It works by creating groups (or batches) of peers, with these groups being connected into a tree hierarchy and one peer in each group being nominated the head of the group, which makes the connection to the group above. The non-heads would connect to other groups below them in the tree. This is illustrated in Figure 2.6.

There is still a problem with reliability; for example, a whole branch would be disconnected if the group at the root of that branch left the tree. This is catered for by a quick recovery control protocol.

2.3.4 ZIGZAG

Zigzag [25], developed at the University of Central Florida, was heavily influenced by the NICE approach. The difference is that the path of data through the tree has been slightly modified to allow for faster recovery and less control traffic. It would still use the same node degree but increase the link degree. Each peer in a group would be connected to their parent node.
These additional connections would only be used for reliability if the main link fails. Additionally these new links could allow the peers to be organised into more than one tree. One purpose for this is to create one tree for data and a different tree for control traffic.

2.4 Recent Research

Most widely used P2P networks were designed outside of the research community and as such they have problems with security and efficiency, but mostly with scalability. This section discusses several new P2P ideas which have been researched in the past few years but have yet to be adopted.

2.4.1 Pastry

Pastry [26] is an extendable peer-to-peer overlay network designed at Microsoft Research and Rice University. The idea was to create a fully decentralised
network which would allow a large number of different applications to run on top. The protocol implements a very scalable and efficient routing algorithm to provide application level routing which uses very little bandwidth, and guarantees all peers can be reached within $\log_{2^b}(N)$ hops, where N is the number of peers in the network and b is a routing parameter commonly set to 4.

It works by assigning every peer a random number inside a 128-bit range, called a key. When messages are sent within the network they are sent to a specific key, allowing the network to implement its own routing algorithm based on this key. When any peer receives (or sends a new) message it has to choose which peer to forward it to. It does this by using an internally kept routing table. Unlike the link-state and distance vector methods, the peer only keeps data about peers near it instead of a global overview of the network. This allows for much smaller routing tables and for less routing traffic; the routing of messages takes slightly more hops, but this is a reasonable trade-off. This concept is based on work by Plaxton et al [27].

When forwarding a message to any key the peer will look for the numerically closest match in its routing table and forward to that peer via the underlying network (for example IP). Eventually the message will get to a peer numerically closest to the destination, and if that peer doesn't have the destination in its routing table then the message is undeliverable.

To guarantee that numerically close peers are always listed in each other's routing tables, special steps have to be taken when a peer joins. This adds a little overhead to the join; however, it very quickly and effectively adds the peer to the network for everyone to access.

There are however two problems with the routing implementation which can occur in rare conditions.
With the right combination of failures it is possible for the network to partition and not reconnect, thus causing two isolated networks. However, if just one peer is on both networks the networks will re-join in a short time with low effort. There is also the previously undiscussed problem of peers being unreachable when very rare race conditions occur under a series of failures and joins on the network. For example, suppose 3 peers exist with keys 10, 20 and 30, and a fourth peer wants to forward a packet to peer 10 but only has peer 20 in its routing table. If peer 20 fails the packet could be forwarded to peer 30; however, due to the order in which the peers joined, peer 30 also doesn't know about peer 10, and therefore can't send the packet anywhere, and thus blacks out peer 10. This problem is solved by increasing the b variable and isn't considered a problem when the network is of large enough size.

One last notable addition to the Pastry protocol is network locality, in the sense that Pastry gives nodes which are geographically close preference over nodes further away. This metric is determined by the application but could be based on IP hops between hosts, or on latency. It was discussed by Savage et al [28] that selecting geographically local hosts for routing may only be more effective 30-80% of the time. It was also discussed that the triangle inequality won't hold on the internet, causing the reverse path to be different to the forward path; Pastry theory, however, assumes the triangle inequality holds.

2.4.2 SplitStream

SplitStream [29] is an application developed by Microsoft Research to operate on top of Pastry. It is a multicast overlay network used for streaming media by constructing many balanced multicast trees, where each peer can be a member of one or more trees. In a normal multicast tree with a node out degree of 2 over 50% of the
nodes are leaf nodes which are not contributing anything to the network. In a tree with a node out degree of 16 more than 90% are leaf nodes. SplitStream tackles this problem by placing nodes in more than one tree, causing a node to be both a leaf and an interior node. On each tree a different segment of the stream is sent, meaning the stream must be split into a specified number of segments at the server.

SplitStream presumes the senders will be using algorithms with Multiple Description Coding (MDC) properties, allowing the video quality to drop when a node is not connected to all trees. This is useful when peers have varying amounts of bandwidth and can sacrifice video quality for bandwidth, while peers with higher bandwidth available can subscribe to more trees and receive higher quality.

Using MDC does limit what continuous media can be sent across the network, and it is what allows the protocol to claim robustness. When the network is delivering media that can't recover from loss the system can become very unreliable.

The time it takes a peer to (re)join the network is in the order of $\log_O N$, where O is the node out degree and N is the number of nodes on the network. SplitStream also suffers from the same problems as NICE [22], such as branches of the multicast tree being severed with large numbers of peers losing part of their stream.

2.4.3 Chord

Chord is a decentralized lookup service that stores key/value pairs throughout the network. It works on a very similar approach to Pastry; however, Chord uses a less efficient routing algorithm of order $\log N$, where N is the number of peers, whereas Pastry uses $\log_{2^b}(N)$, where b is Pastry's routing parameter and N is the number of peers.

The routing algorithm works in a very similar way, with each peer being assigned a random id; however, instead of the message being forwarded to the peer with the closest matching key, it is forwarded to the peer with its number closest to a power of 2.
For example, if peer 1 sent a message it would be forwarded to peer 2, 4, 8, 16 and so on, depending on which power was nearest to the destination.

Chord also adds redundancy to the network, such that data is stored on more than one node, meaning that more than one peer must disconnect before any data is lost. Pastry however stores all data on only one node; surrounding nodes are able to store this data as well, but this is not guaranteed.

What Chord makes up in redundancy it loses in scalability and protocol overheads. In theory it is still true that Chord can scale to millions of hosts; however, Pastry can scale more easily due to its more efficient routing algorithm.

2.5 Summary

This chapter has explained the history of P2P from the very first ARPANET to cutting edge research such as SplitStream and Chord. At each step along this history the pros and cons have been identified and discussed. Also discussed have been the current advances in streaming technologies, mainly with a focus on P2P streaming. The next chapter builds upon these research ideas to design a new concept which will hopefully avoid the pitfalls of previous projects.
3 Design

This chapter will give a high level view of the key components in the system and then go on to explain the design of the protocol used between these components. Each component will be explained and any algorithms of technical value will be discussed. The inner workings of the system components will be presented in UML diagrams with discussion of the main classes. The last sections will discuss testing and evaluation strategies for the proposed system.

3.1 Requirements

A list of requirements has been drawn up from the research carried out in the Background Reading chapter [see Chapter 2]. The requirements aim to improve on the negative aspects of current systems and to add functionality. Each requirement will be explained with a brief description of how it was derived.

3.1.1 Provide a robust network

The network should be fault tolerant and be able to survive peer failures without the receiving nodes suffering from interruptions. Since the system will deal with the timely delivery of continuous media, it is reasonable to require that the data gets to the destination on time and that peers leaving or joining the network do not interfere with this timely delivery. In current streaming technologies like Batch Chaining and NICE [see sections 2.3.2 and 2.3.3] peers will not receive data on time if their upstream peers experience problems. A strongly connected graph of peers would solve this problem and will be used by this system.

3.1.2 Allow quick re-join after peer failure

If a node does fail, the peer or any peers affected should be able to rejoin the network with minimal effort and without loss of service. As described in the previous section, robustness is a requirement to ensure timely delivery; as such, any peer failures or joins should not adversely affect the network. ZIGZAG [see section 2.3.4] demonstrates a P2P network with fast recovery.
3.1.3 Stream data with low control overhead

The control traffic for constructing and maintaining the network should be as small as possible. Traffic about the stream should also be kept to a minimum so the peer can use most of its bandwidth to receive the stream data. In networks such as Gnutella [see section 2.2.2] it was demonstrated that as the network grew, the control overheads to maintain the functionality of the network increased exponentially. It is therefore a reasonable requirement that the control traffic is low.

3.1.4 Move the stream distribution load away from the source

The source of the stream should only require a small amount of upload bandwidth, with most of the forwarding load being placed on peers within the network. For technical and financial reasons, Batch Chaining [see section 2.3.2] was developed in an effort to move the distribution load away from the source. This change would greatly benefit the source and other peers on the network; as such, any protocol that wishes to be widely usable should also exhibit this property.
3.1.5 Be scalable

All the requirements above should not hinder the scalability of the network and as such should allow large numbers of peers to be on the network. Adding more peers to the network should not exponentially increase control traffic or source server load; in fact, if possible, no increase should be observed. It was shown with DNS and Usenet [see sections 2.1.2 and 2.1.3] that scalability aids the successful adoption of a protocol. It is also worth noting that the limited success of Napster and Gnutella [see sections 2.2.1 and 2.2.2] was partly because of scalability issues once their user bases became large enough, causing both systems to break down.

3.1.6 Media agnostic

The protocol designed must be generic enough to stream any kind of continuous media, be it audio, video or even ticker style text. This requirement doesn't have a clear derivation, but is included to add greater flexibility and usefulness to the protocol.

3.1.7 Be secure

The network should be secure from tampering with the data and from unauthorised users receiving the stream. The scope of this project does not cover security, but the protocol should be designed in a way that allows this in the future. Security is a very large and growing area of computing, and any protocol which doesn't make provision for security will never be globally accepted.

3.2 Peer-to-Peer Network

As described in the Background Reading chapter there are many types of existing P2P network models, and a few P2P streaming models. This system will implement a network in the style of a strongly connected graph of peers, adapted to the context of streaming by providing time indexed continuous media. This style of network was chosen for the following reasons:

• Each peer will be able to find many other peers quickly and efficiently, thus satisfying requirement 1.2.
• The large number of connections between peers should allow data to spread quickly throughout the network. This will also make the network more robust by providing many sources of the stream, helping to satisfy both requirements 1.1 and 1.4.
• There will be low organisational overhead between peers because of the peer location services provided by the tracker, thus saving their bandwidth for the content and aiding requirement 1.3.

This idea is a hybrid of the BitTorrent approach; however, it needs modifying. The major change is that, instead of a fixed number of pieces, a continually increasing number will be used, with old pieces expiring from the network. BitTorrent also announces to other peers when a piece has been downloaded; this concept will be retained.
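The idea of a continually growing set of pieces, with old pieces expiring, can be illustrated with a small cache. This is only a minimal sketch under stated assumptions: the class name `PieceWindow`, the fixed `capacity`, and the choice to expire pieces in arrival order (rather than after a time limit) are all hypothetical and are not taken from the design itself.

```python
from collections import OrderedDict


class PieceWindow:
    """Hypothetical piece cache: keeps the newest `capacity` pieces only."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Maps piece index -> piece bytes, ordered by arrival (oldest first).
        self._pieces = OrderedDict()

    def add(self, index, data):
        """Store a piece and return the indices that expired as a result."""
        self._pieces[index] = data
        expired = []
        while len(self._pieces) > self.capacity:
            old_index, _ = self._pieces.popitem(last=False)
            expired.append(old_index)
        return expired

    def get(self, index):
        """Return the piece bytes, or None if never seen or already expired."""
        return self._pieces.get(index)

    def indices(self):
        """Return the indices currently held, oldest arrival first."""
        return list(self._pieces)


if __name__ == "__main__":
    window = PieceWindow(capacity=3)
    for i in range(5):
        dropped = window.add(i, bytes([i]) * 4)
        print(f"added piece {i}, expired {dropped}, holding {window.indices()}")
```

Expiring by arrival order avoids comparing piece indices directly, which would need extra care once the index wraps back to 0.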
[Figure 3.1: Diagram of Peers, Source Peer and Tracker. Legend: weak link (low bandwidth), strong link (high bandwidth), same physical location.]

The system will comprise three main components, explained here and depicted in Figure 3.1. The figure also shows the flow of data between the components, with the thickness of each line representing how well connected the components are. Strongly connected nodes would transfer more data than weakly connected ones.

Tracker - The tracker is the peer coordinator. It will store a list of all peers on the network, and allow connecting peers to quickly find other peers to join. It will not handle any data concerned with the stream. It is simply there for peer discovery.

Peer - This is one of the nodes in the network which will download and re-send parts of the stream.

Source Peer - This is logically the same as any other peer; however, it is the source of the stream. Peers on the network will not know that the stream originates from this peer, and there will be no bias because of that. The source peer may also reside on the same machine as the tracker; however, this is not a requirement. This is shown in Figure 3.1 by the box surrounding both source peer and tracker.

3.3 Stream Representation

The logical representation of the stream will be a sequential list of numbers, with the stream starting at 0, reaching an arbitrary maximum and wrapping back to 0.

[Figure 3.2: Representation of a stream. Pieces are numbered 0 1 2 3 4 5 6 7 … n; each piece represents 512 bytes, so the whole sequence represents n × 512 bytes.]

Each integer represents a fixed-size block of bytes from the stream; these blocks will hereafter be called pieces. At any time a single peer may hold a small non-contiguous set of pieces from the total stream; eventually, however, the peer will hold a contiguous set of pieces, allowing correct playback. It
is also advised that the peer cache a number of these pieces for a limited amount of time so that they can be sent to other peers. It may also be advisable for the client to download the stream in sequential order; however, this is not essential, and performance may improve if a set of pieces is downloaded concurrently in advance.

3.4 Tracker

A major problem with fully distributed networks is locating peers that hold the required information. This is normally attributed to a high degree of network partitioning or low network reachability [see section 2.2.2]. To solve this problem, a dedicated server will be designed whose sole purpose is to track which peers are listening to the same streams.

In a BitTorrent-style network there is a single central server called a tracker, which stores a list of peers on the network. The reason for the tracker is to lower control overheads and to limit the effects of network partitioning, thus fulfilling requirements 1.2 and 1.3. In the network there will only ever be a single source of the stream. It therefore makes sense for that peer to also run the tracker, since only one is needed. Enforcing the use of a tracker does hinder requirements 1.1 and 1.5; however, the effects will be acceptable.

There are a few slightly different solutions for the tracker protocol. A custom protocol could be written on top of a connectionless transport; however, the choice made here is to use a connection-orientated protocol. The tracker will be designed as an HTTP extension, allowing dedicated web servers to be written as trackers, or a tracker to be written as a web application or script. HTTP suits this task because it is widely used online, and because the tracker connections are stateless and infrequent.
The only problem with HTTP is that it adds a little extra traffic overhead; however, this is acceptable since it will be used infrequently, and compared to the size of the stream data the overhead is negligible.

3.5 Tracker-less Network

The current design will use a central tracker server for the network; however, a future extension could use a fully distributed tracker. The current design does not adopt this approach so that the project can focus on the streaming technology. The benefits of a tracker-less approach are numerous. One major benefit would be that the network is more scalable, and therefore fewer resources would be needed by the stream's author.

[Figure 3.3: A tracker-less network, drawn as a grid of peers with numeric IDs ranging from 1 to 97.]

The concept would be built upon a Pastry [see section 2.4.1] network. All nodes using this protocol would be connected to this wide P2P tracker network, regardless of which stream they are listening to. A Pastry network gives each peer a unique 128-bit identifier. Additionally, each stream would need to be
assigned a unique 128-bit ID. When connecting to a network, a peer would send a message to the peer whose ID most closely matches the ID of the stream. This peer would be nominated to provide tracker functions for that specific stream. Each hop on the route from an interested peer to the nominated peer would record that the interested peer is now listening to the stream. The closer the message gets to the peer matching the stream ID, the larger the list of peers for that stream will be on each host. Each peer would be required to keep a list of 10 peer IDs, which is cycled as new peers join. Later, when a peer wants to retrieve a list of peers, it can send out requests with increasing time-to-live fields until it has received as many peers as it needs.

As shown in Figure 3.3, two peers, 35 and 60, are designated as the trackers for two different streams. The thickness of the lines around the peers indicates how full their peer tables are, with each peer around the trackers storing a percentage of the list of peers using that stream. The more peers listening to a stream, the greater the radius over which its data is spread.

Take, for example, peer 2 routing a message to peer 60 to ask for peers. Its message would go via 22, 45 and then 60. Each of those intermediate hops holds a limited amount of knowledge. Peer 2 can query each of them progressively until it has obtained a large enough list.

This concept may be impractical for networks where, by bad luck, the peer matching the stream ID happens to be a low-bandwidth user who isn't able to fulfil all requests. It would also be trivial for a malicious host to assign itself the stream ID and partially corrupt the peer table. Hopefully both of these problems will become less important as the size of the network grows, because the number of peers holding the peer table will increase.

3.6 Peer

The peer will require the greatest amount of design.
As in BitTorrent, peers will first need to connect to the tracker to receive a list of peers via normal HTTP communication. The peer can then connect to as many of the listed peers as it deems necessary. The peer communication protocol will run over a reliable, connection-orientated protocol (such as TCP) and be designed to be as small as possible to help with requirement 1.3. A connectionless protocol (such as UDP) wasn't chosen due to its lack of reliability.

There are two fundamental ways the peer protocol could work. A peer could announce that a piece has arrived, or peers could query other peers for their piece set. Standard P2P file sharing networks work by querying for pieces; however, this concept doesn't work well in a streaming environment. If a query were sent after each new piece was downloaded, another message would be sent in reply confirming or denying that the remote peer has that piece, therefore requiring twice the bandwidth. Also, when a remote peer obtains a new piece, the local peer won't know until its next query. This is a problem when timely delivery is a requirement, as it is critical for the peer to get each piece on time.

Since an announcement protocol will be used, the overhead of each announcement packet must be small, because a large number of them will be sent. To further improve performance, announcements will only be sent to peers that haven't already announced that they have that piece.

3.7 Source Peer

This will appear like any other peer; however, it will never be required to download pieces, since it is the source of the stream. The stream will be read from a file, recording equipment, or another suitable IO device. It would be beneficial
for the source to store the last x pieces to help spread them on the network. If the source expires pieces too quickly, peers may miss a piece and it would never make it onto the network.

3.8 Peer and Tracker Overview

This section discusses how the tracker and peers communicate with each other. Details will be given in later sections, but an overview is displayed in Figure 3.4.

[Figure 3.4: UML sequence diagram of Peer and Tracker interactions. Participants: Tracker, Peer A, Peer B, Peer n. Messages in order: Join Network, OK, List Peers, Address Of Peers, Connect To Peer B, Connect To Peer n, {Unknown Length Of Time}, Announce Piece Done, Request Piece, Transmit Piece, {Transmit Time}, Announce Piece Done.]

The peer first connects to a tracker that manages the stream the peer is interested in. This will result in either an OK or an error. If an error occurs, the peer has no choice but to stop and deal with the error, either by prompting the user or by handling it internally. Following this, a list of peers will be requested from the tracker, giving the client a subset of all the users connected to the network. Now that the client knows some peers on the network, it can make individual connections to each peer, carrying out a handshake and then becoming connected.

Next the client will wait for announcements from other peers. Each announcement will inform the client of newly available pieces on those peers, allowing requests to begin. The client will download the oldest (smallest numbered) piece and start to pre-cache a few pieces ahead of time. Once a piece has been requested and downloaded, the client will announce its completion to all its peers. This process will continue while the client is playing back the media.

The client should request and keep a set of pieces before and after the current playback location. The reason for the advance pieces is for pre-caching in case the stream is lost
for a period of time. The reason for the older pieces is so that they can be shared with the network for a given amount of time.

3.9 Tracker Protocol

The tracker is responsible for holding the list of active peers and sending this peer list to interested peers. It is also responsible for holding meta-information about the stream. All information is transferred via normal HTTP [RFC 2616], with a well-known URI [RFC 2396] describing the location of the stream. The peer will send standard GET requests to this URI, with differing query strings to determine what action the peer is taking. All data transferred in the URL will adhere to URL encoding specifications.

An example of a valid URL would be:

http://tracker.com/?action=join&peer-id=ABCDEFGHIJKLMNOPQRST&peer-port=4321

This would be a request to join the network, with peer ID A-T, listening on port 4321 for incoming connections. A full list of the valid query parameters follows.

3.9.1 &peer-id=

This query field is required with all HTTP requests to uniquely identify the client to the tracker. The ID will consist of a random 20-byte string selected by the peer. This ID will be used to identify the peer in the future and should not be revealed to other peers. The ID will be recorded upon joining the stream; on all other commands it will be used to confirm the identity of the peer, and if the ID is incorrect the command will be ignored.

3.9.2 &peer-ip=

The IP address the peer believes it is listening on. The IP/port cannot be used as a unique identifier, since more than one listener may be on the same IP or behind the same NAT gateway.

3.9.3 &peer-port=

The TCP port that the peer is listening on for incoming connections. The IP/port cannot be used as a unique identifier, since more than one listener may be on the same IP or behind the same NAT gateway.

3.9.4 /?action=join

This is sent when the peer first wants to connect to the stream.
The HTTP body will contain stream-specific data which the peer should use to understand the format of the stream data. For example, the header of an Ogg stream would be sent so that the peer can pick up the stream from any position. HTTP headers will also be sent explaining application-specific details. X-BitStream-PieceSize and X-BitStream-ContentType are both required. From this point the peer will be listed by the tracker as actively subscribed to the stream, allowing other peers to connect to it.
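The join request and its required response headers can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the helper names (`make_peer_id`, `build_tracker_url`, `read_join_headers`) are hypothetical, the peer ID is assumed to be 20 ASCII alphanumeric characters, and the header names are the two listed as required in section 3.9.4.

```python
import secrets
import string
from urllib.parse import urlencode

# Header names as listed in section 3.9.4 of the text.
REQUIRED_HEADERS = ("X-BitStream-PieceSize", "X-BitStream-ContentType")


def make_peer_id(length=20):
    """Pick a random ID; 20 ASCII characters is an assumed encoding."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def build_tracker_url(base, action, peer_id, peer_port=None, peer_ip=None):
    """Build a URL-encoded GET request URL for the tracker."""
    params = [("action", action), ("peer-id", peer_id)]
    if peer_ip is not None:
        params.append(("peer-ip", peer_ip))
    if peer_port is not None:
        params.append(("peer-port", str(peer_port)))
    return base + "?" + urlencode(params)


def read_join_headers(headers):
    """Return (piece_size, content_type) from a mapping of response headers.

    Header names are matched case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    missing = [h for h in REQUIRED_HEADERS if h.lower() not in lowered]
    if missing:
        raise ValueError("missing required headers: " + ", ".join(missing))
    piece_size = int(lowered["x-bitstream-piecesize"])
    if piece_size <= 0:
        raise ValueError("piece size must be positive")
    return piece_size, lowered["x-bitstream-contenttype"]


if __name__ == "__main__":
    url = build_tracker_url("http://tracker.com/", "join",
                            make_peer_id(), peer_port=4321)
    print(url)
    print(read_join_headers({"X-BitStream-PieceSize": "512",
                             "X-BitStream-ContentType": "application/ogg"}))
```

Generating the ID with `secrets` rather than `random` is a design choice made here because the text says the ID should not be revealed to other peers, so it should be hard to guess.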
3.9.5 /?action=part

This is sent when the peer decides to stop listening to the stream. The tracker should remove the peer from the list and free any memory held for the peer. The peer-id must be included to make sure the correct peer is removed.

3.9.6 /?action=list

The peer will periodically request this to obtain a list of peers. The peer can also request this list when it needs more peers to connect to. The normal interval for the peer to request this list should be every 5 minutes. If the peer does not keep to this interval, the tracker should assume the peer has unintentionally disconnected from the stream and should remove it from the list.

3.9.7 HTTP Headers

The following are the headers which can be sent upon join.

3.9.8 X-BitStream-PartSize

A required field indicating the size in bytes of each piece in the stream. Setting this to a lower value causes more peer control traffic but allows for less delay in the stream.

3.9.9 X-BitStream-ContentType

Required content type of the stream data, for example application/ogg [RFC 3534], video/mpeg etc.

3.9.10 X-BitStream-Title

An optional title of the stream.

3.10 Peer Protocol

Information between peers consists of control traffic, such as requests and announcements, and the actual media stream data. The stream is divided into fixed-size pieces. Each piece has an integer index, with the starting index depending on how far into the stream the source currently is. When a peer acquires a new piece it should announce this to all connected peers. Peers may optimistically delay announcements to save bandwidth. Peers may also batch announcements together to lower overheads.

3.10.1 Packets

Messages between peers will be sent using a custom protocol over TCP. The TCP connections must be able to send data in both directions, allowing peers behind NATs and other such firewalls to operate normally.
Each packet will have a header followed by the packet body; this is illustrated simply in Figure 3.5. The messages are designed in such a way that if a peer doesn't understand or implement a given type of message, it may skip it and receive the next message. This is to ensure maximum backwards compatibility.

[Figure 3.5: Packet Diagram. Layers from top to bottom: Body, Packet Header, TCP, IP.]
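The skip-unknown-messages property can be sketched with a length-prefixed framing. Everything concrete here is an assumption made for illustration: the header layout (a 1-byte type followed by a 4-byte big-endian body length), the type numbers, and the function names are hypothetical and are not the header format defined by this protocol.

```python
import struct

# Hypothetical header: 1-byte message type, 4-byte big-endian body length.
HEADER = struct.Struct("!BI")

# Hypothetical type numbers, chosen only for this sketch.
KNOWN_TYPES = {1: "announce", 2: "request"}


def encode_packet(msg_type, body):
    """Prefix the body with the hypothetical header."""
    return HEADER.pack(msg_type, len(body)) + body


def decode_packets(buffer, known_types=KNOWN_TYPES):
    """Split a received byte buffer into complete messages.

    Returns (messages, remaining), where messages is a list of
    (type_name, body) pairs and remaining holds any incomplete trailing
    bytes. Messages of an unknown type are skipped using their length
    field, so later messages can still be read.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # the body has not fully arrived yet
        if msg_type in known_types:
            body = buffer[offset + HEADER.size:end]
            messages.append((known_types[msg_type], body))
        offset = end
    return messages, buffer[offset:]


if __name__ == "__main__":
    data = (encode_packet(1, b"\x00\x07")
            + encode_packet(99, b"not understood")
            + encode_packet(2, b"\x00\x08"))
    print(decode_packets(data))
```

Carrying the body length in the header is what lets a peer step over a message type it does not implement without understanding its contents.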
