Peer-to-Peer Media Streaming
Andrew Brampton
B.Sc. Computer Science
March 2004
I certify that the material contained in this dissertation is my own work and does not contain significant portions of unreferenced or unacknowledged material. I also warrant that the above statement applies to the implementation of the project and all associated documentation.

Signed: Andrew Brampton
Date: 19th March 2004
Abstract

Peer-to-Peer networks are quickly becoming a foundation for future internet applications; however, the P2P paradigm has not yet been applied to streaming continuous media. A key aspect of the future internet will be a multimedia-rich environment where video and audio streaming between many different people is commonplace. These services have not appeared, however, due to many technical problems. This report researches and designs a new concept for adapting existing P2P techniques and applying them to a streaming context, to provide a faster and more reliable transport medium for streaming media. If this system works as expected, anyone, regardless of bandwidth, could stream video to thousands of hosts without loss of performance, by using the receiving peers' bandwidth to help transmit the stream.

Working document URL: http://www.lancs.ac.uk/ug/brampton/fyp/
Contact Email: a.brampton@lancs.ac.uk
Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
1 Introduction
  1.1 Overview of Streaming
  1.2 Overview of Peer-to-Peer
  1.3 Project Goals
  1.4 Why is this system needed?
  1.5 Report Structure
2 Background Reading
  2.1 History of Peer-to-Peer
    2.1.1 ARPANET and the early Internet
    2.1.2 Domain Name System (DNS)
    2.1.3 Usenet
  2.2 Recent P2P
    2.2.1 Napster
    2.2.2 Gnutella
    2.2.3 Fasttrack
    2.2.4 Gnutella2
    2.2.5 FreeNet
    2.2.6 Distributed.net
    2.2.7 SkyPe
    2.2.8 Bittorrent
  2.3 Streaming Technologies
    2.3.1 Multicast
    2.3.2 Batch Chaining
    2.3.3 NICE
    2.3.4 ZIGZAG
  2.4 Recent Research
    2.4.1 Pastry
    2.4.2 SplitStream
    2.4.3 Chord
  2.5 Summary
3 Design
  3.1 Requirements
    3.1.1 Provide a robust network
    3.1.2 Allow quick re-join after peer failure
    3.1.3 Stream data with low control overhead
    3.1.4 Move the stream distribution load away from the source
    3.1.5 Be scalable
    3.1.6 Media agnostic
    3.1.7 Be secure
  3.2 Peer-to-Peer Network
  3.3 Stream Representation
  3.4 Tracker
  3.5 Tracker-less Network
  3.6 Peer
  3.7 Source Peer
  3.8 Peer and Tracker Overview
  3.9 Tracker Protocol
    3.9.1 &peer-id=
    3.9.2 &peer-ip=
    3.9.3 &peer-port=
    3.9.4 /?action=join
    3.9.5 /?action=part
    3.9.6 /?action=list
    3.9.7 HTTP Headers
    3.9.8 X-BitStream-PartSize
    3.9.9 X-BitStream-ContentType
    3.9.10 X-BitStream-Title
  3.10 Peer Protocol
    3.10.1 Packets
    3.10.2 Packet Header
    3.10.3 Keep Alive
    3.10.4 Handshake
    3.10.5 Announcement
    3.10.6 Request
    3.10.7 Data
  3.11 Program Design
    3.11.1 PeerClient
    3.11.2 StreamBufferInterface
    3.11.3 PlaybackInterface
    3.11.4 PeerConnection
    3.11.5 PeerManager
    3.11.6 PeerPackets
  3.12 Algorithms
    3.12.1 Piece Picking Quality of Service
    3.12.2 Source Saturation Problem
    3.12.3 Pre-emptive Sending
  3.13 Code Testing Strategies
  3.14 System Evaluation Strategies
    3.14.1 Predicted Results
  3.15 Summary
4 Implementation
  4.1 Changes
    4.1.1 Tracker
    4.1.2 PeerManager
    4.1.3 Vorbis Ogg Playback Library
    4.1.4 Bitmap Class
  4.2 Problems Encountered
    4.2.1 StreamBuffer changing without notification
    4.2.2 Concurrency Issues
    4.2.3 Self Connecting Peer & Peers Connecting Both Ways
  4.3 Algorithms Used
    4.3.1 FindNextPiece
    4.3.2 PeerConnection Thread
  4.4 Summary
5 System in Operation
  5.1 Tracker
  5.2 PeerSource
  5.3 PeerClient
  5.4 Summary
6 Testing
  6.1 Unit Testing
    6.1.1 Bitmap Class
    6.1.2 StreamBuffer Class
  6.2 Integration Testing
    6.2.1 PeerConnection Class
  6.3 Performance Testing
  6.4 Summary
7 Evaluation
  7.1 Efficiency
  7.2 Overheads
  7.3 Summary
8 Conclusion
  8.1 Project Goals
  8.2 Future Work
  8.3 Summary
9 References
10 Appendix
  10.1 Bitmap Test Cases
  10.2 StreamBuffer Test Cases
  10.3 PeerConnection Test Cases
  10.4 Project Proposal
List of Figures

Figure 2.1 A simple Napster network
Figure 2.2 A search on a small Gnutella network
Figure 2.3 A simple query via super-nodes on Fasttrack
Figure 2.4 A BitTorrent network
Figure 2.5 Batch Chaining Technique
Figure 2.6 A NICE tree network
Figure 3.1 Diagram of Peers, Source Peer and Tracker
Figure 3.2 Representation of a stream
Figure 3.3 A tracker-less network
Figure 3.4 UML Sequence diagram of Peer and Tracker interactions
Figure 3.5 Packet Diagram
Figure 3.6 UML Diagram of different classes within the system
Figure 3.7 UML of different PeerPackets
Figure 4.1 UML Diagram of tracker design
Figure 4.2 UML Sequence diagram on how the tracker works internally
Figure 4.3 UML Sequence diagram of PeerManager connecting to a peer
Figure 4.4 UML Class Diagram of OggPlayback
Figure 4.5 UML Class Diagram of bitmap
Figure 4.6 Flowchart of FindNextPiece
Figure 5.1 Log generated by a tracker
Figure 5.2 Log generated by a PeerSource
Figure 5.3 Log generated by a PeerClient
Figure 6.1 UML Class Diagram of a PeerConnection
Figure 7.1 Graph of Percentage of the stream forwarded by non-source peers
Figure 7.2 Graph of protocol overheads depending on number of connected peers

List of Tables

Table 3.1 Sequence of events for acquiring a new piece
Table 6.1 List of tests carried out on the system
Table 6.2 Summarised results from 11 test cases
Table 7.1 Predicted overheads compared to observed overheads
1 Introduction

The aim of this project is to research, create and develop a new method of sending media data in a P2P (peer-to-peer) fashion by applying existing P2P techniques and adapting them to a streaming context. P2P is currently an extremely popular area, but little research has been carried out into distributing media that changes over time. The majority of P2P usage is for static data, for example images, documents, or pre-recorded videos; these types of media do not change and are therefore more easily sent around a P2P network. This project will investigate current P2P and streaming media research and go on to design and implement a multi-source streaming technology.

1.1 Overview of Streaming

Media streaming is the concept of sending continuous media over a network in real time, whether read from a data store or created on the fly. A simple analogy is a radio station broadcasting audio over the airwaves: each moment of audio travels through the air for a fraction of a second, after which it is irretrievable. The same is true of network streaming, and it is even more critical when the media is compressed or encoded in a way that will not tolerate loss of any kind.

Radio and television broadcasts have been running for many years, yet streaming technologies are comparatively new. Factors such as low-bandwidth hosts and high costs have limited streaming over the internet. Technical factors also play a large role in the limited success of streaming. Conventional radio waves are broadcast from a source in all directions; the internet, however, is made up of many single point-to-point links, which makes this concept of broadcast near impossible.
To add broadcast functionality to the internet, changes to its physical structure throughout the world would have to be made, such as adopting Multicast; alternatively, virtual overlay networks can be constructed. An overlay network is one that logically provides and acts like a normal local network (i.e. allowing connections between hosts, and services such as multicast/broadcast), the difference being that the network may exist on top of many different physical networks. Implementing this presents technical problems such as scalability and reliability, and becomes an increasingly difficult task when designed for a network as diverse as the internet.

1.2 Overview of Peer-to-Peer

Peer-to-Peer (P2P) is the technology that allows many networked hosts to connect together on an equal basis to share a given resource. This resource may be a file, processing power, or a hardware resource such as a printer, but in this case it will be a media stream. In recent years P2P has been used to help distribute content around a network, but until very recently it has only been used for trivial tasks. Inside a P2P network a virtual network is created which allows broadcast-style messages to be sent. This medium will hopefully be reliable, timely and scalable enough for streaming media to be transmitted to many thousands of hosts.
1.3 Project Goals

This project aims to investigate current P2P and streaming research topics and highlight any flaws in these systems. It will also integrate the previously unrelated topics of P2P and streaming into a single solution, developed by improving existing techniques while solving any flaws they may have. The developed solution must satisfy a list of requirements which will be derived and discussed in Chapter 3. Once a suitable solution has been found, it will be scrutinised under numerous tests to determine its usefulness, and tested to demonstrate how much more efficient or effective it is than current streaming solutions.

1.4 Why is this system needed?

The need for such a system becomes apparent when looking forward to the future of the internet. More and more people are looking to use large-scale video conferencing, and companies such as the BBC are looking to offer their entire video archive online1. Neither scenario is possible until the technology has improved. Once such a technology has been developed, many more, as yet unknown, uses will be devised by the general public. One possible use would be enabling anyone on the internet to set up their own radio/TV station with very low bandwidth and feasibly stream to many thousands of hosts simultaneously. Regardless of the use of such a system, it is clear that future research can be built on top of this solution, which could, in theory, provide large-scale distribution of any kind of future media.

1.5 Report Structure

The report will be split into seven chapters. After this introduction it begins with the background reading chapter, which will investigate and evaluate past and current research in the fields of P2P and streaming, explain how current implementations function, and highlight their strengths and weaknesses.
Using this new knowledge, Chapter 3 will begin by deriving a list of requirements and continue by designing a new solution. Implementation details will be the focus of Chapter 4, written once the design has been implemented in code; this chapter discusses any changes or problems encountered during the implementation phase. Chapter 5 will cover testing and will be split into two distinct sections: the first will establish the correctness of the implementation with black-box testing and similar strategies, and the second will discuss and display results from the testing conducted on the system to demonstrate its effectiveness. Evaluation will be the focus of Chapter 6, which discusses the test results gained in the second half of the testing section; this chapter will also try to explain why any results were better or worse than those predicted. The final chapter will be the overall project conclusion, discussing how well the project met its goals and any future research which can continue from this work.

1 http://news.bbc.co.uk/1/hi/entertainment/tv_and_radio/3177479.stm
2 Background Reading

Peer-to-Peer is the networking concept in which each device on a network can share its own resources on an equivalent basis with other devices, acting as both server and client. This network can be a physical one such as Ethernet, or a virtual overlay network such as Gnutella. The concept was originally devised as a way to distribute computing resources across many machines; now the approach is used to help locate machines on the internet (DNS) or to download files from other internet users (Kazaa). This chapter aims to discover how current streaming and Peer-to-Peer technologies work and to learn about future developments in these fields. It will then discuss the pros and cons of current implementations, in preparation for a design that builds on their strengths and fixes their weaknesses.

2.1 History of Peer-to-Peer

In the past few years Peer-to-Peer (P2P) has been a new and actively researched topic; however, the concept of P2P is much older and was fundamental to the creation of the ARPANET and the Internet [1]. The concept is one in which each device on a network is considered a peer and shares its own resources on an equivalent basis with other peers. Every peer has access to any other peer's resources, and may access them at will. This is the opposite of the client/server model, where all peers use the resources of one dedicated, more powerful server.

2.1.1 ARPANET and the early Internet

In 1969 the universities UCLA, UCSB and Utah, together with the Stanford Research Institute, formed the ARPANET [2]. This was the first network between different sites, with the goal of sharing the computing resources of each institution. There was no master/slave or client/server concept; each machine had equal standing on the network. Later, client/server applications such as Telnet and FTP became more popular. However, the P2P analogy still existed.
The computers running the telnet servers were also the computers that ran the telnet clients. The P2P aspects slowly decreased as the ARPANET grew larger and concepts such as security and resource management became more important. The original network was very open, with any machine allowed access to any other machine. This caused security problems, and in the late 1980s firewalls became commonplace, dividing the Internet into many smaller private networks with only a few computing resources exposed at each site. The P2P aspect between sites had mostly disappeared; however, a few services still ran distributed, but in a slightly more restrictive way. Instead of peers being able to connect to anything, trust networks were implemented, in which dedicated servers were allowed access to other servers that provided the same resource. Examples of this were Usenet and DNS.

2.1.2 Domain Name System (DNS)

The DNS system [3] maps human-readable addresses to machine-readable addresses, much as a phone book maps the name John Smith to the phone number 123-456. The system currently works by having 13 main DNS root servers [4] with many smaller DNS servers underneath them. These smaller servers are usually operated by Internet Service Providers (ISPs), which then provide the DNS service to all their users. When a request is made, a user may ask their local DNS server; if the local DNS server does not know the answer, it will ask the DNS server above it. The root server may not know the answer either, but it might know the server with authority over that domain, and will tell the local server to query that authority. This is a primitive form of peer-to-peer communication with partially distributed control; however, central control is still present and can override the mapping for any domain. A striking example of central control being used abusively to override domain names appeared recently [5].

2.1.3 Usenet

Usenet implements a decentralised model of control and is considered the grandfather of true peer-to-peer applications [1]. Due to its fully decentralised control, no one person can govern what happens on the application's network. Usenet was originally created to exchange files and messages between computers at the University of North Carolina and Duke University. The idea was that students could post messages, and students at either school could then read and reply to them. This task was originally automated using UUCP (Unix-to-Unix Copy Protocol [6]), but later NNTP (Network News Transfer Protocol [7]) was designed as a dedicated service for such traffic. The system works as follows: NNTP clients can subscribe to certain channels/groups where messages on similar topics are posted. When a new message is posted to one of these groups, the local server keeps a copy. Later, other NNTP servers may ask the local server whether any new messages have been posted and, if so, transfer these new postings. Messages thus slowly make their way around the NNTP network as more servers check for updates. The control mechanisms on the network are interesting. The server a message originated from has permission to delete that message from the network by sending out a recall. Additionally, the network allows the creation of new groups by a global election.
A new group creation event is sent to a well-known control group and is listed there for a certain length of time. During this time any user of the NNTP network can vote on its acceptance; if more positive votes are recorded than negative, the group is created. This demonstrates a fully automated democracy.

2.2 Recent P2P

Until recent years no major user-oriented applications had been made which heavily used P2P ideas, but this all changed in 1999 when the program Napster [8] was released, starting a new wave of P2P protocols and applications. Since then the Napster idea has been improved upon, leading to more advanced styles of P2P network and to P2P being introduced to areas of computing which Napster did not originally address.

2.2.1 Napster

Napster was the first popular P2P network in recent years. Unfortunately it was not popular because of its technical abilities or because it addressed an important problem; it was popular because it provided millions of users with free music without the permission of the music's owners. This attracted great attention from the media and created a new consumer base for this kind of application.
Napster was a closed system, so any information on its inner workings had to be reverse engineered [9]. It used a central server, run by Napster, which all clients had to use for all control tasks. When a peer joined the network it would authenticate with the central server and upload a list of the shared music on the local machine. Other peers could then send search requests to the central server, which would return the names of any music that matched and the addresses of the peers sharing that music. The peer would then need to make one more request to the server, asking for permission to transfer the file from the other peer. If this request is accepted then the true P2P aspect of the system begins: the peer makes a direct TCP connection to the other peer and begins the file transfer in a simple sequential way. This is illustrated in Figure 2.1.

Figure 2.1 A simple Napster network

There were many technical problems encountered with this solution, namely scalability and reliability. All users on the network had to connect directly to the central server, and this caused problems with bandwidth and with machine power. It was also a difficult challenge indexing a million users' files and carrying out searches on this huge database. The second problem was the single point of failure of the central server. This ultimately was the reason that Napster stopped working in 2001, when legal issues forced the service to be terminated.

2.2.2 Gnutella

To overcome the central organization, in late 1999 a company named Nullsoft decided to develop a truly P2P
application named Gnutella [10]. It required no central authority of any kind, yet allowed all the users to share and search for files across the entire network.

Figure 2.2 A search on a small Gnutella network

It worked by having peers relay all messages sent to them on to all the peers they are connected to. The network messages carry a simple TTL (Time To Live) value to limit the distance a message can travel. To join the network a peer needs to know at least one peer already on the network and connect to it. Once connected, the peer can make a query for more peers, and a list of peers will be returned, which in turn can be used to form a more strongly connected network.
If a peer searches for a file on the network, it sends a message to all its connected peers, which in turn re-send this message with a decremented TTL value to their peers, and so on until the TTL reaches zero. This system was good in that a peer could quickly join a huge network by knowing only one other peer, and that it was a completely decentralised network. It had, however, a few scalability problems of its own. The main problem was that the control traffic (the messages allowing a new peer to join, and for searches to be carried out) would become very significant as the number of peers on the network increased. Research carried out by a former employee of Napster, Jordan Ritter [11], explains this problem in detail. His paper shows that on a simple network where each peer is connected to 8 other peers, a simple search of 18 bytes would incur 1.8MB of traffic after 5 hops, 13MB after 6 hops, and a huge 91MB after 7. This is an exponential increase and affects the network drastically once a few thousand users join. The problem is compounded by the number of queries carried out at any one time on the network: with just 1,600 users online an estimated one query per second would be carried out, with each peer required to handle up to 1MBps. Figure 2.2 illustrates a typical Gnutella network, and how a query passes through it.

These problems could be reduced if the network were made smaller or if messages were restricted to only a few hops; however, the network would then become disjointed, with one side of the network not knowing about the other. This reveals another problem of Gnutella: content may be available on the network but not reachable by all. The final aspect of this protocol which hasn't been addressed is the way that peers could bias or exploit the network unfairly, giving themselves more resources than other peers.
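The flooding cost quoted above can be approximated with a small model. This sketch assumes Ritter's simplified topology (every peer connected to 8 others, each forwarding a query to its 7 remaining neighbours, with duplicate suppression ignored) and an assumed wire cost of 83 bytes per transmitted message (the 18-byte query plus framing overhead); the constants are illustrative choices that land close to the 1.8MB / 13MB / 91MB figures in the text, not values taken verbatim from [11].

```python
DEGREE = 8            # each peer connects to 8 others, as in the text
BYTES_PER_MSG = 83    # assumed wire cost: 18-byte query plus framing overhead

def transmissions(ttl, degree=DEGREE):
    # Hop 1: the searcher sends to all `degree` neighbours; every recipient
    # then forwards to its other `degree - 1` neighbours, and so on.
    # Ignoring duplicate suppression makes this an upper bound.
    total, frontier = 0, degree
    for _ in range(ttl):
        total += frontier
        frontier *= degree - 1
    return total

for ttl in (5, 6, 7):
    mb = transmissions(ttl) * BYTES_PER_MSG / 1e6
    print(f"TTL {ttl}: ~{mb:.1f} MB per search")
```

Each extra hop multiplies the frontier by 7, which is the exponential growth the text describes.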
Since the network was completely decentralised and no clear network standards were enforced, users started to exploit the network [12] to make their searches more important, or to take far more than they ever gave back to the network.

2.2.3 Fasttrack

The most popular file sharing protocol is Kazaa, reaching more members online than any other P2P network. Kazaa, as well as a few other programs such as iMesh, Grokster and the original Morpheus, all used the Fasttrack protocol [13]. Their system was very similar to the Gnutella protocol, yet the network was closed and encrypted, hence only a small amount of information is known about the protocol.

Figure 2.3 A simple query via super-nodes on Fasttrack

The difference with Fasttrack was how it organised its peers into a star style structure, as shown in Figure 2.3. Instead of having all peers on an equal basis there are two types of peer, normal nodes and super-nodes. Peers connect only to a super-node, and the super-nodes then connect among themselves. When a peer carries out a search it is sent to its super-node, which then relays the
query to its connected super-nodes. It does not, however, relay the query down to its own nodes, because when a peer first connects the super-node creates a cache of the files stored by that peer, allowing the super-node to answer on behalf of its peers. This difference reduces the network traffic drastically; with an estimated 100 peers to each super-node, the network was able to scale a lot better. With the addition of the super-node model there was a need for a new type of co-ordination between the peers to decide when a new super-node was required. Information on how this works isn't publicly available, but one possible approach is that when a super-node thinks it is too over-crowded it promotes one of its peers to a super-node, and redirects some of the load to the new super-node.

Even with these improvements, Fasttrack still had a few problems, namely privacy and users abusing the network. The privacy problem hasn't been discussed yet, but it affects all the protocols mentioned so far: when a user carries out a search, every peer on the network can see it. This may be fine; however, recent social and legal factors have forced developers to create systems where everything is anonymous.

2.2.4 Gnutella2

The Gnutella2 [14] name is something of a buzz-word and hasn't provided any significant improvements to the topic of P2P file sharing. It implements the Fasttrack super-node concept, but calls them hub nodes, with normal peers being leaf nodes. There is a slight difference in that each peer may connect to more than one hub node to improve reachability. The only major improvement is in the routing of search messages. Each hub node keeps a cache of search requests in the form of a QHT (Query Hash Table). This is a table of queries carried out on the network with their results. This hash table is then exchanged among the hub nodes, allowing for new routing features such as filtering and forwarding.
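The filtering idea behind the QHT can be sketched as follows. Gnutella2's published table format is more involved than this; the table size, the CRC32 hash and the `HubNode` helper are assumptions made purely for illustration.

```python
import zlib

HASH_BITS = 16  # assumed table size for illustration; real QHTs are far larger

def qht_slot(keyword, bits=HASH_BITS):
    # Map a (case-folded) search keyword to a slot in a query hash table.
    return zlib.crc32(keyword.lower().encode()) % (1 << bits)

class HubNode:
    """A hub keeps, per neighbouring hub, the set of keyword slots that
    neighbour can answer. A query is then forwarded only to neighbours
    whose table says they might return results (filtering)."""
    def __init__(self):
        self.neighbour_qht = {}   # neighbour id -> set of populated slots

    def learn(self, neighbour, keywords):
        # Record which keywords a neighbouring hub (and its leaves) can serve.
        self.neighbour_qht.setdefault(neighbour, set()).update(
            qht_slot(k) for k in keywords)

    def forward_targets(self, query):
        # Filter: skip every neighbour guaranteed to return zero results.
        slot = qht_slot(query)
        return [n for n, slots in self.neighbour_qht.items() if slot in slots]
```

A query whose slot is empty in a neighbour's table is simply never sent there, which is exactly the filtering behaviour described in the following paragraph.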
If a hub node knows that sending a search query to a neighbouring hub will return zero results, it filters the request and does not send it. If it already knows the result of the query, it sends the replies on behalf of its neighbours. The problem of users abusing the network has not been addressed; however, the Gnutella2 developers believe that when the network is fully operational there will be no need to exploit or abuse it, since it will work quickly and effectively for all users. This kind of social security is poor at best.

2.2.5 FreeNet

FreeNet is an interesting protocol in that it provides full anonymity for your actions on the network. It is described as "an adaptive peer-to-peer network application that permits the publication, replication, and retrieval of data while protecting the anonymity of both authors and readers." [15]

It is very similar to a Gnutella network, but designed to act much like a file system. No searching of data can be carried out; instead all data must be referenced directly by its name (in this case a 160-bit SHA-1 hash of the file). When a file is added to the network, parts of it are sent around the network without the peers knowing what the data is, or who first placed it there. Later, when someone requests this file, it is retrieved from any peers holding pieces of the data, without the requester knowing which peer is sending them the data. This all works by relaying encrypted messages throughout the network with very few direct peer-to-peer connections.
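The content-addressed naming just described can be sketched in a couple of lines. `content_key` is a hypothetical helper, and real FreeNet keys involve more machinery than a plain file hash, but the 160-bit SHA-1 digest is the core of the idea: the digest is the file's only name on the network.

```python
import hashlib

def content_key(data: bytes) -> str:
    # A FreeNet-style content key: the 160-bit SHA-1 digest of the file,
    # rendered as 40 hex characters. Any peer can verify that data it
    # receives matches the name it asked for by re-hashing it.
    return hashlib.sha1(data).hexdigest()

print(content_key(b"hello"))  # aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
```

Because the name is derived from the content, peers can store and forward chunks they cannot read while requesters can still verify what they receive.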
Whilst this provides very effective anonymity, it causes the network to function very slowly, with the requested data being sent via many peers before it reaches its destination. This also places a strain on the peers' bandwidth, as they must re-transmit data passing by.

2.2.6 Distributed.net

Until now only P2P networks designed for sharing files have been discussed, but files aren't the only computing resource that can be shared. Distributed.net was founded in 1997 and currently has 44,000 participants in a global P2P network for sharing their computers' processing power [16]. The current goal is to crack an RC5 72-bit encrypted message, which on a single normal computer would take an astronomical length of time, in the order of millions of years. When the computers are connected in a huge Distributed.net network this is cut down to a more reasonable length. Current estimates are 1,220 years before it is achieved, but this is still many orders of magnitude better than millions.

Technically the system is very simple. Peers connect to a central server which distributes blocks of data for analysis. The peer then runs the analysis against this data and returns the results to the central server. Since the analysis can be very time intensive, a peer only requests new data once a day, allowing the system to handle many millions of concurrent clients.

2.2.7 Skype

Skype is another non-file-sharing P2P network; instead it is a Voice over IP (VoIP) [17] implementation. This is a system that allows you to speak to other users over the internet, similar to speaking over a phone. It uses very few P2P concepts other than a simple Gnutella-style network and the ability to route an encrypted voice conversation via peers [18].
In this kind of situation clients want the best connection between two users, and this is usually achieved by a direct TCP connection; however, Skype recognises that some users are firewalled and direct connections aren't always possible, and so allows conversations to be passed across the P2P network in the most optimal way. It also boasts the ability for a conversation to take many routes to help improve performance.

2.2.8 BitTorrent

BitTorrent is the most recent P2P application to become popular, and the one by which this project is heavily influenced. The protocol is very simple: there is no searching or advertising of files on the network; each BitTorrent network exists only for a preset group of files, so everyone on the network is trying to download the same thing. The system is broken into two parts, a Tracker and Peers. For each network there is one tracker, whose job is to keep a list of peers on the network and send this peer list to any peer that requests it.

Figure 2.4 A BitTorrent network

This central
authority makes it very easy to track the users on the network, and avoids the reachability problems experienced in other P2P networks. This is illustrated in Figure 2.4. Now that the peers can find the addresses of all other peers, they can make TCP connections to as many peers as they deem necessary. By default BitTorrent will connect to the majority of known peers, because it has been shown that randomly constructed graphs with a large out-degree can be very robust and stable for this style of network [19, 20].

There are two types of traffic between connected peers: control traffic and data traffic. The set of files which the network is downloading is split into a predetermined number of pieces, and each peer keeps a record of which pieces it currently has. On each new connection a list of completed pieces is exchanged between the two peers, and from then on a peer may request any piece it knows its neighbour has. When a new piece is downloaded by a peer, it informs all its neighbours of the completion. Eventually the peer will have all the pieces and turns into a seeder, which only shares. Since all the control communication between peers is comparatively small, this part of the system can scale well. Also, since a peer can start sharing pieces as soon as it has downloaded at least one, the network can quickly distribute the load among itself. This allows for an extremely quick download.

The problems, however, are centred on the tracker. The tracker is a single point of failure and a bottleneck: if peers can't connect to a tracker then they can't get information about other peers on the network, and thus can never start the download. There are a few solutions to this problem, mostly involving running more than one tracker and DNS load-balancing between them.

2.3 Streaming Technologies

Streaming is the concept of sending continuous time-dependent media over a network.
This media can be created on the fly, from recording equipment for example, or it can be streamed from a stored medium. The paradigm is most similar to a common radio broadcast; over the internet, however, this concept is complicated by the point-to-point link infrastructure of the internet. These point-to-point links make it very hard to conduct any kind of broadcast, and make it expensive for broadcasters to recreate the broadcast concept. There are two main approaches to adding broadcast functionality to the internet. The most suitable would be to change the structure of the internet; this, however, is not practical in the real world. The second solution is to create virtual overlay networks which add the broadcast ability, but at a cost.

2.3.1 Multicast

This is the ideal streaming technology for the internet; it was developed in 1989 at Stanford University and is documented in RFC 1112 [21]. Multicast is the concept of sending IP packets to a group of IP addresses which have joined a group. The system can be highly dynamic, with hosts joining and leaving constantly. The obvious advantage is that the source only needs to send one packet to a predefined group address and all hosts in the group will receive it. This system is possible within a network if the routers are multicast enabled. However, deployment of network-layer multicast has not been widely adopted by most ISPs [22] due to commercial reasons, thus only a very small number of internet hosts are multicast enabled.
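From an application's point of view, joining a multicast group is done at the socket layer: the program asks the kernel to add an `ip_mreq` membership for the group address, after which packets sent to the group are delivered locally. A minimal sketch follows; the group address `239.1.2.3` and port are arbitrary examples (an administratively scoped group), not values from any particular system.

```python
import socket
import struct

GROUP, PORT = "239.1.2.3", 5004   # assumed example group address and port

def membership_request(group, interface="0.0.0.0"):
    # Pack the ip_mreq structure: 4-byte group address + 4-byte local interface.
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))

def join_group(group=GROUP, port=PORT):
    # One UDP socket is enough: bind to the port, then ask the kernel to join
    # the group (RFC 1112) so packets sent to GROUP:PORT are delivered here.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

Note that this only works end-to-end if every router between source and receivers is multicast enabled, which is precisely the deployment problem described above.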
To overcome this problem, research has been carried out into application/overlay layer multicast protocols [23] with varying degrees of success. The main problems are generally keeping the protocol overhead small and maintaining a high level of service.

The simplest and most widely implemented and used streaming technology is the plain Server/Client model, where a server, or cluster of servers, sends the stream to many clients using the server's own bandwidth. This can prove problematic if the bandwidth required for the number of clients isn't available. This may be solved by giving some of the bandwidth load to the clients, allowing them to distribute the stream to fellow peers.

2.3.2 Batch Chaining

This concept was developed for Video on Demand proposals, where any client can request any stream at any time. This kind of activity can place huge demands on a video server. Instead, the system proposed by a paper from the University of Central Florida [24] improves the network in two ways.

Figure 2.5 Batch Chaining Technique

Firstly, it batches together clients that request the same stream at a similar time. The result is that the server sends to one peer in the batch, who distributes the stream to its siblings in the batch. The problem with this is that the first person to create a batch has to wait a period of time before the last person has joined, thus causing delay for the first user. A typical period of time would be in the order of 10 minutes. The second improvement is to place adjacent batches next to each other in a chain. Once the earliest batch has finished reading the stream, it passes its cached data on to the next batch in the chain. If there aren't any adjacent batches the server will create two separate chains.
This can be seen in Figure 2.5, with each batch receiving a delayed segment of the stream from the previous batch.
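The batching step above can be sketched as grouping request times into windows anchored at each batch's first member. The 10-minute window is the typical figure mentioned in the text; `assign_batches` is a hypothetical helper, not part of the cited system.

```python
BATCH_WINDOW = 600  # assumed 10-minute batching window, as suggested above

def assign_batches(arrival_times, window=BATCH_WINDOW):
    # Group requests for the same stream: a batch collects every client that
    # arrives within `window` seconds of the batch's first member. Only the
    # first member of each batch is served directly by the server; later
    # batches chain off the batch ahead of them.
    batches = []
    for t in sorted(arrival_times):
        if batches and t - batches[-1][0] < window:
            batches[-1].append(t)
        else:
            batches.append([t])
    return batches
```

The first client in each batch pays the waiting cost described above: it cannot start until its batch closes.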
There are, however, limitations to this approach, mainly involving reliability and trust. If a batch disconnects from the chain, then any batches after it in the chain will have their streams disrupted and will need to reconnect to the source, since this will be the only peer with the correct segment of the stream. Secondly, you will be receiving all your content from another batch; a peer in that batch may be corrupting the system, which could be devastating further down the chain.

2.3.3 NICE

NICE [22] is a single-source media streaming protocol developed at the University of Maryland. It organises the peers into a tree structure rooted at the source server. It was designed to help distribute live continuous media quickly and effectively with low overheads. This solution improves on the chaining technique by organising the peers into a tree instead of a simple chain, thus increasing reliability and scalability. It works by creating groups (or batches) of peers, with these groups connected into a tree hierarchy: one peer in each group is nominated the head of the group and makes the connection to the group above, while the non-heads connect to other groups below them in the tree. This is illustrated in Figure 2.6.

Figure 2.6 A NICE tree network

There is still a problem with reliability; for example, a whole branch would be disconnected if the root group left the tree. This is catered for by a quick recovery control protocol.

2.3.4 ZIGZAG

ZIGZAG [25], developed at the University of Central Florida, was heavily influenced by the NICE approach, the difference being that the path of data through the tree has been slightly modified to allow for faster recovery and less control traffic. It uses the same node degree but increases the link degree: each peer in a group is also connected to its parent node.
These additional connections would only be used for reliability if the main link fails. Additionally, these new links could allow the peers to be organised into more than one tree; one purpose of this is to create one tree for data and a separate tree for control traffic.

2.4 Recent Research

Most widely used P2P networks were designed outside of the research community, and as such they have problems with security and efficiency, but mostly with scalability. This section discusses several new P2P ideas which have been researched in the past few years but have yet to be widely adopted.

2.4.1 Pastry

Pastry [26] is an extendable peer-to-peer overlay network designed at Microsoft Research and Rice University. The idea was to create a fully decentralised
network which would allow numerous different applications to run on top of it. The protocol implements a very scalable and efficient routing algorithm to provide application-level routing which uses very little bandwidth, and guarantees all peers can be reached within log_(2^b)(N) hops, where N is the number of peers in the network and b is a routing parameter commonly set to 4. For example, with b = 4, a network of 65,536 peers can reach any peer within log_16(65536) = 4 hops.

It works by assigning every peer a random number inside a 128-bit range, called a key; when messages are sent within the network they are sent to a specific key, allowing the network to implement its own routing algorithm based on this key. When any peer receives (or sends a new) message it has to choose which peer to forward it to. It does this using an internally kept routing table. Unlike the link-state and distance-vector methods, the peer only keeps data about peers near it instead of a global overview of the network. This allows for much smaller routing tables and less routing traffic; the routing of messages takes slightly more hops, but this is a reasonable trade-off. This concept is based on work by Plaxton et al [27].

When forwarding a message to any key, the peer will look for the numerically closest match in its routing table and forward to that peer via the underlying network (for example IP). Eventually the message will reach the peer numerically closest to the destination, and if that peer doesn't have the destination in its routing table then the message is undeliverable. To guarantee that numerically close peers are always listed in each other's routing tables, special conditions have to be observed when a peer joins. This adds a little overhead to the join; however, it very quickly and effectively adds the peer to the network for everyone to access. There are, however, two problems with the routing implementation which can occur in rare conditions.
With the right pattern of failures it is possible for the network to partition and not reconnect, causing two isolated networks. However, if just one peer is on both networks, the networks will re-join in a short time with low effort. There is also the previously undiscussed problem of peers becoming unreachable when very rare race conditions occur during a series of failures and joins on the network. For example, suppose 3 peers exist with keys 10, 20 and 30, and a fourth peer wants to forward a packet to peer 10, but only peer 20 is in its routing table. If peer 20 now fails, the packet could be forwarded to peer 30; however, due to the order in which the peers joined, peer 30 also doesn't know about peer 10, and therefore can't send the packet anywhere, thus blacking out peer 10. This problem is solved by increasing the b parameter and isn't considered a problem when the network is of large enough size.

One last notable addition to the Pastry protocol is its sense of network locality: Pastry gives nodes which are geographically close preference over nodes further away. The metric is determined by the application but could be based on IP hops between hosts or latency times. It was discussed by Savage et al [28] that selecting geographically local hosts for routing may only be more effective 30-80% of the time. It was also discussed that the triangle inequality won't hold on the internet, causing the reverse path to differ from the forward path; Pastry's theory, however, assumes the triangle inequality holds.

2.4.2 SplitStream

SplitStream [29] is an application developed by Microsoft Research to operate on top of Pastry. It is a multicast overlay network used for streaming media; it constructs many balanced multicast trees where each peer can be a member of one or more trees. In a normal multicast tree with a node out-degree of 2, over 50% of the
nodes are leaf nodes which are not contributing anything to the network. In a tree with an out-degree of 16, more than 90% are leaf nodes. SplitStream tackles this problem by placing nodes in more than one tree, causing each node to be both a leaf and an interior node. On each tree a different segment of the stream is sent, meaning the stream must be split into a specified number of segments at the server. SplitStream presumes the senders will be using algorithms with Multiple Description Coding (MDC) properties, allowing the video quality to drop when a node is not connected to all trees. This is useful when peers have varying amounts of bandwidth: a peer can sacrifice video quality for bandwidth, while peers with more bandwidth available can subscribe to more trees and receive higher quality. Using MDC does limit what continuous media can be sent across the network, though it allows the protocol to claim robustness. However, when the network is delivering media that can't recover from loss, the system can become very unreliable. The time it takes a peer to (re)join the network is in the order of log_O(N), where O is the node out-degree and N is the number of nodes on the network. SplitStream also suffers from the same problems as NICE [22], such as branches of the multicast tree being severed, with large numbers of peers losing part of their stream.

2.4.3 Chord

Chord is a decentralised lookup service that stores key/value pairs throughout the network. It takes a very similar approach to Pastry; however, Chord uses a less efficient routing algorithm of order log(N), where N is the number of peers, whereas Pastry uses log_(2^b)(N), where b is a routing parameter and N is the number of peers. The routing algorithm works very similarly, with each peer being assigned a random id; however, instead of a message being forwarded to the peer with the closest matching key, it is forwarded to the peer whose id sits at the largest power-of-2 offset that does not pass the destination.
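This power-of-2 forwarding rule can be sketched on a small identifier ring. The tiny id space and the helper names are assumptions made for readability, and real Chord forwards to the closest preceding finger from its finger table, which this deliberately simplifies.

```python
ID_BITS = 6  # assumed small id space (ids 0..63) so the example stays readable

def next_hop(current, target, bits=ID_BITS):
    # Jump to the peer at the largest power-of-2 offset from `current`
    # that does not overshoot the target on the ring.
    size = 1 << bits
    dist = (target - current) % size
    if dist == 0:
        return current
    step = 1
    while step * 2 <= dist:
        step *= 2
    return (current + step) % size

def route(source, target):
    # Follow next_hop until the target is reached; each hop at least
    # halves the remaining distance, giving O(log N) hops.
    path = [source]
    while path[-1] != target:
        path.append(next_hop(path[-1], target))
    return path
```

Because every hop at least halves the remaining ring distance, no route takes more than `ID_BITS` hops.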
For example, if peer 1 sent a message it would be forwarded towards peers 2, 4, 8, 16 and so on, depending on which power-of-2 offset was nearest to the destination. Chord also adds redundancy to the network, such that each item of data is stored on more than one node, meaning more than one peer must disconnect before any data is lost. Pastry, however, stores all data on only one node; surrounding nodes may also store the data, but this is not guaranteed. What Chord gains in redundancy it loses in scalability and protocol overheads. In theory it is still true that Chord can scale to millions of hosts; however, Pastry can scale more easily due to its more efficient routing algorithm.

2.5 Summary

This chapter has explained the history of P2P from the very first ARPANET to cutting-edge research such as SplitStream and Chord. At each step along this history the pros and cons have been identified and discussed. Also covered have been the current advances in streaming technologies, with a focus on P2P streaming. The next chapter will build upon these research ideas, allowing a new concept to be designed which will hopefully avoid the pitfalls of previous projects.
3 Design

This chapter gives a high-level view of the key components in the system and then goes on to explain the design of the protocol used between these components. Each component will be explained and any algorithms of technical value will be discussed. The inner workings of the system components will be presented in UML diagrams with discussion of the main classes. The last sections discuss testing and evaluation strategies for the proposed system.

3.1 Requirements

A list of requirements has been drawn up from the research carried out in the Background Reading chapter [see Chapter 2]. The requirements aim to address the negative aspects of current systems and to add functionality. Each requirement is explained with a brief description of how it was derived.

3.1.1 Provide a robust network

The network should be fault tolerant and be able to survive peer failures without the receiving nodes suffering interruptions. Since the system deals with the timely delivery of continuous media, this is a reasonable feature to include to make sure the data gets to the destination on time and that peers leaving or joining the network do not interfere with this timely delivery. In current streaming technologies like Batch Chaining and NICE [see sections 2.3.2 and 2.3.3] peers will not receive data on time if a peer upstream of them experiences problems. A strongly connected graph of peers would solve this problem and will be used by this system.

3.1.2 Allow quick re-join after peer failure

If a node does fail, the failed peer and any peers affected should be able to rejoin the network with minimal effort and without loss of service. As described in the previous section, robustness is required to ensure timely delivery; as such, peer failures or joins should not adversely affect the network. ZIGZAG [see section 2.3.4] demonstrates a P2P network with fast recovery.
3.1.3 Stream data with low control overhead

The control traffic for constructing and maintaining the network should be as small as possible. Traffic about the stream should also be kept to a minimum so the peer can use most of its bandwidth to receive the stream data. In networks such as Gnutella [see section 2.2.2] it was demonstrated that as the network grew, the control overheads needed to maintain the functionality of the network increased exponentially. It is therefore a reasonable requirement that the control traffic be low.

3.1.4 Move the stream distribution load away from the source

The source of the stream should only require a small amount of upload bandwidth, with most of the forwarding load being placed on peers within the network. Due to technical and financial reasons, Batch Chaining [see section 2.3.2] was developed in an effort to move the distribution load away from the source. This change would greatly benefit the source and other peers on the network; as such, any protocol that wishes to be widely usable should also exhibit this property.
3.1.5 Be scalable

All the requirements above should not hinder the scalability of the network, which should allow large numbers of peers to join. More peers in the network should not exponentially increase control traffic or source server load; in fact, if possible, no increases should be observed. It was shown with DNS and Usenet [see sections 2.1.2 and 2.1.3] that scalability aids the successful adoption of a protocol. It is also noteworthy that the limited success of Napster and Gnutella [see sections 2.2.1 and 2.2.2] was partly because of scalability issues once their user bases became large enough, causing both systems to break down.

3.1.6 Media agnostic

The protocol designed must be generic enough to stream any kind of continuous media, be it audio, video or even ticker-style text. This requirement doesn't have a clear derivation, but is included to add greater flexibility and usefulness to the protocol.

3.1.7 Be secure

The network should be secure from tampering with the data and from unauthorised users receiving the stream. However, the scope of this project does not cover security, so the protocol should simply be coded in a way that allows it to be added in the future. Security is a very large and growing area of computing, and any protocol which doesn't provide a provision for security will never be globally accepted.

3.2 Peer-to-Peer Network

As described in the Background Reading chapter there are many types of existing P2P network models, and a few P2P streaming models. This system will implement a strongly-connected-graph-of-peers style of network, adapted to the context of streaming by providing time-indexed continuous media. This type of network was chosen so:

• that each peer will be able to find many other peers quickly and efficiently, thus satisfying requirement 3.1.2;
• The large number of connections between peers should allow data to spread quickly throughout the network. This also makes the network more robust by providing many sources of the stream, helping to satisfy both requirements 1.1 and 1.4.
• There will be low organisational overhead between peers because of the peer location services provided by the tracker, saving their bandwidth for the content and aiding requirement 1.3.

This idea is a hybrid of the BitTorrent design; however, it will need modifying. The major change is that instead of a fixed number of pieces, a continually increasing number will be used, with old pieces expiring from the network. BitTorrent also announces to other peers when a piece has been downloaded; this concept will be maintained.
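The "expiring pieces" change can be sketched in a few lines. The following is a hypothetical sliding window over piece indices; the window size is an illustrative assumption and not part of the design itself:

```cpp
#include <cstdint>

// Hypothetical sharing window: a peer only advertises and serves pieces
// within this many indices of the newest piece it has seen.
const uint32_t kWindowSize = 128;

// True if `index` is still worth sharing given the newest known index.
// Unlike BitTorrent's fixed piece set, indices grow continually and old
// pieces simply fall out of this window and expire from the network.
bool PieceStillShared(uint32_t newestIndex, uint32_t index) {
    return index <= newestIndex && newestIndex - index < kWindowSize;
}
```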
Figure 3.1 Diagram of Peers, Source Peer and Tracker (weak links carry low bandwidth, strong links high bandwidth; the source peer and tracker share the same physical location)

The system will comprise three main components, explained here and depicted in Figure 3.1. The figure also shows the flow of data between the components, with the thickness of each line representing how well connected the components are. Additionally, the strongly connected nodes would be transferring more data than the weakly connected ones.

Tracker – The tracker is the peer coordinator. It will store a list of all peers on the network, and allow connecting peers to quickly find other peers to join. It will not handle any data concerned with the stream; it is simply there for peer discovery.

Peer – This is one of the nodes in the network, which will download and re-send parts of the stream.

Source Peer – This is logically the same as any other peer, except that it is the source of the stream. Peers on the network will not know that the stream originates from this peer, and there will be no bias because of that. The source peer may also reside on the same machine as the tracker, although this is not a requirement. This is shown in Figure 3.1 by the box surrounding both source peer and tracker.

3.3 Stream Representation

The logical representation of the stream will be a sequential list of numbers (0, 1, 2, 3, 4, 5, 6, 7, … n), with the stream starting at 0, reaching an arbitrary maximum and wrapping back to 0.

Figure 3.2 Representation of a stream (each index x represents a fixed-size block, e.g. 512 bytes; the whole stream represents n × 512 bytes)

Each integer represents a fixed-size block of bytes from the stream; these blocks will hereafter be named pieces. At any time a single peer may hold a small, non-continuous set of the total stream; eventually, however, the peer will have a continuous set of pieces allowing correct playback. It
is also advised that the peer cache a number of these pieces for a limited amount of time so that they can be sent to other peers. It is advisable for the client to download the stream roughly in sequential order, although this is not essential, and performance may improve if a set of pieces is downloaded concurrently in advance.

3.4 Tracker

A major problem with fully distributed networks is locating the peers that hold the information, normally attributed to a high degree of network partitioning or low network reachability [see section 2.2.2]. To solve this problem, a dedicated server will be designed for the sole purpose of tracking which peers are listening to the same streams. In a BitTorrent-style network there is a single central server called a tracker which stores a list of peers on the network. The reason for the tracker is to lower control overheads and to limit the effects of network partitioning, thus fulfilling requirements 1.2 and 1.3. In the network there will only ever be a single source of the stream; it therefore makes sense for that peer to also run the tracker, since only one is needed. Enforcing the use of a tracker does hinder requirements 1.1 and 1.5; however, the effects will be acceptable.

There are a few slightly different possibilities for the tracker protocol. A custom stateless protocol could be written; however, the choice made here is a connection-orientated protocol. The tracker will be designed as an HTTP extension, allowing dedicated web servers to be written as trackers, or a web application/script to act as one. HTTP suits this task well because it is widely used online and because the connections are stateless and infrequent.
The only problem with the HTTP protocol is that it adds a little extra traffic overhead; however, this is acceptable since it will be used infrequently, and compared to the data sizes the overhead is negligible.

3.5 Tracker-less Network

The current design will use a central server tracker for the network; however, a future extension could use a fully distributed tracker. The reason the current design won’t adopt this approach is so that the project can focus on the streaming technology. The benefits of a tracker-less approach are numerous; one major benefit would be that the network is more scalable and therefore fewer resources would be needed by the stream’s author.

Figure 3.3 A tracker-less network

The concept would be built upon a Pastry [see section 2.4.1] network. All the nodes using this protocol would be connected to this wide P2P tracker network regardless of which stream they are listening to. The properties of a Pastry network give each peer a unique 128-bit identifier. Additionally, each stream would need to be
assigned a unique 128-bit ID. When connecting to the network a peer would send a message to the peer whose identifier most closely matches the ID of the stream; this peer would be nominated to provide tracker functions for that specific stream. Each hop from any interested peer towards the nominated peer would record that this peer is now listening to the stream. The closer the message gets to the peer with the stream ID, the larger the list of peers for that stream held on each host. Each peer would be required to keep a list of 10 peer IDs, which is cycled as new peers join. Later, when a peer wants to retrieve a list of peers, it can send out requests with increasing time-to-live fields until it has received as many peers as it needs.

As shown in Figure 3.3, two peers, 35 and 60, are designated as the trackers for two different streams. The thickness of the lines around the peers indicates how full their peer tables are, with each peer around the trackers storing a percentage of the list of peers using that stream. The more peers listening to that stream, the greater the radius of data generated. Take, for example, peer 2 trying to route a request for peers to peer 60. Its message would go via 22, 45 and then 60. Each of those intermediate hops holds a limited amount of knowledge, and peer 2 can query each of them progressively until it has obtained a large enough list.

It can be seen that this concept may be impractical for networks where, by bad luck, the peer with the same stream ID happens to be a low-bandwidth user who isn’t able to fulfil all requests. It would also be trivial for a malicious host to assign themselves the stream ID and partially corrupt the peer table. Hopefully both of these problems become less important as the size of the network grows, because the number of peers holding the peer table will increase.

3.6 Peer

The peer will require the most design effort.
As in the BitTorrent concept, peers will first need to connect to the tracker to receive a list of peers via normal HTTP communication. The peer can then connect to as many of the listed peers as deemed necessary. The peer communication protocol will be a reliable, connection-orientated protocol (such as TCP) and will be designed to be as small as possible to help with requirement 1.3. A stateless protocol (such as UDP) wasn’t chosen due to its lack of reliability.

There are two fundamental ways the peer protocol could work: a peer could announce when a piece has arrived, or peers could query other peers for their piece sets. Standard P2P file-sharing networks work by querying for pieces; however, this doesn’t work well in a streaming environment. If a query was sent after each new piece was downloaded, another message would be sent in reply confirming or denying that the peer has that piece, requiring twice the bandwidth. Also, as soon as a remote peer does have a new piece, the local peer won’t know until its next query. This is a problem where timely delivery is a requirement and it is critical for the peer to get that piece on time. Since an announcement protocol will be used, the overhead of each announcement packet must be small, because a large number of them will be sent. To further improve performance, announcements will only be sent to peers that haven’t previously announced that they have that piece.

3.7 Source Peer

This will appear like any other peer; however, it will never be required to download pieces, since it is the source of the stream. The stream will be read from a file, recording equipment, or another suitable IO device. It would be beneficial
for the source to store the last x pieces to help spread them on the network. If the source expires pieces too quickly, peers may miss a piece and it would never make it onto the network.

3.8 Peer and Tracker Overview

This section discusses how the tracker and peers communicate between themselves. Details are given in later sections, but an overview is displayed in Figure 3.4.

Figure 3.4 UML Sequence diagram of Peer and Tracker interactions (Join Network → OK; List Peers → Address Of Peers; Connect To Peer B; Connect To Peer n; Announce Piece Done; Request Piece; Transmit Piece; Announce Piece Done)

The peer first connects to a tracker which manages the stream the peer is interested in. This will result in either an OK or an error. If an error occurs, the peer has no choice but to stop and deal with the error, either by prompting the user or by handling it internally. Following this, a list of peers will be requested from the tracker, giving the client a subset of all the users connected to the network. Now that the client knows some peers on the network, it can make individual connections to each peer, carrying out a handshake and then becoming connected.

Next the client will wait for announcements from other peers. Each announcement informs the client of newly available pieces on those peers, allowing requesting to begin. The client will download the oldest (smallest-numbered) piece and start to pre-cache a few pieces ahead of time. Once a piece has been requested and downloaded, the client will announce the completion to all its peers. This process continues while the client is playing back the media. The client should keep and request a set of pieces before and after the current playback location. The advance pieces are for pre-caching in case the stream is lost
for a period of time. The older pieces are kept so that they can be shared with the network for a given amount of time.

3.9 Tracker Protocol

The tracker is responsible for holding the list of active peers and sending this peer list to interested peers. It is also responsible for holding meta-information about the stream. All information is transferred via the normal HTTP protocol [RFC 2616], with a well-known URI [RFC 2396] describing the location of the stream. The peer will send standard GET requests to this URI, with differing query strings determining what action the peer is taking. All data transferred in the URL will adhere to URL-encoding specifications. An example of a valid URL would be:

http://tracker.com/?action=join&peer-id=ABCDEFGHIJKLMNOPQRST&peer-port=4321

This requests to join the network, with peer ID A–T and port 4321 listening for incoming connections. A full list of all the valid query parameters follows.

3.9.1 &peer-id=

This query field is required with all HTTP requests to uniquely identify the client to the tracker. The ID will consist of a random 20-byte string selected by the peer. This ID will be used to identify the peer in the future and should not be revealed to other peers. The ID is recorded upon joining the stream; on all other commands it is used to confirm the identity of the peer, and if the ID is incorrect the command shall be ignored.

3.9.2 &peer-ip=

The IP address the peer believes it is listening on. The IP/port pair cannot be used as a unique identifier since more than one listener may be on the same IP or behind the same NAT gateway.

3.9.3 &peer-port=

The TCP port that the peer is listening on for incoming connections. As above, the IP/port pair cannot be used as a unique identifier.

3.9.4 /?action=join

This is sent when the peer first wants to connect to the stream.
The HTTP body will contain stream-specific data which should be used by the peer to understand the format of the stream data. For example, the header of an Ogg stream would be sent so that the peer can pick up the stream from any position. HTTP headers will also be sent explaining application-specific details; X-BitStream-PieceSize and X-BitStream-ContentType are both required. From this point the peer will be listed by the tracker as actively subscribing to the stream, allowing other peers to connect to it.
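A sketch of how a peer might assemble the join request URL shown earlier. The tracker host and 20-byte peer ID are illustrative placeholders, and URL-encoding of unsafe bytes is omitted for brevity (a real client would need it):

```cpp
#include <string>

// Build the query string for a tracker "join" request, following the
// example URL in section 3.9. The peer-id must be a 20-byte string chosen
// at random by the peer.
std::string BuildJoinUrl(const std::string& trackerBase,
                         const std::string& peerId,
                         int peerPort) {
    return trackerBase + "/?action=join&peer-id=" + peerId +
           "&peer-port=" + std::to_string(peerPort);
}
```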
3.9.5 /?action=part

This is sent when the peer decides to stop listening to the stream. The tracker should remove the peer from the list and free any memory about the peer. The peer-id must be included to make sure the correct peer is removed.

3.9.6 /?action=list

The peer will periodically request this to gain a list of peers, and can also request it when it needs more peers to connect to. The normal interval for requesting this list should be every 5 minutes. If the peer does not keep to this interval, the tracker should assume the peer has unintentionally disconnected from the stream and remove it from the list.

3.9.7 HTTP Headers

The following are the headers which can be sent upon join.

3.9.8 X-BitStream-PieceSize

A required field indicating the size in bytes of each piece in the stream. Setting this to a lower value causes more peer control traffic but allows for less delay in the stream.

3.9.9 X-BitStream-ContentType

The required content type of the stream data, for example application/ogg [RFC 3534], video/mpeg, etc.

3.9.10 X-BitStream-Title

An optional title of the stream.

3.10 Peer Protocol

Information between peers consists of control traffic, such as requests and announcements, and actual media stream data. The stream is divided up into fixed-size pieces. Each piece has an integer index, with the starting index depending on how far into the stream the current source is. When a peer acquires a new piece it should announce it to all connected peers. Peers may optimistically delay announcements to save bandwidth, and may also batch announcements together to lower overheads.

3.10.1 Packets

Messages sent between peers will use a custom protocol over TCP. The TCP connections must be able to send data in both directions, allowing peers behind NATs and other such firewalls to operate normally.
Each packet will have a header and then a packet body, as illustrated in Figure 3.5 (which shows the packet header and body carried above TCP and IP). The messages are designed in such a way that if a peer doesn’t understand or implement a given type of message it may skip it and receive the next message. This ensures maximum backwards compatibility.

Figure 3.5 Packet Diagram
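Before the individual messages are specified, a minimal sketch of how such length-prefixed packets might be framed. The field widths follow the header layout given in section 3.10.2; network (big-endian) byte order is an assumption here, as the specification does not fix it:

```cpp
#include <cstdint>
#include <vector>

// Serialise a packet: a 4-byte length (excluding the length field itself),
// a 1-byte type, then the body. Big-endian byte order is assumed.
std::vector<uint8_t> EncodePacket(uint8_t type,
                                  const std::vector<uint8_t>& body) {
    uint32_t len = 1 + static_cast<uint32_t>(body.size());  // type + body
    std::vector<uint8_t> out;
    out.push_back(static_cast<uint8_t>(len >> 24));
    out.push_back(static_cast<uint8_t>(len >> 16));
    out.push_back(static_cast<uint8_t>(len >> 8));
    out.push_back(static_cast<uint8_t>(len));
    out.push_back(type);
    out.insert(out.end(), body.begin(), body.end());
    return out;
}

// A keep-alive (type 0) carries no body, so its length field must be 1.
std::vector<uint8_t> EncodeKeepAlive() { return EncodePacket(0, {}); }
```

Because the length field always tells a receiver where the next packet starts, a client that doesn’t recognise a type byte can skip exactly that many bytes, which is how the backwards compatibility described above works.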
3.10.2 Packet Header

Prepended to every message sent out, the header identifies the content of the message and allows for backwards compatibility: by specifying the packet length it lets clients skip packets they do not understand. The layout (byte offsets 0–3 and onwards) is:

LENGTH   Length of the packet in bytes, excluding the length field
TYPE     A one-byte code explaining what is contained in the data field
DATA     A variable-length field containing data stored as the type explained

Current possible data types:

0 – Keep Alive
1 – Handshake
2 – Announcement
3 – Request
4 – Error
5 – Data

3.10.3 Keep Alive

This data type has no contents and is used to keep the connection alive. The length field for this packet must be 1.

3.10.4 Handshake

This is sent once at the beginning of a connection, and allows each client to know what the other is capable of. Both parties must receive this message before any other message can be sent. Once the message has been received, both parties will work using the lower major version, or error and disconnect. The body consists of:

VERSION MAJOR   Major version of the application; at the time of writing only 1 is allowed, and all data sent must conform to this specification
VERSION MINOR   Minor version of the application; applications may place any number in this field
NAME            Variable-length field containing the name of the client

3.10.5 Announcement

This message is used to advertise to other peers what parts of the stream this peer is sharing. The client should always announce the entire stream it has, unless a sharing algorithm is being used to help distribute the stream more efficiently. Such algorithms are discussed in section 3.12.
The body consists of:

INDEX START   The index represented by the first bit in the bitfield. This is an integer value incremented by the source peer for each new piece.
BITFIELD      A variable-length array of bits. Each bit represents an index one newer than the previous bit, and is set depending on whether the peer has that piece of the stream.

3.10.6 Request

This is sent when a client requires pieces of the stream from another client. The first and last indexes are sent, asking for all pieces in the range first ≤ x < last. The remote client must have announced all pieces between start and end before a request can be made. The body consists of:

INDEX START   The index of the first requested piece
INDEX END     The index marking the end of the requested range (exclusive, per the range above)

3.10.7 Data

This can be sent after a request is made, or if a pre-emptive algorithm is being used to help distribute the data, as discussed in section 3.12.3. The body consists of:

INDEX START   The beginning index of this data
DATA          Raw data of the stream starting at INDEX START

3.11 Program Design

The design is split into many classes, each designed to perform a specific task, with abstract classes hiding the inner workings. The UML in Figure 3.6 shows a possible configuration of classes for a peer. Each major class will be explained in turn.
Figure 3.6 UML Diagram of different classes within the system (PeerClient, PeerManager, PeerConnection, PeerEvent, PeerPacket and PeerPacketFactory; StreamBufferInterface with StreamBuffer and FileStreamBuffer implementations providing thread-safe access to a stream of data; PlaybackInterface with OggPlayback, FileWriterPlayback and VideoPlayback implementations, which read from the data source and play to an IO device such as the screen, speakers or a file. A source would use a FileStreamBuffer; a normal peer would use a StreamBuffer)

The program will be coded in an object-orientated (OO) language, namely C++. There are numerous reasons to code in C++; the main ones are the speed and low-level abilities of C, plus the object-orientated aspects allowing for a very modular design and code reuse. The design will make use of many OO concepts, such as inheritance and interfaces, to allow clients to be flexible in the media they transfer and in the way they do it.

3.11.1 PeerClient

This is a very simple class which contains the main method and the starting point of the program.
It will parse any user input and create the correct classes depending on the features required by the client.

3.11.2 StreamBufferInterface

A key class which stores the pieces sent and received. It should be created with two parameters setting the number of pieces it holds and the size of each piece. It will implement methods such as Read(), Write() and Peek(), allowing it to randomly read and write any stored piece and to read from the buffer as if it were a stream. If it is not possible to read sequentially because a required piece is missing, the object should block until the piece has been downloaded, or after a given time the function returns
an error which the playback device should deal with. The two main implementations and a possible third are listed here:

StreamBuffer – A simple random-access buffer designed to store pieces which are randomly inserted, and to allow stream-like reading from this buffer.

FileStreamBuffer – A read-only buffer which is created from a file. This type of buffer would be useful for a source peer.

AudioStreamBuffer – Another read-only buffer, which would pull the stream from a sound card or other input device.

3.11.3 PlaybackInterface

Because the protocol is media agnostic, any type of media can in theory be played back. For this reason an interface class was created with common methods such as Play() and Stop(). Three possible implementations of this class are:

OggPlayback – Decodes the media as an Ogg Vorbis2 audio stream.

FileWriterPlayback – Writes the stream directly to file.

VideoPlayback – Decodes the stream as video encoded with MPEG or a similar codec.

All these classes would be created with an instance of a StreamBufferInterface passed in. In most cases a StreamBuffer would be passed in; however, for debugging or testing purposes a FileStreamBuffer can be used. Playback then occurs from this buffer.

3.11.4 PeerConnection

This class is a logical representation of a connection to a remote peer. It deals with all the networking, including packet sending/receiving, and provides a high-level view of the status of the remote peer. It exposes functions such as Open() and Close() for connections, SendPacket(), RequestPiece(), and GetPieceMap(), which returns a bit map of the pieces the remote host has.

3.11.5 PeerManager

This class is where the majority of the important algorithms will go. It is designed to be a coordinator of all the PeerConnections; internally it stores a list of all peers with their PeerConnections.
Decisions take place here to decide which pieces to download, which peers to connect to, which peers to drop, etc. The design of these algorithms is described in section 3.12. The main methods exposed are Connect(), Close() and Listen(), which connect to a new peer, close all connections, and open a port for incoming connections respectively. FindNextPacket() is a private method which decides which piece the next request should be for, and from which peer it should be requested.

2 Ogg Vorbis, http://www.xiph.org/ogg/vorbis/
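The blocking Read() behaviour described for the StreamBufferInterface (section 3.11.2) could be sketched with a condition variable as follows. The class shape, the piece map and the timeout are illustrative assumptions, not the final design:

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

// Minimal sketch of a piece store whose Read() blocks until the requested
// piece arrives, or gives up after a timeout (returning an empty vector as
// the "error" the playback device must handle).
class StreamBuffer {
public:
    void Write(uint32_t index, std::vector<uint8_t> data) {
        std::lock_guard<std::mutex> lock(mutex_);
        pieces_[index] = std::move(data);
        arrived_.notify_all();
    }

    // Read the piece at `index`, waiting up to `timeout` for it to arrive.
    std::vector<uint8_t> Read(uint32_t index,
                              std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!arrived_.wait_for(lock, timeout,
                               [&] { return pieces_.count(index) > 0; }))
            return {};  // timed out: caller treats this as an error
        return pieces_[index];
    }

private:
    std::mutex mutex_;
    std::condition_variable arrived_;
    std::map<uint32_t, std::vector<uint8_t>> pieces_;
};
```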
3.11.6 PeerPackets

As shown in Figure 3.7, all packets will logically be represented as objects which implement the PeerPacket interface (PeerHeaderPacket, PeerHandshakePacket, PeerAnnouncePacket, PeerRequestPacket and PeerDataPacket). The objects will have simple constructors: one which builds the packet from raw received data, and another which builds the packet from properties, such as the client’s name in a HandshakePacket.

The PeerPacketFactory will be used to create packets with the first constructor, turning received data into objects. The PeerConnection, or any other class, may call the static function Read() on the PeerPacketFactory class with raw received data as a parameter; it will then return an instance of a PeerPacket. This hides the decoding process and allows the main application to deal with PeerPackets instead of all the individual types.

Figure 3.7 UML of different PeerPackets

3.12 Algorithms

3.12.1 Piece Picking Quality of Service

When the client is running it must pick which pieces of the stream it will download, and from which peers. There are a few different methodologies for this, each providing different results and a certain level of QoS (Quality of Service). If the client picks a peer which is too slow, the time requirements of the media will expire and the quality of the stream will decrease. If, however, a peer gets all its pieces quickly at the cost of another peer, then there is unfair QoS in the network. And if the source peer gets overloaded with requests, the original stream won’t even have made it onto the network on time. Possible algorithms for piece selection follow:

• Random order – The pieces are picked from a random peer which doesn’t have any piece waiting to be downloaded.
This has the property of distributing the load randomly and equally throughout the network; however, low-bandwidth peers will suffer if the average bandwidth required is higher than their own.

• Uniformly cycling – Sequentially cycle through all connected peers, downloading from them in a logical order. This will have similar properties to the random order.

• RTT values – A round-trip time is taken for each peer, in seconds, with the quickest peers being picked in preference to the slower peers. This is not guaranteed to find the best peer, because RTT is not always an accurate estimate of bandwidth, as discussed in “Improving Round-Trip Time Estimates in Reliable Transport Protocols” by Karn et al. [30].

• Self scoring – An internal count is kept of how many pieces have arrived from each peer in the last few minutes. The peers are then sorted by this count, allowing faster peers to be used more often; if a peer becomes slow, its count decreases as it expires and the peer is not used as often.
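The self-scoring option can be sketched in a few lines. The decay factor and data structures here are illustrative assumptions: each arriving piece bumps the sending peer's score, scores decay periodically so stale measurements expire, and requests go to the highest-scoring peer:

```cpp
#include <string>
#include <unordered_map>

// Sketch of self-scoring peer selection: each received piece bumps the
// sending peer's score, and scores decay over time so old activity
// expires. The best peer is simply the one with the highest score.
class PeerScorer {
public:
    void PieceArrived(const std::string& peerId) { scores_[peerId] += 1.0; }

    // Called periodically (e.g. once a minute) so old activity expires.
    void Decay(double factor = 0.5) {
        for (auto& entry : scores_) entry.second *= factor;
    }

    // Returns the peer with the highest score, or "" if none are known.
    std::string BestPeer() const {
        std::string best;
        double bestScore = -1.0;
        for (const auto& entry : scores_) {
            if (entry.second > bestScore) {
                bestScore = entry.second;
                best = entry.first;
            }
        }
        return best;
    }

private:
    std::unordered_map<std::string, double> scores_;
};
```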
Also, to ensure timely delivery, when a piece has been requested and no reply arrives within a specific length of time, the request will expire and another peer will be asked for that piece.

3.12.2 Source Saturation Problem

A design problem that may occur is a saturation of requests sent to the source. If all peers are buffered ahead as far as the source peer, then as soon as the source announces a new piece to the network they will all request that piece from the source and congest it. To solve this problem, the source could announce only to a random set of peers, which would later announce the new piece themselves and spread it throughout the network.

3.12.3 Pre-emptive Sending

In future extensions it may benefit peers to pre-emptively send connected peers newly acquired pieces instead of waiting for a request. This may be used by a source peer which doesn’t announce any new pieces but instead sends them intelligently. This would help solve the problem in section 3.12.2 and may lower the packet overhead and delays.

3.13 Code Testing Strategies

One of the reasons the project will be designed in an object-orientated language is that strong unit-testing strategies can be adopted. Each class should have a well-defined job with clearly defined input and output parameters, so it should be easy to define a set of test cases to test the functionality of each class. While coding takes place the classes should be tested independently of each other in special, smaller test projects. A simple testing framework should be developed for issuing simple tests and checking the output. This can be as simple as demonstrated in this trivial example:

//Check that the function computes the correct sum
int result = sum(10, 20);
if (result != 30) throw error;
//Check that the function...

A script in the above form could be generated and run against the class each time it is changed.
If a more complex solution is required, testing frameworks such as CppUnit3 or C++ Test4 would give the benefits of automated testing and allow greater sets of tests to be carried out. For this project the simpler test scripts will be used, due to the size of the project, the smaller setup costs and the ease of use. If in the future the project were to grow, a more advanced testing strategy would be adopted.

3 CppUnit, http://cppunit.sourceforge.net/
4 C++ Test, http://cpptest.sourceforge.net/
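A runnable version of the trivial test-script idea above might look as follows. The sum function is of course a placeholder for the real classes under test, and the Check() helper is an illustrative name:

```cpp
#include <stdexcept>
#include <string>

// Placeholder for a real function under test.
int sum(int a, int b) { return a + b; }

// Minimal check helper in the spirit of the simple test scripts described
// above: throws with a message on failure, does nothing on success.
void Check(bool condition, const std::string& message) {
    if (!condition) throw std::runtime_error("test failed: " + message);
}

// A test script is then just a sequence of Check() calls; returns true if
// every check passed.
bool RunSumTests() {
    try {
        Check(sum(10, 20) == 30, "sum(10, 20) should be 30");
        Check(sum(-5, 5) == 0, "sum(-5, 5) should be 0");
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```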
3.14 System Evaluation Strategies

Once the program has been completed, an evaluation of its usefulness must be carried out. The evaluation should help decide whether the requirements set out in 3.1 have been met and, if not, why not. The main requirements to evaluate are “Is it robust?”, “Does it scale?” and “Are the control overheads low?”. Testing will be carried out by connecting a set number of peers to the network and streaming a test file, with detailed logs made to help the analysis afterwards. The majority of the testing will occur over a LAN; however, at least one test will be run over the internet. The reason for this split is the small availability of internet-connected machines.

A suitable measurement for “Is it robust?” is how long the network takes to recover after a peer fails, and how many peers are affected. This can be recorded by setting up a set of peers to receive a stream, randomly killing peers, and seeing how long the other peers take to catch up and how many needed to.

The scaling attribute is one that doesn’t directly affect the streaming protocol so much as the peer location aspect. While streaming, a peer is only connected to a finite number of other peers. The number of connected peers will affect the streaming protocol; however, in normal operation this number is low, and even if it became high the predicted results are still very acceptable. The direct scalability problems lie with the tracker and with the source peer. The tracker is a single point of failure, and the faults with this concept have been highlighted in section 3.5, Tracker-less Network, so for the scope of this evaluation the tracker will be ignored. Future research can be carried out into the concept of a tracker-less network, but with current Pastry and Chord implementations demonstrating hugely scalable networks we can partially infer that the scheme would scale.
The source peer could also become a bottleneck if the system is to scale. Section 3.12.2, Source Saturation Problem, has already discussed why this might be a problem and possible solutions. For the moment we will assume it could be a problem and evaluate the area. A suitable measure would be to record how much of the data peers request from the source compared to neighbouring peers. If they request all the data from the source, then the network has certainly not scaled and the system is no better than standard single-point streaming. If, however, only a low percentage of data is requested from the source, then the protocol is working well.

An easy measure of control overhead is comparing the amount of data received to the amount of overhead required. This simply takes the form of the size of the announce and request packets compared to the size of the data packets.

3.14.1 Predicted Results

This section makes predictions about the testing; these will aid in the analysis of any data collected and help pinpoint possible problems before they occur. The predictions will be calculated by modelling the protocol as a set of equations under optimal conditions.
Table 3.1 shows the sequence of events for acquiring a new piece, where ConnectedPeers is the number of peers you are connected to and PieceSize is the size of each piece. In this example ConnectedPeers = 100 and PieceSize = 10,000 bytes.

1  IN   Announce   5 header bytes + 5 bytes                       10 bytes
2  OUT  Request    5 header bytes + 8 bytes                       13 bytes
3  IN   Data       5 header bytes + 4 bytes + PieceSize bytes     10,009 bytes
4  OUT  Announce   ConnectedPeers * (5 header bytes + 5 bytes)    100 bytes
   Total Overhead                                                 123 bytes
   Total Data                                                     10,000 bytes

Table 3.1 Sequence of events for acquiring a new piece

The first calculable prediction is low control overhead. Since it is possible to work out in advance how much overhead is sent, we can make a fairly reliable prediction. Table 3.1 shows the common pattern of actions needed to download a piece of the stream. It can be seen that if a peer is connected to 100 other peers, the overhead required would be 123 bytes for every 10,000 bytes of data, a percentage of 1.23%. Of course both these values will change depending on the number of connected peers and the size of each piece.

The source peer scaling issue is harder to predict because, in the current implementation, timing issues will affect it greatly. The range this metric can take is:

1/(n − 1) ≤ x ≤ 1, where n = number of peers and x = fraction of data acquired from the source

This equation shows a huge range of values. In ideal conditions a network of 10,000 peers would only use the source for 0.01% of the traffic; however, in the worst-case scenario it would use it for 100% (which is as bad as single-point streaming). It should be noted that these two bounds represent the source sending out between 1 and n copies of the stream, where 1 is preferable and the absolute minimum, and n is bad and the absolute maximum.

Robustness will be the hardest property to accurately record, due again to timing issues and the number of factors involved.
In a simple example, say there is a network of 100 peers, with each peer requesting one packet from 10 others. If one peer dropped offline without warning, on average up to 10 other peers would be affected. Each of these 10 peers would wait a reasonable amount of time before giving up and trying another peer. If this wait time is 5 seconds, each peer would be delayed by 5 seconds. If, however, the peers are all pre-caching 30 seconds of data, then they can withstand 6 failures in a row before stream playback is affected. For that number of failures to occur, 6 of the 10 peers a client downloads from would have to fail at the same time. The chance of this is low in a diverse and widely spread network of 100 peers, and it seems very unlikely.

3.15 Summary

This chapter has discussed all aspects of the proposed system. It began by laying out the requirements of such a system and moved on to discussing the type of network which would fulfil these requirements. The more detailed concepts of peers and trackers were then designed, and the protocol defined. The chapter ended with code and system testing strategies which will aid the testing of the system in Chapter 6. The next chapter will discuss how the implementation of the system differed from the design, and whether any problems occurred that forced design changes.
4 Implementation

The implementation turned out to be very similar to the designed solution, with only a few alterations and additions. This chapter follows on from the design and explains what changed and what problems were encountered.

4.1 Changes

This section highlights, in detail, the subsections of the program which were changed and how they fit into the overall design.

4.1.1 Tracker

[UML class diagram: PeerClient, TrackerProxy, Tracker, TrackerMain, CHTTPRequest and CHTTPReply, connected over HTTP]

Figure 4.1 UML Diagram of tracker design

The program design of the tracker, and the code which connects the peer to the tracker, was completely left out of the design section. Figure 4.1 shows how the PeerClient uses a TrackerProxy object which abstracts sending an HTTP request to the tracker and waiting for the reply. The tracker side of the diagram waits for HTTP connections and deals with them internally. Figure 4.2 is a UML sequence diagram that more accurately displays how the inner class communications work. The tracker on its own is just a simple web server, with each incoming request being logically turned into a CHTTPRequest object. This then gets passed to whatever web application the web server is running, in this case a TrackerMain object. The TrackerMain object then parses the request and produces a CHTTPReply, which the web server serializes back into a normal HTTP reply.
[UML sequence diagram: the PeerClient sends an HTTP message to the Tracker, which creates a CHTTPRequest, hands it to TrackerMain, which builds a CHTTPReply (addBody, addHeaders) and returns it as the reply]

Figure 4.2 UML Sequence diagram of how the tracker works internally

4.1.2 PeerManager

The PeerManager class became the core of the program as development took place. It was responsible for deciding which peers to connect to and which pieces to download from which peer.

[UML sequence diagram: PeerClient and ListenThread hand connections to the PeerManager, which spawns the KickstartThread, creates PeerConnection objects, calls FindNextPacket, sends Request and Announce packets, and receives PeerEvents when data and pieces arrive]

Figure 4.3 UML Sequence diagram of the PeerManager connecting to a peer, requesting a piece and finally announcing its completion
    • Andrew Brampton Peer-to-Peer Media Streaming It would also decide when and to whom it would send packets, for example Announcements or Data packet. It also handled all the notification events sent by objects such as the PeerConnection. A typical sequence of events is expressed in figure 4.3. The sequence diagram shows the PeerManager being set to listen for incoming connections. This operation causes the PeerManager to spawn an internal thread which loops waiting for a connection. At an un-determined time later the PeerClient or the ListenThread will send a connect function call to the PeerManager which tells the PeerManager about a new Peer connection. The peer manager then firstly checks if the KickStart thread is running and if not spawns it. Following this it will create a PeerConnection object which connects to the remote peer. The KickStart thread will fire off every 5 seconds to check if any connections are free for download. It carries this out by calling the FindNextPacket method on the PeerManager, which will, in turn find the first missing piece and then request it from the newly formed PeerConnection (we are assuming that this new PeerConnection has the piece). The PeerConnection internally handles the request and download of the piece and stores it in the correct StreamBuffer. At this point a PeerEvent object is created and sent to the PeerManager to signal the download of some data. In turn the PeerManager sends its own event to the PeerClient saying that a piece was downloaded. Finally the PeerManager calls SendPacket(Announce) on all PeerConnections. This whole process is then repeated with the next piece and/or new peers. 
4.1.3 Vorbis Ogg Playback Library

[UML class diagram: OggPlayback implements the PlaybackInterface (isPlaying, Play, Stop) alongside FileWriterPlayback and VideoPlayback; it wraps an OggPlay/OggPlayStream pair built on libvorbis and a WaveOut object (open, writeAudio), with RAW audio travelling between these objects]

Figure 4.4 UML Class Diagram of OggPlayback

The first implementations streamed Vorbis Ogg audio. Due to the media-agnostic nature of the protocol and good design patterns, the only section of the application which needed changing to allow different media types was the Playback class. A new inherited class called OggPlayback was designed. It exposes the same methods as any other playback device but internally decodes the stream into a RAW WAV format and directs this at the default sound card. This is depicted in Figure 4.4. The decoding of the audio was done by the library libvorbis, developed, in part, by the Ogg Vorbis Codec project5. This library was wrapped by the OggPlay class which, in turn, was wrapped by the OggPlayback class. In addition to the OggPlay class, an instance of a WaveOut class was held inside the OggPlayback class.

5 Ogg Vorbis Codec Project, http://www.xiph.org/ogg/vorbis/
This WaveOut class was created to wrap around the Win32 WaveOut* API, which is used to make the computer play the audio.

4.1.4 Bitmap Class

The bitmap class was one of fundamental importance which wasn't considered in the design and had to be designed and created during implementation. It provides a logical abstraction over an array of bits that represent different indexes in the stream. Such an array was used in the Announce packets described in section 3.10.5. The class became a very important one, at the centre of most objects, including the StreamBuffers. Internally it uses a malloced char array of n/8 + 1 bytes, where n is the number of indexes the bitmap represents. The class also keeps an integer which represents the logical start of the array. This is so the array can start at any arbitrarily high index while the char array stays small.

[UML class diagram: bitmap, with fields mapArray : char, logicalSize : int, physicalSize : int, hSema : Semaphore, and methods get, set (public and private gotSemaLock variants), getStart, setStart, andBitmap, orBitmap, notBitmap and the constructor bitmap(size : int)]

Figure 4.5 UML Class Diagram of bitmap

Figure 4.5 shows the class definition of a bitmap, highlighting each function. The operations the functions carry out are comparatively simple, but care had to be taken to make the class thread-safe. During the testing stage the class had to be altered many times to ensure deadlock didn't occur and that no two threads were able to alter the contents concurrently. The class is created with a single constructor parameter which sets the size of the map in bits. The default is for the map to start at index 0.
The setter and getter methods {set,get}Start allow calling classes to move the bitmap's starting index and to read the current starting index. When the starting index is moved along, the internally stored byte array is updated correctly by setting and un-setting bits appropriately. If the new start is not within the range of the old array, all the bits are accordingly set to zero. The public get and set methods allow the calling code to get or set the value of a specific index in the bitmap. There are also two private get and set methods with an optional last parameter called gotSemaLock, which allows the internal code to skip any concurrency considerations to avoid deadlock. Finally, the bitmap has one other set of functions of interest, the {or,and,not}Bitmap functions. These apply a bitwise operator to all values which are in the range of both bitmaps.

4.2 Problems Encountered

During implementation of the design numerous problems occurred that caused design changes to be made, and many "hacks" to be introduced. The problems of noteworthy interest are explained in this section.

4.2.1 StreamBuffer changing without notification

The StreamBuffer was a key part of the abstraction which allowed either a memory-based buffer or a file-based source to be represented as the same object. The abstraction worked perfectly until the concept of "Start" was introduced. For example, the memory StreamBuffer would have its start in the middle of the buffer, with half the buffer storing old, used pieces and half storing new, soon-to-be-used pieces. The problem comes when deciding where the middle (or start) of this stream is, and when it should be incremented. It was decided that the middle would be determined by the playback device. For example, the OggPlayback object would be responsible for moving the middle along, causing old data to expire and allowing new data to be downloaded. Indirectly this created the problem that the PeerManager wouldn't be notified when there was more space in the buffer or (in the case of the FileStreamBuffer) when new pieces had been added. A hack to avoid this problem was to have a thread inside the PeerManager which continuously polls the StreamBuffer checking for changes; if one is detected, the correct action is taken (i.e. request a new piece, or announce a new piece being available).

4.2.2 Concurrency Issues

One thing the UML didn't show was the lifetimes of objects and how different objects might interact with the same object concurrently. It was always assumed that many threads would try to access the same objects, so a lot of concurrency controls were put in place; however, some cases weren't obvious enough and caused a few problems. One such case was the peer list object kept inside the PeerManager. It was continuously accessed from many threads inside the PeerManager, and all worked fine until 16 peers were online, at which point the accesses started to collide and the list would corrupt.

4.2.3 Self-Connecting Peer & Peers Connecting Both Ways

A design flaw which was also a feature made it impossible to tell whether you had already connected to a specific peer.
The reason is that peers should not be identified by their IP address and port alone; instead they should be identified by their PeerID. However, for security reasons the peer ID was kept secret, so only the tracker knows a peer's ID. This leaves the problem that a peer may connect out to another peer, and that peer may later connect back in, causing a two-way connection. Even more alarming, a peer may connect in to itself. Both of these situations cause unneeded, additional data to be transferred.

To solve the self-connection problem, a hack was made in the tracker which stops the requesting peer's details being included in the returned list. The problem of two-way connections couldn't be solved as easily without a redesign. A suggested solution would be to change the peer handshake protocol to send the peer's ID, which would allow a peer to avoid incorrectly connecting to the same peer twice. However, the peer IDs would then no longer be secret, so the tracker would have to be changed to only accept commands for a specific peer ID from the IP address which first registered it. This is still subject to exploits such as IP spoofing; however, that seems an acceptable compromise.

4.3 Algorithms Used

The majority of the program uses very simple algorithms, thanks to good Object-Oriented practice and the good up-front design. These two factors allowed the program to be relatively easy to code, and any problems were easily fixed. This section shows the most complicated algorithms used in the program.
4.3.1 FindNextPiece

The FindNextPiece method is responsible for looking at all the PeerConnection objects and deciding which piece should be downloaded next. The algorithm is displayed in Figure 4.6. The method consists of two main loops: the outer loop increments the piece being looked for, and the inner loop cycles through all peers. The outer loop runs until a suitable piece is found. Once one is found, the inner loop checks first whether the peer has the piece and then whether the peer is free to send it. If both conditions are true, the method returns with the knowledge of a piece and a peer. If this fails, the next peer in the list is considered, and so on until there are no more peers. In that case the inner loop breaks, the outer loop increments to the next piece, and the inner loop starts again. This sequence of events continues until the outer loop breaks.

[Flowchart: starting at piece X = 0, skip any piece already held or currently downloading; for each wanted piece X, scan the peer list for a peer that both has piece X and is free; on success, request piece X from that peer; if no more peers exist, increment X; if no piece can be found, the method returns with a failure]

Figure 4.6 Flowchart of FindNextPiece

4.3.2 PeerConnection Thread

This thread is responsible for sending and receiving data over the TCP socket. The following pseudo code shows how it works:

While (connected) {
    Loop through send queue {
        Send(packet in queue)
        If TCP errors occurred exit
    }

    // On the first pass we always want 4 bytes
    // (which is the length of the packet)
    Size = 4
    FirstPass = true

    // This loops until Size bytes of data have been read
    While (Size > 0) {
        Recv(Size bytes of data)
        If TCP errors occurred exit

        If FirstPass {
            Size = value of the 4 bytes just read   // length of the remaining packet
            FirstPass = false
        } else {
            Size = Size - amount of data read
        }
    }

    // Now generate a packet
    packet = PacketFactory(data)

    // This is where the application logic goes
    // to decide what to do
    DecideWhatToDo(packet)
}

As can be seen, this code continues to loop while the Boolean connected is true. Inside the loop, the first thing which happens is that all queued messages are sent; this should be clear from the first few lines. The remaining bulk of the code receives data. The reason for its complexity and length is that data arrives in blocks rather than as a convenient stream object; a recv call may return only part of what was requested, something which ideally would have been abstracted away with a class exposing a stream of data. The code works by first reading 4 bytes, which, due to the way the packets are formatted, represent the length of the remaining packet. The code then loops reading this remaining number of bytes. It has to loop instead of requesting all the bytes at once because the incoming data may have been fragmented across many IP packets, and as such recv will return only a fraction of the total each time. Once Size reaches zero all the data has been read and the code drops down to the PacketFactory. This factory class takes raw packet data and creates an object which logically represents it. This object is then dealt with, and the program continues to loop for more data.

4.4 Summary

This chapter has identified the changes made to the design and discussed the problems encountered. The algorithms for the more important classes have been discussed in detail. The next chapter will show the look and feel of the system; each main program will be discussed, with screen dumps to explain normal operation.
5 System in Operation

The previous chapter discussed the implementation of the system, how it was made and what issues were encountered. However, it did not discuss how the system looks and operates to the user. This chapter shows how the programs appear and how to use them.

The programs themselves are all very simple. Since this project is focused on the research and design of a new protocol, features such as GUIs were not important and thus not included. The programs were designed to be as simple and informational as possible. For these reasons all programs are simple console-based ones which log very detailed information about what is happening internally. This console log is also written to a file for later inspection. While the programs are running they require no user input; the only input they accept is a ctrl+c to close the program. To control the programs, command line arguments may be specified.

5.1 Tracker

The first component of the system is the tracker. The tracker is always needed in the network to provide a peer location service. It is started with the following arguments:

tracker.exe {ogg filename}

You must specify an Ogg filename to the tracker; the only reason for this is so the tracker can read some meta-data about the stream. Once this has been done, the tracker starts running and waits for incoming connections.
[15:05:17] Program starts
[15:05:17] Now serving: d:\Jeff Wayne - War of the Worlds - Disc 1 - 128 kbps.ogg
[15:05:17] Created server socket on port 8000
[15:05:22] Accepted new connection [10.36.152.128]
[15:05:22] 10.36.152.128 http://10.36.152.128:8000/?action=join&peer-id=EGA010000001opinctfh&peer-ip=10.36.152.128&peer-port=4567
[15:05:22] Closed connection [10.36.152.128]
[15:05:23] Accepted new connection [10.36.152.128]
[15:05:23] 10.36.152.128 http://10.36.152.128:8000/?action=list&peer-id=EGA010000001opinctfh
[15:05:23] Closed connection [10.36.152.128]
[15:05:56] Accepted new connection [10.36.152.130]
[15:05:56] 10.36.152.130 http://10.36.152.128:8000/?action=join&peer-id=EGA010000003ccaapzal&peer-port=4567
[15:05:56] Closed connection [10.36.152.130]

Figure 5.1 Log generated by a tracker

Figure 5.1 shows the typical connection of two peers. The first peer connects from 10.36.152.128 with the ID EGA010000001opinctfh. You might notice that the first 11 characters of the peer ID don't appear randomly generated; they are in fact the name of the computer the peer is running on. For debugging and testing reasons, the peers generate semi-random PeerIDs with their computer name prefixed.
Once the peer has connected, it issues a join with its port and IP. Next, the peer re-connects and issues the list command to obtain a peer list; in this case the list will be empty. The second peer now joins from 10.36.152.130 and again issues the join and then list commands.

5.2 PeerSource

The second most important component is the PeerSource. Its usage is just as simple as the tracker's. The program accepts parameters like so:

PeerSource.exe {tracker url} {source file} [{ip}]

The tracker URL and the source file are both required; the IP parameter is optional. An example use would be:

PeerSource.exe http://tracker.com/ mymusic.ogg 192.168.0.1

This would connect to the tracker at tracker.com and begin sharing the stream pulled from the file mymusic.ogg. In the join request the peer would also indicate it is coming from IP address 192.168.0.1. It has to indicate its IP address for situations where the tracker and PeerSource are run on the same host. In that case, the tracker would see the source connecting from 127.0.0.1 and record that IP address; later, if the tracker sent out a peer list, the IP 127.0.0.1 would be listed, which would in turn cause problems for the remote peers.

Once the source has started, it outputs textual information similar to the tracker's. This is shown in Figure 5.2. In the figure's example the SourcePeer first connects to the tracker at 10.36.152.128:8000, which returns an empty peer list, presumably because the source was the first to connect. 30 seconds later a remote peer connects in from 10.36.152.130:2296. The peers begin by exchanging handshakes and announcing which pieces of the stream each has. The remote peer then sends a request, and the source peer replies with the data. Once the data is transmitted, the remote peer announces that the piece has been completed and continues to request more.
Eventually the program is ended at 15:07:20 when ctrl+c is pressed at the console. This tells the program to cut all connections and try to clean up; a few seconds later the program has dealt with any clean-up and quit.

[15:05:22] ***************** PROGRAM START *****************
[15:05:22] Sending Tracker Join (10.36.152.128:8000)
[15:05:22] Getting Tracker List (10.36.152.128:8000)
[15:05:23] List: returned
[15:05:57] [10.36.152.130:2296] 8 PeerConnected
[15:05:57] [10.36.152.130:2296] OUT 19 Handshake Version:1,0 Andrew's Client
[15:05:57] [10.36.152.130:2296] OUT 7 Announcement Start:0-24 111111111111111111111111
[15:05:57] [10.36.152.130:2296] IN 19 Handshake Version:1,0 Andrew's Client
[15:05:57] [10.36.152.130:2296] IN 14 Announcement Start:0-24 000000000000000000000000
[15:05:57] [10.36.152.130:2296] IN 8 Request Start:0 End:1
[15:05:58] [10.36.152.130:2296] OUT 10004 Data Start:0
[15:05:58] [10.36.152.130:2296] IN 14 Announcement Start:0-24 100000000000000000000000
[15:05:58] [10.36.152.130:2296] IN 8 Request Start:1 End:2
[15:05:58] [10.36.152.130:2296] OUT 10004 Data Start:1
[15:05:58] [10.36.152.130:2296] IN 14 Announcement Start:0-24 110000000000000000000000
[15:07:20] Quiting.... (ctrl+c)
[15:07:21] [10.36.152.130:2296] 9 PeerDisconnected
[15:07:21] [10.36.152.130:2296] SocketError Socket Error 10053
[15:07:21] ****************** PROGRAM END ******************

Figure 5.2 Log generated by a PeerSource (altered to improve readability)

5.3 PeerClient

This is the last program in the system, and the one most hosts in the network would be using. It works almost identically to the PeerSource, since it uses the same code base. It is started using the following parameter:

PeerClient {tracker url}

This time only a tracker URL is needed. Once the client has connected, the tracker supplies all the other details, such as the media type.
[15:05:55] ***************** PROGRAM START *****************
[15:05:55] Sending Tracker Join (10.36.152.128:8000)
[15:05:56] Getting Tracker List (10.36.152.128:8000)
[15:05:57] [10.36.152.128:4567] 8 PeerConnected
[15:05:57] [10.36.152.128:4567] OUT 19 Handshake Version:1,0 Andrew's Client
[15:05:57] [10.36.152.128:4567] OUT 14 Announcement Start:0-24 000000000000000000000000
[15:05:57] [10.36.152.128:4567] IN 19 Handshake Version:1,0 Andrew's Client
[15:05:57] Ogg: Now playing
[15:05:57] Ogg: TITLE=War of the Worlds
[15:05:57] Ogg: ARTIST=Jeff Wayne
[15:05:57] Ogg: Bitstream is 2 channel, 44100Hz @ 128kpbs
[15:05:57] [10.36.152.128:4567] IN 7 Announcement Start:0-24 111111111111111111111111
[15:05:57] [10.36.152.128:4567] OUT 8 Request Start:0 End:1
[15:05:58] [10.36.152.128:4567] IN 10004 Data Start:0
[15:05:58] [10.36.152.128:4567] PartComplete 10004 Data Start:0
[15:05:58] [10.36.152.128:4567] OUT 14 Announcement Start:0-24 100000000000000000000000
[15:05:58] [10.36.152.128:4567] OUT 8 Request Start:1 End:2
[15:05:58] [10.36.152.128:4567] PartComplete 10004 Data Start:1
[15:05:58] Ogg: Read 8500 bytes 0 missing
[15:05:58] Ogg: Read 8500 bytes 0 missing
[15:07:25] [10.36.152.128:4567] SocketError Socket Error 10054
[15:07:25] [10.36.152.128:4567] 9 PeerDisconnected
[15:07:25] Quiting.... (ctrl+c)
[15:07:26] ****************** PROGRAM END ******************

Figure 5.3 Log generated by a PeerClient (altered to improve readability)
Figure 5.3 shows the output generated on the PeerClient's side when connecting to the PeerSource shown in Figure 5.2. It connects to the tracker, then to a peer, and quits after the remote peer disconnects.

5.4 Summary

The aim of this chapter was to demonstrate how the system looks and operates to the user. This aim has been carried out, and the very basic user interfaces have been explained. This chapter did not, however, discuss the correctness of the system or how it actually operates internally. The next chapter will test and evaluate the system, to prove its correctness and to record statistics on its usefulness.
6 Testing

This chapter focuses on testing the correctness and usefulness of the system. Sections 6.1 and 6.2 test how the system behaves and whether it follows the specifications laid out in the design chapter. Section 6.1 begins by black-box testing some classes, and section 6.2 then combines one or more of these classes to continue performing bottom-up testing. Not all test results are shown in this section: any major issues found in the program are highlighted; all other classes either worked as expected or failed initially with trivial faults that were quickly fixed, and so are not noteworthy. Section 6.3 tests how well the system performs in lab and real environments. It will then form the base of data that aids the analysis in the next chapter, Evaluation.

6.1 Unit Testing

Tables of tests were carried out on some of the classes used by the peers. These tables were devised by inspecting each method exposed by each class and systematically calling it with test data. The test data was chosen from three categories: Typical, Extreme and Erroneous. The majority of tests were carried out at development time, and therefore not everything is documented; however, to help provide completeness, some tests have been documented.

6.1.1 Bitmap Class

As discussed in section 4.1.4, the Bitmap class became one of the most important classes, and as such repeated tests were carried out on it. The table of test cases is displayed in section 10.1 of the Appendix; here we discuss the failures and why things went wrong. The first set of failures, which has since been fixed, concerned the setStart() method. This method moved the logical beginning of the array up or down, and as such had to move the contents as well. The problem was twofold. Firstly, setStart() would act very erratically and sometimes only move segments of the array instead of the whole array.
Secondly, setStart() would only move at the byte level (i.e. in chunks of 8 bits); however, requests made to setStart() didn't always start on a multiple of 8, and as such the beginning index and the array would get out of sync. The setStart() problem was fixed by rethinking the logic of the class and re-coding most of it with additional internal variables such as logicalStart, physicalStart, logicalSize and physicalSize. These all help keep track of the internal states, instead of the states being calculated on the fly from just a few variables.

The next series of failures, labelled F1 and F2, was caused by the lack of checking inside the constructor. It should be noted that these tests still fail, since the class hasn't been updated. This was considered acceptable, as the failures wouldn't interfere with normal operations. The final failure is F3. For the setStart() reasons listed above, a large amount of time was put into making it function correctly. Even so, setStart() doesn't function when the new start is smaller than the current one, since this added more complexity and is a feature which wouldn't be used. The code could be added in the future, but at the moment an exception is thrown indicating the code isn't finished.
6.1.2 StreamBuffer Class

The StreamBuffer class was the main data source for the PeerClient. It was very important because data was being written to it and read from it all the time; if this class didn't behave correctly, the stream would corrupt. The test cases in section 10.2 show that the class operated mostly correctly. The only faults occurred when it was constructed with incorrect parameters: the numbers passed were erroneous and the constructor didn't provide valid checking for these cases. The last test to fail was when an extreme parameter was passed to the constructor. This parameter made the class malloc many gigabytes of RAM. After mallocing 2GB successfully, the program (along with the development environment and some Windows applications) crashed with fatal Out Of Memory errors. This is to be expected, since the memory limit of the OS was reached. Perhaps a hard limit should have been set on the StreamBuffer; however, this was never considered in the design.

6.2 Integration Testing

Testing of high-level components requires an exponential increase in test cases and time. For this reason, the testing in this section only covers one class. The other high-level classes were tested to function correctly; however, individual test cases were not carried out.

6.2.1 PeerConnection Class

This is the class responsible for logically representing the connections to and from remote peers. It is also responsible for sending and receiving packets; any received packets are de-serialised into PeerPacket objects. Due to the difficulty of making test cases for incoming data from a remote source, those tests are excluded from the official test cases. Instead, they were carried out during the implementation of the class.
[Figure 6.1 — UML class diagram: the PeerConnection class (+Open(), +Close(), +GetPieceMap(), +RequestPiece(), -SendPacket()) composed with a bitmap, the «interface» StreamBufferInterface (+StreamBuffer(), +get(), +set(), +Read(), +Write(), +Peek()), the «utility» PeerPacketFactory and the «interface» PeerPacket (+Read(in Data : char) : PeerPacket).]

Figure 6.1 UML Class Diagram of a PeerConnection

Figure 6.1 shows the components which make up the PeerConnection class. Each individual component has been tested separately and is reported as working correctly. When the PeerConnection is used it may expose flaws in the sub-components which weren't previously tested with that kind of data. The test results shown in section 10.3 indicate that very few problems were found. In fact, during implementation, PeerConnection did have some problems, mostly relating to the concurrency issues discussed in section 4.2.2.
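The de-serialisation step can be pictured with a small sketch. The wire format below (a one-byte type tag and two packet kinds) is purely an assumption for illustration — the real PeerPacket layout is defined in the design chapter, not here:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Illustrative only: the real packet types and wire format differ.
struct PeerPacket {
    virtual ~PeerPacket() {}
    virtual int type() const = 0;
};
struct AnnouncePacket : PeerPacket { int type() const { return 1; } };
struct RequestPacket  : PeerPacket { int type() const { return 2; } };

struct PeerPacketFactory {
    // Turn a raw buffer into a typed PeerPacket; unknown data yields null,
    // which the PeerConnection would treat as a protocol error.
    static std::unique_ptr<PeerPacket> Read(const std::vector<char>& data) {
        if (data.empty()) return nullptr;
        switch (data[0]) {
            case 1:  return std::unique_ptr<PeerPacket>(new AnnouncePacket());
            case 2:  return std::unique_ptr<PeerPacket>(new RequestPacket());
            default: return nullptr;
        }
    }
};
```

Keeping the factory as the single place that inspects raw bytes is what allows the rest of PeerConnection to be tested against ready-made PeerPacket objects, as described above.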
6.3 Performance Testing

Testing of the system took place in a controlled environment with a number of hosts. At most 16 hosts took part over an unloaded 10/100Mbit network. While testing took place all hosts logged data sent and received. Eleven different tests took place over the course of two days. The different tests and the changes in each one are listed in Table 6.1.

Test  Hosts                     Notes
1     3 (2 Peers, 1 Source)
2     6 (5 Peers, 1 Source)
3     6 (5 Peers, 1 Source)
4     6 (5 Peers, 1 Source)     Delayed starting of source
5     6 (5 Peers, 1 Source)     30-second staggered start
6     12 (11 Peers, 1 Source)
7     13 (12 Peers, 1 Source)
8     15 (14 Peers, 1 Source)   Following tests with changed algorithm
9     16 (15 Peers, 1 Source)
10    16 (15 Peers, 1 Source)
11    6 (5 Peers, 1 Source)

Table 6.1 List of tests carried out on the system

The tests were carried out by streaming a 45-minute-long Ogg Vorbis file of the audio book "War of the Worlds", encoded at 160kbps. Not all tests were run for the full 45 minutes; however, they ran long enough to get a good sample of results. Tests 1-3 and 6-11 were run without any special settings: in each case the source was started first, and the remaining peers were started within the next 30 seconds. Test 4 had the source peer start after the peers; this allowed all the peers to start at exactly the same time and allowed testing of the load placed on the source. Test 5 was run over a 5-minute period with each peer starting 30 seconds after the previous one; this was an attempt to simulate new peers joining regularly during the playback of the stream. The final three tests, 9-11, were carried out in the same way; however, the FindNextPiece algorithm was changed to allow a better rotation of the peer list. The reason for this change was that preliminary results were not promising enough, so a change was made and re-tested.
Specifically, the algorithm described in section 3.12.1 was changed from a "uniformly cycling" approach to a "self scoring" one. Logs of all the tests can be found online at the working documents website, along with tables of results analysed from the logs.

Test        1      2      3      4      5      6      7      8      9      10     11
Hosts       3      6      6      6      6      12     13     15     16     16     6
Efficiency  18%    49%    21%    3%     40%    56%    22%    61%    33%    61%    25%
Overheads   0.39%  0.78%  1.20%  1.08%  0.65%  2.35%  2.11%  2.59%  0.79%  1.76%  0.87%
Table 6.2 Summarised results from 11 test cases

Table 6.2 shows the results of interest. The row labelled Efficiency is a percentage of how effective the streaming was compared to a single-source stream. It is calculated by summing the amount of data sent by every peer (the source included), and then comparing this to the amount of data sent by the source peer, as shown in this equation:

    Efficiency = 1 - (Source Data / Sum of Peers' Data)

For example, if there were two peers (one source and one receiving peer), this number would be 0% because the source peer sent one whole copy and the receiving peer sent nothing, thus evaluating to:

    1 - (1 / 1) = 0%

This efficiency will obviously never reach 100%, and the maximum will vary depending on the number of hosts within the network. For a more detailed explanation see paragraph 3 of section 3.14.1.

The last row, labelled Overheads, is the calculation shown in paragraph 2 of section 3.14.1. It represents the amount of control traffic divided by the amount of data received.

6.4 Summary

This chapter has tested the correctness and the operation of the system. Not many tests were carried out; however, from the results gained in section 6.3 it can be seen that the system functions correctly. Section 6.1 showed a few components of the system being tested; then in section 6.2 these components were linked together to provide results for an integration test. The chapter finished with tests run on the system across many machines. The next chapter will take the results obtained in section 6.3 and analyse and evaluate them. This will help form the basis of the conclusion.
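The efficiency and overhead metrics defined in section 6.3 can be computed mechanically from the per-host byte counts in the logs. A short sketch follows (the function and variable names are illustrative, not taken from the log format):

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sketch of the section 6.3 metrics, computed from per-host byte counts.

// Efficiency: the fraction of the total stream data that was NOT sent by
// the source, i.e. 1 - (source data / sum of all peers' data, source
// included).  0% means the source sent everything itself.
double efficiency(double sourceSent, const std::vector<double>& peerSent) {
    double total = sourceSent +
        std::accumulate(peerSent.begin(), peerSent.end(), 0.0);
    return 1.0 - sourceSent / total;
}

// Overheads: control traffic divided by stream data received.
double overhead(double controlBytes, double dataBytes) {
    return controlBytes / dataBytes;
}
```

For the two-peer example above, efficiency(1.0, {0.0}) gives 0; had the single receiving peer forwarded a whole copy itself, efficiency(1.0, {1.0}) would give 50%.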
7 Evaluation

This chapter will analyse the data collected in the previous chapter. It will use the approaches discussed in section 3.14 to evaluate the system, including statistical and graphical methods to calculate how well the system performed against the requirements set in section 3.1. These results will also be compared to the predicted results generated in section 3.14.1.

7.1 Efficiency

As explained in sections 6.3 and 3.14.1, the efficiency measures how much more effective the system is compared to single-point streaming. This is the metric that will be used to evaluate requirement 1.4 ("move the stream distribution load away from the source"). Table 6.2 shows the results to be in the range 3% to 61%, with an average of 35% and a standard deviation of 19%. This immediately indicates that the protocol is already around three times better than existing deployed streaming technologies. The standard deviation, however, suggests that the protocol is inconsistent and results can vary greatly.

When the algorithm changes were made to the system, the average improved by 6% while the standard deviation remained similar. However, due to the large variance observed and the low number of tests, it cannot be inferred that the algorithmic change made much, if any, improvement.

To further evaluate the results, Figure 7.1 displays a graph of efficiency versus peer count. The graph shows two sets of points and a shaded area. The two data sets are the tests before and after the algorithm change. The shaded area indicates the maximum possible efficiency achievable for that number of peers: in theory no point should be outside the shaded area, and the closer a point is to the top, the better. No line of best fit was drawn on the graph due to the widely spread results.
[Figure 7.1 — scatter plot "Percentage of the stream forwarded by non-source peers": y-axis "Efficiency %" (0-100), x-axis "Peers" (0-16); data sets "Tests 1" and "Tests 2 (Rotate Algorithm)", plus a shaded "Theoretical Best" region.]

Figure 7.1 Graph of Percentage of the stream forwarded by non-source peers
The graph shows that the majority of points are above 20%; however, there is one stray point at 3%. The 3% result was generated by test 4, the test with the delayed starting of the source peer. It can be argued that this result was due to the Source Saturation problem discussed in section 3.12.2, where all peers see the source's announcement at the same time and therefore all request at the same time. This is clearly a problem that requires fixing in the future; section 3.12.3 may provide one solution.

From the graph there may also be a slight upward trend of efficiency with more peers. This might be caused by timing issues between the peers: if there are more peers online, the drain on the source is greater, and as such the announcements sent to all the peers would be slightly lagged. This would cause announcements sent by normal peers to arrive before the source peer's, thus allowing some peers to request the stream from a neighbour instead of the source.

The fifth test, which started each peer at 30-second increments, returned a result of 40%. This number is roughly the same as the average and shows that peers joining at different times throughout the stream do not impact performance.

Overall the efficiency appears to be good; however, further work should go into tuning the algorithms used. Even with the current un-tuned implementation, tests have shown the system reduces the bandwidth used on the source by a factor of between one and three. Future versions could increase this value much further.

7.2 Overheads

The next measured criterion was the amount of overhead sent to provide the stream. This will help analyse the system's ability to fulfil requirements 1.3 (stream data with low control overhead) and 1.5 (be scalable).
In section 3.14.1 these figures were estimated; they are displayed in Table 7.1 alongside the observed results:

Test                 1      2      3      4      5      6      7      8      9       10     11
Hosts                3      6      6      6      6      12     13     15     16      16     6
Overheads            0.39%  0.78%  1.20%  1.08%  0.65%  2.35%  2.11%  2.59%  0.79%   1.76%  0.87%
Estimated Overheads  0.30%  0.60%  0.60%  0.60%  0.60%  1.20%  1.30%  1.50%  1.60%   1.60%  0.60%
Difference           0.09%  0.18%  0.60%  0.48%  0.05%  1.15%  0.81%  1.09%  -0.81%  0.16%  0.27%

Table 7.1 Predicted overheads compared to observed overheads

It can quickly be seen that the predicted results are lower than the observed results. This means that either the predictions were wrong, or the system performed worse than expected. In section 4.2.3 it was shown that there were problems with peers connecting to themselves, and with connections being made in both directions between pairs of peers. If this is the case then the effective peer count for the 6-peer tests is not really 6 but should instead be 12, because logically each peer thought it was connected to 11 others. This might help explain why the overheads were higher than expected.

Figure 7.2 is a scatter plot of the results obtained. The x-axis shows the number of connected peers, whereas the y-axis shows the percentage of traffic sent which was control traffic. Two data sets are drawn: one is the predicted results, the other the test results. For the test results a line of best fit has also been added.
[Figure 7.2 — scatter plot "Protocol Overhead vs Data Received (Depending on Number Of Connected Peers)": y-axis "Percentage Protocol Overhead" (0.0%-3.0%), x-axis "Number Of Connected Peers" (0-18); data sets "Results" and "Predicted", with a line of best fit through the results.]

Figure 7.2 Graph of protocol overheads depending on number of connected peers

From the graph it can again quickly be seen that the results used more overhead than predicted, but not by much. The gradient of the line is also steeper than predicted. Even with this greater gradient the overheads are still very low: extrapolating the line of best fit, at 20 peers the overhead would be 2.6%, at 50 peers 6%, and at 100 peers 13%. These numbers are a little high and should be lowered in future implementations.

It should be noted that these overheads apply only to connected peers. There may be thousands of peers on the network, but any one peer may only be connected to a small number of them. These results do, however, indicate a maximum number of peers the client should keep connected: if too many are connected the client will begin to waste bandwidth; if too few, the client will not get the stream on time.

7.3 Summary

This chapter has shown that the system operated correctly, performed well, and came close to expectations. Test results have been promising and future improvements can be looked into. The efficiency of the protocol seems good, and would be better still if a correct requesting algorithm were found. The overheads were a little high, but hopefully bug fixes to the program will solve this. The next chapter will bring together all the elements of the project and evaluate it as a whole.
8 Conclusion

To complete the report, this chapter will recap the aims of the project and discuss how well each has been achieved. In the first chapter the goals for the project were set out; here they are broken down into five main objectives, each followed by a discussion of how well it was achieved.

8.1 Project Goals

"This project aims to investigate current P2P and streaming research topics and highlight any flaws in these systems." [Section 1.3 sentence 1]

This goal has been successfully covered in the background reading chapter of the project. Different P2P architectures were researched, such as Gnutella networks [see section 2.2.2], Pastry and Chord networks [see sections 2.4.1 and 2.4.3], and more specific streaming P2P networks such as ZIGZAG [see section 2.3.4]. Each of these had its good and bad points discussed, with some thought given to the flaws and why they occurred.

"It will also integrate previously unrelated topics of P2P and streaming into a single solution." [Section 1.3 sentence 2]

This goal was only partially achieved. At the outset the author was unaware that streaming and P2P had previously been combined; the background chapter then uncovered proposals which had already combined these concepts, so strictly speaking the project could not "integrate previously unrelated topics". Even with this newly discovered research, the project was able to combine P2P and streaming in a new way, building on technologies such as BitTorrent [see section 2.2.8].

"This solution will be developed by improving existing techniques whilst solving any flaws they may have." [Section 1.3 sentence 3]

The finished solution did improve on existing streaming techniques, and from the results found in Chapter 7 it can be seen that even the un-optimised solution was at least three times more efficient than current single-source streaming.
The solution however wasn’t 100% successful on removing all the flaws. Scalability and Reliability issues can be found with the tracker approach; however in section 3.5 it was discussed how this could be replaced whilst building on top of another P2P network. “The developed solution must satisfy a list of requirements which will be derived and discussed in chapter 3.” [Section 1.3 sentence 4] A list of requirements was clearly laid out in section 2.2 and the rest of the design and implementation section was built around these. Each time a requirement was meet or broken it was noted in the report. Overall all the requirements were meet however some to a greater degree than others. - 56 of 61 -
"Once a suitable solution has been found, it will be scrutinized under numerous tests to find out its usefulness and tested to demonstrate how much more efficient or effective it is to current streaming solutions." [Section 1.3 sentence 5]

The purpose of Chapters 6 and 7 was to address this single goal. It should be easily seen that this goal has been achieved, allowing the project to report its strengths and weaknesses.

In summary, all five goals have been achieved, some to a higher degree than others, but nevertheless all achieved. These five points have also aided the final conclusion of the report by logically breaking the project into five main criteria.

8.2 Future Work

No project is ever complete, and this one is no exception. It has already been highlighted in the testing chapter that there is future work to be carried out on the piece-picking algorithms. This can take the form of mathematical analysis or empirical evidence. Gaining future results via empirical methods would be preferred, since it has already been seen that the mathematical predictions failed to work accurately in all cases.

Other future work could include testing the protocol on a larger scale, i.e. with more than 100 hosts. In larger networks a more structured approach to the distribution may be needed, because unforeseen problems might occur. One foreseeable problem is that if smaller, highly connected groups of peers formed out of the whole network, and these smaller groups became starved of the stream, then a large number of peers would be affected. Problems such as these must be thought about and tackled.

A final piece of future work would be to improve the underlying P2P network used. The tracker is an obvious bottleneck, and this has been known since the design stage.
Work should be placed into looking how a Distributed Hash Table such as Pastry and Chord can be used to remove the need of trackers. 8.3 Summary In the years to come streaming media is something which might become more and more popular. Today we are already seeing mobile phones that can make video calls. In such low bandwidth environments as mobile networks, solutions such as this are needed. It is clear from the goals that the project has achieved what it set out to do. A successful P2P Streaming protocol was designed, which works more efficiently than current solutions. This was achieved though extensive research into the area, followed by well design system architecture, concluded by successful tests. All in all, the project has achieved what it set out to do and maybe a little extra. - 57 of 61 -
9 References
10 Appendix

10.1 Bitmap Test Cases

Test                                           | Expected Result                             | Result
Construct, Deconstruct                         | Cleanly deleted object                      | As Expected
Construct(0)                                   | Throw InvalidSize Exception                 | Nothing Thrown (F1)
Construct(1)                                   | Returns OK                                  | As Expected
Construct(2^31 - 1)                            | Returns OK (but using up >256MB of memory)  | As Expected
Construct(-1)                                  | Throw InvalidSize Exception                 | Nothing Thrown (F2)

The following are done with a new Construct(32):

Test                                           | Expected Result                             | Result
getSize();                                     | 5 bytes (32 / 8 + 1)                        | As Expected
getArray();                                    | 0x 00 00 00 00                              | As Expected
set(0, false);                                 | 0x 00 00 00 00                              | As Expected
get(0);                                        | False                                       | As Expected
set(0, true);                                  | 0x 80 00 00 00                              | As Expected
set(0, true); get(0);                          | True                                        | As Expected
set(31, true);                                 | 0x 00 00 00 80                              | As Expected
set(32, true);                                 | Throw OutOfBound Exception                  | As Expected
set(0, true); setStart(0);                     | 0x 80 00 00 00 Start:0                      | As Expected
set(0, true); setStart(1);                     | 0x 80 00 00 00 Start:1                      | As Expected
set(0, true); setStart(8);                     | 0x 00 00 00 00 Start:8                      | As Expected
set(8, true); setStart(8);                     | 0x 80 00 00 00 Start:8                      | As Expected
set(0, true); setStart(31);                    | 0x 00 00 00 00 Start:31                     | As Expected
set(8, true); setStart(32);                    | 0x 00 00 00 00 Start:32                     | As Expected
setStart(32); setStart(0);                     | 0x 00 00 00 00 Start:0                      | Throws Exception (F3)
not()                                          | 0x FF FF FF FF                              | As Expected
not().and(0x00 00 00 00);                      | 0x 00 00 00 00                              | As Expected
setStart(8); not().and(0x00 00 00 00 Start:0); | 0x 00 00 00 FF                              | As Expected

10.2 StreamBuffer Test Cases

Test                    | Expected Result                     | Result
Construct, Deconstruct  | Cleanly deleted object              | As Expected
Construct(0,0);         | Throw Exception                     | Nothing Thrown (F1)
Construct(0,1);         | Throw Exception                     | Nothing Thrown (F2)
Construct(1,0);         | Throw Exception                     | Nothing Thrown (F3)
Construct(1,1);         | Returns OK                          | As Expected
Construct(2^31 - 1,1);  | Returns OK (but uses a lot of RAM)  | Mallocs nearly 2GB of RAM and crashes due to OutOfMemory (F4)
Construct(1,2^31 - 1);  | Returns OK (but uses a lot of RAM)  | Wasn't Tested

The following are done with a new Construct(10,10):

Test                                                        | Expected Result                                                              | Result
Read(100);                                                  | Reads 0 bytes                                                                | As Expected
Write(0, data);                                             | Stores 10 bytes at index 0                                                   | As Expected
Write(1, data);                                             | Stores 10 bytes at index 1                                                   | As Expected
Write(0, data); Write(10, data);                            | Stores at index 0, but stores nothing at index 10 (since it is out of range) | As Expected
Write(10, data);                                            | Stores at index 10 and moves internal pointer to start at 10                 | As Expected
Write(0, data); Read(100);                                  | Reads 10 bytes                                                               | As Expected
Write(0, data); Write(1, data); Read(100);                  | Reads 20 bytes                                                               | As Expected
Write(0, data); Read(100); Read(100);                       | Reads 10 bytes then reads 0 bytes                                            | As Expected
Write(0, data); Read(100); Write(1, data); Read(100);       | Reads 10 bytes then reads 10 bytes                                           | As Expected
Write(0, data); Read(5); Read(5);                           | Reads 5 bytes, then reads 5 bytes                                            | As Expected
Write(0, data); Write(1, data); Read(6); Read(6); Read(10); | Reads 6 bytes, then reads 6 bytes, then reads 8 bytes                        | As Expected
Write(0, data); Peek(1);                                    | Reads 0 bytes                                                                | As Expected
Write(1, data); Peek(1);                                    | Reads 10 bytes                                                               | As Expected
Write(0, data); Write(1, data); Peek(1);                    | Reads 10 bytes                                                               | As Expected

10.3 PeerConnection Test Cases

Test                             | Expected Result                              | Result
Construct, Deconstruct           | Cleanly deleted object                       | As Expected
Construct(ip, port)              | Constructs correctly                         | As Expected
Construct(incoming SOCKET)       | Constructs, and sets internal state to open  | As Expected
Open() with invalid IP           | Throw PeerException                          | As Expected
Open() with invalid Port         | Throw PeerException                          | As Expected
Open() with hostname             | Resolve hostname and connect                 | As Expected
Open() when already open         | Returns nothing                              | As Expected
Close() when open                | Closes and cleans up                         | As Expected
Close() when closed              | Returns nothing                              | As Expected
Remote connection closes         | Throw PeerException and set closed           | As Expected
isOpen() while open              | Returns true                                 | As Expected
isOpen() while closed            | Returns false                                | As Expected
getPeerBitmap() before open      | Returns empty bitmap                         | As Expected
getPeerBitmap() while connected  | Returns bitmap                               | As Expected
getPeerBitmap() after connected  | Returns empty bitmap                         | Returns old bitmap (F1)
RequestPiece(valid piece)        | Send Request                                 | As Expected
RequestPiece(invalid piece)      | Throw OutOfRange                             | Sent Request (F2)
SendPacket(packet)               | Send Packet                                  | As Expected

10.4 Project Proposal