Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

On the security and efficiency of content distribution via network coding.bak


Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

On the security and efficiency of content distribution via network coding.bak

  1. 1. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 2, MARCH/APRIL 2012 211 On the Security and Efficiency of Content Distribution via Network Coding Qiming Li, John C.S. Lui, Fellow, IEEE, and Dah-Ming Chiu, Fellow, IEEE Abstract—Content distribution via network coding has received a lot of attention lately. However, direct application of network coding may be insecure. In particular, attackers can inject “bogus” data to corrupt the content distribution process so as to hinder the information dispersal or even deplete the network resource. Therefore, content verification is an important and practical issue when network coding is employed. When random linear network coding is used, it is infeasible for the source of the content to sign all the data, and hence, the traditional “hash-and-sign” methods are no longer applicable. Recently, a new on-the-fly verification technique has been proposed by Krohn et al. (IEEE S&P ’04), which employs a classical homomorphic hash function. However, this technique is difficult to be applied to network coding because of high computational and communication overhead. We explore this issue further by carefully analyzing different types of overhead, and propose methods to help reducing both the computational and communication cost, and provide provable security at the same time. Index Terms—Content distribution, security, verification, network coding. Ç1 INTRODUCTION the past few years, there has been an increasing received data Y . However, as we can see later, such methodsF OR interest on the application of network coding on file are not applicable in practical network coding-based contentdistribution. Various researchers have considered the benefit distribution schemes.of using network coding on P2P networks for file distribution It is first showed, in the seminal work by Ahlswede et al.and multimedia streaming (such as [1], [2], [3], [4], [5]), while [6], that if the nodes in the network can perform codingother researchers have considered using network coding on instead of simply forwarding information, multiple sinks inmillions of PCs around the Internet for massive distribution a multicast session can achieve their maximum network http://ieeexploreprojects.blogspot.comof new OS updates and software patches (e.g., the Avalanche flow simultaneously. This technique is referred to as networkproject from Microsoft). What we are interested in is the coding. Since then, the topic has been intensively studied,security of content distribution schemes using network including both theoretical analysis (such as [7], [8], [9]) andcoding, and how to achieve the security efficiently. practical discussions (such as [2], [10], [11], [12]). More An important issue in practical large content delivery in details on the literature can be found in Section 2.1.a fully distributed environment is how to maintain the Some classical theoretical results (such as [9]), althoughintegrity of the data, in the presence of link failures, provide important insights, would be difficult to apply intransmission errors, software and hardware faults, and even practice since they require the knowledge of the networkmalicious attackers. If malicious attackers are able to modify topology during code construction, and require the linkthe data in transmission, or inject arbitrary bogus data into the failures to follow certain predefined pattern for the codenetwork, they may be able to greatly slow down the content to be reliable. In practice, however, a content distributiondistribution, or even prevent users from getting correct data network can be very dynamic in terms of the topology,entirely. In classical content distribution scenarios, data membership, and failures.integrity can be checked using a “hash-and-sign” paradigm, Random linear network coding [13] provides a solutionwhere the source employs a collision resistant hash function to those problems by allowing each node in the network toh to compute hash values of the original data X and signs the make local decisions. In their setting, the original X ishash value hðXÞ using a digital signature scheme S with a divided into n blocks x ; x ; . . . ; x , and each node 1 2 nsigning key k. The signature Sk ðhðXÞÞ is then used to verify computes and forwards some random linear combination P p ¼ n ci xi for each of its downstream nodes, together i¼1. Q. Li is with Cryptography and Security Department, Institute for with the coefficients c ¼ hc1 ; . . . ; cn i. We call the pair ðp; cÞ a Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613. packet. When sufficient linearly independent packets are E-mail: received, a node would be able to decode the original X. It. J.C.S. Lui is with the Department of Computer Science and Engineering, is clear that data integrity is even more important in this The Chinese University of Hong Kong, Shatin, NT, Hong Kong. E-mail: setting, since, without verification, a node T could combine. D.M. Chiu is with the Department of Information Engineering, The a damaged (or maliciously modified) packet into all the Chinese University of Hong Kong, Shatin, NT, Hong Kong. packets that T generates, and hence, all its downstream E-mail: nodes would received only corrupted data.Manuscript received 14 Sept. 2008; revised 4 Sept. 2010; accepted 18 May Unfortunately, traditional “hash-and-sign” techniques2011; published online 8 June 2011. cannot be easily applied with random linear networkRecommended for acceptance by R. Sandhu.For information on obtaining reprints of this article, please send e-mail to: coding. In classical digital signature schemes, only, and reference IEEECS Log Number TDSC-2008-09-0144. sender can produce the correct signature of any randomDigital Object Identifier no. 10.1109/TDSC.2011.32. combination of data. Hence, the sender would have to 1545-5971/12/$31.00 ß 2012 IEEE Published by the IEEE Computer Society
  2. 2. 212 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 2, MARCH/APRIL 2012precompute and distribute the signatures for all possible We address the first problem by substituting the under-linear combinations, which is infeasible. lying homomorphic hash function to a recently proposed This problem of detecting malicious modifications at alternative called VSH, which is much faster [17]. This hashintermediate nodes, especially when it is infeasible for the function has an additional advantage that most of the systemsender to sign all the data being transmitted, is sometimes parameters are fixed, which do not need to be transmittedreferred to as on-the-fly Byzantine fault detection. Krohn et al. over the network. Furthermore, we analyze carefully the[14] considered the problem in the context of large content required sizes of the system parameters, and study how todistribution using rateless erasure codes (or fountain codes), choose system parameters to avoid unreasonably largeand proposed a technique using homomorphic crypto- communication overhead.graphic hash functions [15]. Hereafter, we will refer to this In Section 2, we survey previous work on network codingscheme as the KFM scheme. It is noted by Gkantsidis and and error detection techniques (Section 2.1), and giveRodriguez [10] that the same technique can be adapted in detailed descriptions of the KFM scheme (Section 2.2). Wenetwork coding-based content distribution. However, both observe certain limitations of the secure random checksumthe computational and communication overhead is high. scheme [16] in Section 2.3. A homomorphic hash function A simpler and more efficient verification method called based on VSH is given in Section 3, and a basic verificationsecure random checksum (SRM) was proposed by Gkantsidis scheme based on this new hash function is given in Section 4.and Rodriguez [16], which comes at the price of weaker We analyze how to apply the batch verification in Section that depends on the secrecy of user-specific para- A more efficient sparse variant of the random linear networkmeters. Although computationally efficient, this scheme coding is proposed in Section 6, and communication over-poses certain limitations on the distribution scenarios, as head is carefully analyzed in Section 7. We also verify ourwe will see in Section 2.3. results with experiments in Section 8. We make a few observations on the differences betweenthe scenarios using erasure code and the network coding. 2 BACKGROUNDThese differences makes the adoption of the KFM scheme[14] very challenging. 2.1 Related Work It is a well-known graph-theoretic result that the maximumObservation 1. The parameters in the KFM scheme allows capacity between a source and a sink connected through a binary coefficients. Furthermore, the weight of each network is the same as the maximum network flow f coefficient vector is a small constant. However, in random between them. When the network can be viewed as a linear network coding, it is not clear if there can be too directed acyclic graph with unit capacity edges, f is also the many zero coefficients. Furthermore, the sizes of the min-cut of the graph between the source and the sink. coefficients cannot be too small, since otherwise the However, when there is a single source and multiple sinks, network coding cannot achieve the theoretical capacity the maximum network flow f may not be achieved. A with high probability, and it would be insecure against seminal work of network coding [6] reveals that if the nodes attacks. in a network can perform coding on the information theyObservation 2. The security of the KFM scheme relies on the receive, it is possible for multiple sinks to achieve their max- size of a security parameter in the hash function (e.g., it flow bound simultaneously through the same network. This should be at least 1,024 bit long). In random linear network elegant result provides new insights into networking today coding, the size of the modulus will be the same as the size since it now becomes possible to achieve the theoretical capacity bound if one allows the network nodes on the path of the smallest data unit as well as that of the coefficients. to perform coding, instead of just the conventional tasks ofObservation 3. The parameters of the hash functions and routing and forwarding. the hash values of each data block have to be distributed Later, Li et al. [7] showed that, although the coding in advance in both settings. Also, during transmission, performed by the intermediate nodes does not need to be the coefficient vector has to be transmitted together with linear, linear network codes are indeed sufficient to achieve each combined data block. However, the binary con- the maximum theoretical capacity in acyclic synchronous stant weight vectors in the KFM scheme can be easily networks. In their settings, each node computes some linear compressed, whereas in network coding this is not the combination of the information it receives from its upstream case. nodes, and passes the results to its downstream nodes. These observations give rise to a few challenging issues However, to compute the network code (i.e., the correct linear combinations) that is to be performed by the nodes,that need to be addressed to achieve secure and efficient the topology of the network has to be known beforehand,network coding. and has to be fixed during the process of contentProblem 1. The cryptographic hash function used in [14] is distribution. Furthermore, their algorithm is exponential computationally very expensive, even in their probabilistic in the number of edges in the network. batch verification variant. This is made worse when the ´ Koetter and Medard [8], [18] also considered the problem KFM scheme is adopted for random linear network coding, of linear network coding. They improved and extended the since the random combination coefficients have to be much results by Li et al. [7], and considered the problem of link larger. failures. They found that a static linear code is sufficient to handle link failures, if the failure pattern is known before-Problem 2. The communication overhead in network coding hand. However, as mentioned by Jaggi et al. [9], the code context can be much more significant and cannot be ignored construction algorithm proposed by Koetter et al. still due to the large sizes of the parameters, hash values, and requires checking a polynomial identity with exponentially coefficient vectors. many coefficients.
  3. 3. LI ET AL.: ON THE SECURITY AND EFFICIENCY OF CONTENT DISTRIBUTION VIA NETWORK CODING 213Fig. 1. On-the-fly Byzantine fault detection [14]. Jaggi et al. [9] proposed the first centralized code 2.2 The KFM Schemeconstruction algorithm that runs in polynomial time in the 2.2.1 Basic Verification Schemenumber of edges, the number of sinks, and the minimumsize of the min-cut. They also noted that, although the The overall picture of the on-the-fly detection techniqueresults of Edmonds [19] show that network coding does not presented in [14] (which we refer to as the KFM scheme) isimprove the achievable transmission rate when all nodes illustrated in Fig. 1.except the source are sinks, finding the optimal multicast In this scheme, the content X is divided intorate without coding is NP-hard. They also showed that if n blocks x1 ; . . . ; xn , and each block xi is further dividedthere are some nodes that are neither the source nor the into m subblocks xi;1 ; . . . ; xi;m , where each xi;j can besinks, then multicast with coding can achieve a rate that is represented by an element in the multiplicative group Z Ã Zpðlog jV jÞ times the optimal rate without coding, where jV j for some large prime the number of nodes in the network. It is also shown in [9] A hash function H is then applied on each block to obtainthat their method of code construction can handle link the hash values h1 ; . . . ; hn . In particular, the hash functionfailures, provided that the failure pattern is known a priori. uses m generators g1 ; . . . ; gm 2 Z Ã , and the hash value hi of Zp Random network coding was proposed by Ho et al. [13] the ith block is computed as h ¼ Qm gxi;j mod p. i j¼1 jas a way to ensure the reliability of the network in a Clearly, the hash function H is homomorphic in thedistributed setting where the nodes do not know the sense that for any two blocks x and x , it holds that i jnetwork topology, which could change over time. In their Hðx ÞHðx Þ ¼ Hðx þ x Þ. These hash values are distribu- i j i jsetting, each node would perform a random linear network ted to all the nodes reliably in advance. It is suggested incoding, and the probability of successful recovery at the [14] that the same technique can be used recursively on thesinks can be tightly bounded. Chou et al. [2] proposed a hash values until the final hash value is small enough toscheme for content distribution based on random networkcoding in a practical setting, and showed that it can achieve be distributed without coding. After receiving a codednearly optimal rate using simulations. Recently, Gkantsidis block x, which is a linear combination of the originaland Rodriguez [3] proposed another scheme for large-scale n blocks with coefficients C ¼ hc1 ; . . . ; cn i, a node will becontent distribution based on random network coding. able to verify the integrity of x using x, C, and the hashThey show by simulation that when applied to P2P overlay values h1 ; . . . ; hn , making use of the homomorphic prop-networks, using network coding can be 20-30 percent better erty of H. In particular, the node checks if the followingthan server side coding and two to three times better than holds:uncoded forwarding, in terms of download time. Y c n With the recent popularity of P2P networks [20], [21], HðxÞ ¼ hi i mod p: ð1Þresearchers are beginning to consider the problem of on-the- i¼1fly Byzantine fault detection in content distribution usingrandom network coding. The authors in [10] noted that the 2.2.2 Limitationsverification techniques proposed by Krohn et al. [14] can There are two inherent limitations in the above scheme. First,be employed to protect the integrity of the data without the it is computationally expensive to compute the underlyingknowledge of the entire content. The verification techniques hash function H. Second, all parameters of the scheme,were originally developed for content distribution usingrateless erasure codes and were based on homomorphic including all the hash values, must be distributed incryptographic hash functions [15]. advance, even to verify a single data packet. Another simple and efficient on-the-fly verification In [14], the first problem of expensive computation isscheme was proposed by Gkantsidis et al. [12]. Although dealt with by using two techniques. On one hand, theit may be suitable for certain application scenarios, there are parameters are chosen such that all coefficients in C are verysome limitations when it is put under a generic setting, as small, and many of them are actually zeros. On the otherwe will see in Section 2.3. hand, verification of multiple data packets can be done in
  4. 4. 214 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 2, MARCH/APRIL 2012batches to reduce the overall computational cost. The second TABLE 1problem, however, is tackled by choosing the parameters Symbols Used in This Papercarefully such that the overhead due to the hash values isless than one percent, which is acceptable in many practicalscenarios. Nevertheless, when the original data are rela-tively large (say, 1 gigabytes), the start-up delay caused bythe transmission of the hash values may not be tolerable,especially in dynamic networks where nodes frequently joinand leave.2.2.3 ChallengesIt is worth to note that the KFM scheme is intended to beapplied on rateless erasure codes, where only the source ofthe content performs the coding, which is in contrast withthe idea of network coding, where all intermediate nodescan participate in the coding. The difference in thesesettings leads to challenging problems when adopting thisscheme in random network coding. Basically, to allow intermediate nodes freely combine leaving and joining the network. To some extent, the use ofdata blocks, the combination coefficients has to be chosen such checksums weakens the potential advantages one couldfrom the same group as the subblocks (which is the smallest expect from a distributed content distribution scheme.unit in combination). Hence, for a randomly combinedblock, the size of the coefficients will be njpj (where j Á jdenotes the bit length). Clearly, jpj cannot be small (e.g., at 3 A HOMOMORPHIC HASH FUNCTION BASED ONleast 1,024), since otherwise the hash function can be easily VSHbroken. Furthermore, if the size of the content is large, n The proposed basic scheme is based on the nontrapdoorcannot be too small either, since otherwise m would be too variant (VSH-DL) of the Very Smooth Hash (VSH) functionlarge to make the distribution of parameters g1 ; . . . ; gm due to Contini et al. [17]. The rationale behind VSH and itsprohibitively expensive. variants is that using smaller primes as group generators As a result, this not only introduces extra communica- would greatly improve the computational efficiency of http://ieeexploreprojects.blogspot.comtion overhead due to either the hash parameters or the hash functions that involve many exponentiations. Thecoefficients, but also makes the computation of (1) (hence, VSH functions, however, do not have the homorphicthe verification) much more expensive. property that would be required by the on-the-fly verification2.3 The Secure Random Checksum Scheme process.An alternative to the expensive cryptographic hash function In this paper, we propose a homomorphic hash functionas used in [14] is proposed in [16], which is called the SRC. based on the same idea of VSH-DL, which we will describeTheir main idea is as following. Before the actual down- in detail in Section 3.1. The homomorphic property isloading commences, each node retrieves a checksum for each obtained by rearranging the order of the bits in the inputblock of data from the source via a secret channel. Each message blocks before applying VSH-DL. For readability,checksum is computed as a random combination of all the all symbols used are presented in Table 1.subblocks in a block, and the coefficients are different for 3.1 A Homomorphic VSH-DLeach node. These checksums are also homomorphic, so thatthey can be used to check the integrity of any received packet Let p be a large strong prime, such that there is anotherin a way similar to homomorphic hash functions. Since only large prime q divides p À 1. That is, there is a positivelinear combinations are involved in the computation of these integer such that p ¼ q þ 1. As a result, there exist achecksums, the verification can be very efficient. multiplicative subgroup G in Z Ã with order q. Furthermore, Zp We note that in this scheme, every node needs to for any x 2 Z p , let y ¼ x mod p, if y 6¼ 1 mod p, y must be in Zdownload a set of distinct checksums directly from the G. For convenience, let jpj ¼ and jqj ¼ , where j Á j denotessource. This creates a centralized downloading scenario with the bit length.a smaller content size (which is the size of the checksums), Let p1 ; . . . ; pm be m prime numbers, such that m c forwhich may lead to two problems. First, this poses limits on some constant c. In other words, m is bounded by somethe scalability of the scheme, since the source could be polynomial in . In practice, we can choose those primes to overwhelmed by the requests to download checksums. be the m smallest prime numbers, such that pi 6¼ 1 mod p.Whereas in the case of homomorphic hashes, although the That is, we can choose p1 ¼ 2, p2 ¼ 3, and so on, and skiphash values have larger sizes, it is not necessary to download those primes whose order is not q. When q is much largerthem directly from the source but they can be instead than (e.g., ¼ 2), the probability that a random smallobtained from peers, and no additional secure channel is prime pi satisfies the condition p 6¼ 1 mod p is high. irequired. Second, the source is required to be online until all Assume that a message is a vector of the form: x ¼the nodes have received checksums from it, which makes it ðx1 ; . . . ; xm Þ where xi 2 Z q for 1 i m. The hash of x is Zdifficult for dynamic networks where nodes are frequently computed as
  5. 5. LI ET AL.: ON THE SECURITY AND EFFICIENCY OF CONTENT DISTRIBUTION VIA NETWORK CODING 215 Y m HðxÞ ¼ pxi mod p: ð2Þ Following the result in [17], we can see that the hash i i¼1 function H is collision resistant if the GVSDL problem is It is worth to note that the original VSH-DL function is computationally hard.equivalently computing the same function with alpha ¼ 2, Lemma 4. The hash function H is collision resistant if GVSDL isbut explicitly using the parallel exponentiation algorithm computationally hard w.r.t. ¼ jqj.due to Bellare et al. [22] and used in [14].1 Proof. Suppose on the contrary that there exists an attacker Homomorphic property. For any two messages x ¼ A that can compute a collision of H with a probabilityðx1 ; . . . ; xm Þ and y ¼ ðy1 ; . . . ; ym Þ, it is not difficult to see that that is not negligible, we show that we can build another HðxÞHðyÞ ¼ Hðx þ yÞ: ð3Þ algorithm B that makes use of A to solve the GVSDL problem also with a probability that is not negligible.In other words, the hash function H is ðþ; ÂÞ-homomorphic. In particular, given ðp; qÞ, which is an instance of a GVSDL problem, and an attacker A, we use A to find3.2 Security of the Hash Function two messages x ¼ ðx1 ; . . . ; xm Þ and y ¼ ðy1 ; . . . ; ym Þ suchThe security of H is defined in terms of the difficulty in that x 6¼ y but HðxÞ ¼ HðyÞ. In this case, according to (2),finding collisions. To precisely define the security, we we havefollow the commonly used notions of negligible functionsand computational feasibility in cryptography. Y m Y m pxi ¼ i pyi i mod p:Definition 1. A function fðnÞ is called negligible w.r.t. n if for i¼1 i¼1 any positive polynomial polyðÁÞ, for all large enough n, it holds 4 Let zi ¼ xi À yi mod q; the above is equivalent to that fðnÞ 1=polyðnÞ. Y m pzi ¼ 1 mod p: i Furthermore, we are mainly interested in the collision i¼1resistance of the hash function. This is exactly a solution to the GVSDL problem instance.Definition 2. A hash function H is collision resistant if for any Note that at least some zi s are nonzero because x 6¼ y. If polynomial-time probabilistic algorithm A, the probability A succeeds with probability pa , then we can also find the Pr½Aðp; qÞ ¼ ðx; yÞs:t:HðxÞ ¼ Hðyފ is negligible w.r.t. jqj. solution zi s with probability pa . u t In other words, H is collision resistant if it is computa-tionally infeasible to find two messages x and y thatproduce the same hash value. 4 THE BASIC INTEGRITY VERIFICATION SCHEME Next, we define a generalized version of the VSDL 4.1 The Basic Schemeproblem as in [17]. Our proposed scheme consists of two algorithms, namely, theDefinition 3. (GVSDL). Let p be a large prime, and let q be a encoding algorithm where the original data are prepared for large prime such that p ¼ q þ 1 for some positive integer . distribution and the verification algorithm, which is used by Let p1 ; . . . ; pm be the m smallest prime numbers such that individual nodes to verify the integrity of the received data. p 6¼ 1 mod p for 1 i m, where m c for some constant i c. GVSDL is the following problem: given the aforementioned 4.1.1 Encoding parameters, compute e1 ; . . . ; em 2 Z q , such that at least one of Z Let the parameters p; q; m and p1 ; . . . ; pm be chosen as in the ei s is nonzero, and Section 3.1. Given any binary string X, let n be the smallest positive integer such that jXj mnð À 1Þ À 1. Assume that Y m pei ¼ 1 mod p: ð4Þ n polyðÞ for some positive polynomial poly. We also i i¼1 assume that X is compressed, such that the bits are random. In this way, we can always pad the original X properly When ¼ 2, the above definition reduces to the VSDL (e.g., with a one followed by zeros) such that the result canproblem. We say that an algorithm A solves a given GVSDL be divided into small pieces of À 1 bits each. In otherproblem instance ðp; qÞ if Aðp; qÞ ¼ ðe1 ; . . . ; em Þ such that words, we can always encode the data into the formsome ei is nonzero, and (4) holds. Note that p1 ; . . . ; pm are X ¼ ðx1 ; . . . ; xn Þ, where xi ¼ ðxi;1 ; . . . ; xi;m ÞT and each xi;j isimplicitly specified once p and q are given, and they can be of length À 1, and can be considered as an element in Z q Zcomputed efficiently. Furthermore, we say that GVSDL is for all 1 i n and 1 j m.computationally hard if for any polynomial-time probabilistic We will call xi as the ith block, and each xi;j as thealgorithm A, the probability that A solves a random GVSDL jth subblock of the ith block. Now, given X, the encoderproblem instance ðp; qÞ is negligible w.r.t. ¼ jqj. The computesprobability is taken over the random choices of p; q and Y m xi;jthe internal coin tosses made by A. hi ¼ Hðxi Þ ¼ pj mod p ð5Þ j¼1 1. There is an off-by-one error in the algorithm due to Bellare et al. [22],which is corrected in [14]. for each 1 i n.
  6. 6. 216 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 2, MARCH/APRIL 20124.1.2 Basic Verification Algorithm Clearly, algorithm B runs in polynomial time. When PDuring verification, each network node is given a packet A succeeds in Step 3, we know that y 6¼ n cj xj ; hence j¼1hx; ci and system parameters. In the case where this packet at least one of the ei s is nonzero. Therefore, B succeedsis not tampered with, c ¼ ðc1 ; . . . ; cn Þ are the coefficients if and only if A succeeds. In other words, B succeedswhere each ci 2 Z q , x is the linear combination x ¼ Z with the same nonnegligible probability P , hencePn creating a contradiction. u t i¼1 b ci xi mod Z q . Z Each node can verify the integrity of the packet asfollowing: 5 BATCH VERIFICATION 1. Compute the hash value H1 ¼ HðxÞ. 5.1 The Baseline Batch Verification Scheme Q 2. Compute H2 ¼ n hci mod p. i¼1 i To reduce the computational cost of the verification, a batch 3. Verify that H1 ¼ H2 . of packets can be verified at the same time. In particular, It is worth to note that once an intermediate node has after a node has received b packets ððy1 ; c1 Þ; ðy2 ; c2 Þ;received the parameters p; q, and m, it will be able to compute . . . ; ðyb ; cb ÞÞ (not necessarily from the same source), the and p1 ; . . . ; pm locally. Therefore, those prime bases do not node can verify all the packets as follows:need to be distributed. The hash values, however, still need 1. Randomly choose b numbers r1 ; . . . ; rb 2 Z q . P Zto be distributed. Nevertheless, we will see that in any 2. Compute w ¼ b ri yi mod q.reasonable setting the communication overhead is low. P i¼1 3. Compute v ¼ b ri ci mod q. i¼14.2 Security Analysis 4. Verify the integrity of the packet ðw; vÞ using theIntuitively, an attack is considered as successful if the basic integrity verification scheme.attacker, with the full knowledge of the content being Due to the homomorphic property of the hash function,distributed and all the public parameters, can generate a we can see that if the verification in Step 4 fails, then atpacket ðp; cÞ such that p is not a linear combination of the least one of the packets is corrupted. However, if the batchoriginal data specified by c but the packet still passes the of packets pass the verification, it is still possible thatverification. Hence, we have the following definitions of some packets are corrupted but could not be detected byintegrity attackers and the security. the verification algorithm. Hence, we need to analyze theDefinition 5. An integrity attacker A is a probabilistic security more carefully. polynomial-time algorithm, such that given parameters p; q; 5.2 Security Analysis m; p1 ; . . . ; pm and the encoded content X ¼ ðx1 ; . . . ; xn Þ, the We first extend the definition of an integrity attacker to attacker A finds y and c ¼ ðc1 ; . . . ; cn Þ so that for work on a batch of packets. 4 P x ¼ n ci xi , we have y 6¼ x yet HðxÞ ¼ HðyÞ. i¼1 Definition 8. A successful batch integrity attacker A is aDefinition 6. The basic scheme is secure if any integrity attacker probabilistic polynomial-time algorithm, such that given that can succeed with a probability that is not negligible w.r.t. parameters p; q; m; p1 ; . . . ; pm and the encoded content does not exist. X ¼ ðx1 ; . . . ; xn Þ, the attacker A computes b packets ððy1 ; c1 Þ; ðy2 ; c2 Þ; . . . ; ðyb ; cb ÞÞ, where at least for some yi P We show that the basic scheme is secure, since otherwise and ci ¼ ðc1 ; . . . ; cn Þ, we have yi 6¼ n ci xi , but the packets i¼1we can solve the GVSDL problem. pass the batch verification algorithm with a probability that isTheorem 7. The basic scheme is secure if GVSDL is not negligible. computationally hard w.r.t. . We can show that to come up with a batch of packets thatProof. Essentially, we need to show that if there exists an pass the above verification is not easier than breaking the integrity attacker A that succeeds with probability P that basic verification scheme itself. is not negligible in , we can construct a probabilistic Corollary 9. The batch verification scheme is secure (i.e., no algorithm B that solves the GVSDL problem with the successful batch integrity attacker exists) if the basic scheme same probability P , which contradicts with the assump- is secure. tion that GVSDL is hard. In particular, we construct an Proof. Suppose a successful batch integrity attacker A algorithm B as follows: given input p; q; m; p1 ; . . . ; pm as exists. We construct another attack algorithm B as in Definition 3, the algorithm B does the following steps: following. Given parameters b; p; q; m; p1 ; . . . ; pm and the 1. Randomly select n 2 ½2; polyðފ, for some positive encoded content X ¼ ðx1 ; . . . ; xn Þ, the attacker B per- polynomial poly. forms the following steps: 2. Randomly select xi;j 2 Z q for all 1 i m and Z 1. Invoke A to obtain b packets ððy1 ; c1 Þ; . . . ; ðyb ; cb ÞÞ. 4 1 j n. Denote xi ¼ ðxi;1 ; . . . ; xi;m Þ and X ¼ 4 4 P Let Y ¼ fðyi ; ci Þ j yi 6¼ n ci;j xj g, where we j¼1 ðx1 ; . . . ; xn Þ. denote ci ¼ ðci;1 ; . . . ; ci;n Þ. That is, Y is the set of 3. Invoke A with p; q; m; p1 ; . . . ; pm and X as inputs. corrupt packets. If A fails, halt and declare failure. If A succeeds, let 2. For every packet ðyi ; ci Þ in Y , check if it passes the y ¼ ðy1 ; . . . ; ym Þ and c ¼ ðc1 ; . . . ; cn ÞP the output. be basic integrity verification. If a packet ðy0 ; c0 Þ in 4. Output e1 ; . . . ; em , where ei ¼ yi À n xi;j cj mod j¼1 Y passes the verification, output ðy0 ; c0 Þ and halt. q for 1 i m. Otherwise, do the following:
  7. 7. LI ET AL.: ON THE SECURITY AND EFFICIENCY OF CONTENT DISTRIBUTION VIA NETWORK CODING 217 3. Randomly choose b coefficients r1 ; . . . ; rb 2 Z q . P Z the components of w, remove some rows in Y and solve the 4. Compute w ¼ b ri yi mod q. i¼1 reduced system to find r, and finally compute the rest of the P 5. Compute v ¼ b ri ci mod q. i¼1 components in w. If b is not too small compared with m, this 6. Output packet ðw; vÞ. method could save a large portion of the computation cost. If the algorithm halts at Step 2, clearly it has already A similar method can also be applied on the computation successfully attacked the basic verification scheme. of H2 to make v very small. This can be useful when b is not Otherwise, by the definition of the attacker A, the packet small compared with n. However, clearly it cannot be used ðw; vÞ passes the basic verification with a probability pa simultaneously with the first method. The actual method to that is not negligible. apply would be determined by the choice of the parameters If ðw; vÞ is indeed corrupted (i.e., w is not the linear m, n, and b. combination of X with coefficients in v), such a packet It is worth to note that the effects of reducing w and v are would be considered as a successful attack on the basic not the same. This is because pj s are in general much integrity verification scheme, and the attacker B would smaller than hj s; hence, the computation of H1 is much be successful. Otherwise, when ðw; vÞ appears to be not more efficient than H2 when m ¼ n. As a result, the overall corrupted (e.g., when the coefficients ri s chosen for the computation cost has to be considered when deciding corrupted packets happen to be 0), the attack would fail. which function is to be optimized. However, since the coefficients ri s are randomly chosen, the probability that the second case happens is exponen- 6 SPARSE RANDOM LINEAR NETWORK CODING tially small. Therefore, the algorithm B can successfully attack the The computation overhead involved in the content basic verification scheme with probability pb that is not distribution consists of two parts. The first part is the cost negligible (since pb À pa is negligible). Hence, the due to the verification of the packets, and the second part corollary follows. u t is the cost due to the need to compute random combina- tions of the data blocks. The preceding sections of this When the batch verification technique is used, it is clear paper focus on the first part of the cost, which can bethat the computational cost can be reduced roughly by a reduced through the use of more efficient hash functionsfactor of b, since we perform verification on b packets at a and batch verification techniques as we have discussed.time, and the additional cost introduced by the random Nevertheless, the second part of the cost also plays a verylinear combination of the packets is not significant important role in practice, especially when the content iscompared with the cost of the basic verification on the large (e.g., in the order of gigabytes), and it has a http://ieeexploreprojects.blogspot.comcombined packet. Nevertheless, it should be noted that b significant impact on the choice of parameters.cannot be too large for typical applications since it will In some previous work (such as [2], [3]), it is proposed tointroduce additional delay in the content distribution divide the content to be distributed into smaller trunksprocess, since a node needs to wait until b packets to arrive (sometimes referred to as generations), and random linearbefore the verification can be done. network coding is applied to each trunk of content independently. Although this method works in certain5.3 An Enhanced Scheme application scenarios, it does not address the problemThe cost of verification in the batch verification can be directly but instead avoids high computation overhead byfurther reduced. Recall that to verify the packet ðw; vÞ using applying random linear network coding to smaller problemthe basic verification scheme, we must compute H1 and H2 instances. Hence, this strategy may lose certain benefits fromas below and check if they are the same: network coding. For example, when a node sends data to its Y w m Y v n downstream nodes, it has to decide which trunk to send. If H1 ¼ pj j ; H2 ¼ hj j ; ð6Þ the algorithm to make such decisions is not designed j¼1 j¼1 properly, it may result in a situation where a certain trunk cannot be reconstructed after a few key nodes have left thewhere w ¼ ðw1 ; . . . ; wm Þ, v ¼ ðv1 ; . . . ; vn Þ, and hj is the hash network.value of the jth block. Intuitively, if we can make the Here, we propose a simple yet powerful alternative toexponents in the computation as small as possible, the avoid high computation cost when computing the randomcomputation cost can be reduced. combinations. We will refer to this method as Sparse Random To see how we can make wj s small, let us consider the Linear Network Coding. The idea is that, instead of comput-batch verification process more carefully in matrix form. Let ing a random combination of all the n data blocks, we canus denote a batch of packets by ðY; CÞ, where Y ¼ instead randomly select only of them and compute aðy1 ; . . . ; yb Þ is the actual data (when not corrupted) and C ¼ random combination of only those blocks. More precisely,ðc1 ; . . . ; cb Þ are the coefficient vectors. During the batch T when a node A needs to send a packet ðx; cÞ to itsverification, another random vector r ¼ ðr1 ; . . . ; rb Þ is used, downstream node, it performs the following steps:where w ¼ Yr and v ¼ Cr. Hence, if we instead choose w(or part of w) and find the corresponding r, we may be able 6.1 Sparse Random Linear Network Codingto greatly reduce the computation cost. In particular, ifb ! m, we can randomly choose w and find a solution to 1. Randomly choose packets from the randomw ¼ Yr provided that the rank of Y is m. Even if the rank of combinations received by A so far. Let these packetsY is not m or b m, we can still choose the values of some of be ðx1 ; c1 Þ; . . . ; ðx ; c Þ.
  8. 8. 218 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 2, MARCH/APRIL 2012 2. Randomly choose r1 ; . . . ; r 2 Z q . Z 7.3 Coefficients 3. Compute packet ðx; cÞ as Recall that each packet transmitted over the network consists of a pair ðx; cÞ; hence, the overhead due to the X X need to transmit the coefficient vector c has to be examined x¼ r i xi ; c¼ ri ci : i¼1 i¼1 closely. Since each coefficient is in the same finite field as each subblock of the content, the size of one coefficient is . Also, we require the source node to be more powerful Hence, the total size of the coefficient vector is n.than other nodes and still send random combinations of all Considering that the size of each data block is m, then blocks. It is clear that each packet being sent over the overhead ratio due to coefficient vectors is would still be quite random, and allow high Since this part of the overhead is reoccurring in the senseprobability of reconstruction at the receivers. We confirm that it will happen each time some data are transmitted, it isthis intuition through experiments in Section 8.2. desirable to make the ratio n=m as small as possible. Hence, It is worth to note that the probability of successful as we mentioned earlier, n cannot be too large. For instance, ifdelivery is very high even with small constant . Therefore, the size of the data is 100 megabytes, ¼ 800 (say, withthe computation overhead due to the computation of jpj ¼ ¼ 1;024), and we want to make n=m 0:01, we mustrandom combinations can be made independent of the have n 210 =10 % 100. When the equality holds, m ¼ 100n ¼number of blocks, which greatly relaxes the constraints that 10 Â 210 % 10;000 and each block is of size 1 megabyte. If theneed to be considered for practical systems. original data are of size 1 gigabytes, with the same , roughly In addition, the sparse coding variant is just as secure as we can have at most 320 blocks of 3.2 megabytes each.the basic scheme. The proof of security of the basic scheme For dynamic networks where nodes frequently join andcan be applied for sparse coding without modifications, leave the network, or in scenarios where nodes have slowsince the proof does not require the coefficients to be of any connections to each other, it may be desirable to have dataspecial properties. blocks as small as possible. One way to achieve that is to reduce , at the cost of reduced security. For example, when7 COMMUNICATION OVERHEAD ¼ 400 (say, with jpj ¼ ¼ 512), and n=m 0:01, we have n 144 for 100 megabytes data, and each block is at leastThe communication overhead of the content delivery about 700 kilobytes.scheme involves three parts, namely, the cost of distributingthe parameters, the hash values, and the random coeffi- 7.4 Trade-Offscients used in the combinations. Besides trade-offs within the two categories of overhead, off between the communication and we can actually trade7.1 Parameters computation overhead.The first part is the cost of distributing the parameters for The main idea is that we can divide the original contentthe delivery. Here, we are mainly concerned about the X into smaller trunks and reuse the coefficient vector forparameters related to the hash function, which includes n different trunks. This is similar to some previous work (e.g.,the number of blocks, m the number of subblocks per block, [2], [3]), but the difference is that we divide X “horizontally”and primes p and q. Since the algorithm to find the small (versus “vertically” as in previous work) if we view theprime bases pi s is public and deterministic, all nodes in the original content in the matrix form. In particular, each trunknetwork can compute those pi s locally after receiving the contains mt rows of X, and during each transmission from aabove parameters. Note that the first part of the overhead is node to one of its downstream nodes, all the m=mt trunksindependent of the content size, and can be regarded as share the same coefficient vectors. As a result, we can sendconstant. On the other hand, if the scheme proposed in [14] only one coefficient vector for m=mt used instead, this part of the overhead would be at least In this way, we can remove the dependency between mtthe same size of a data block, which may be significant, and n to some extent, so that the computation overhead cansince the number of blocks n cannot be too large (as we will be reduced when we apply the enhanced batch verificationshow later in Section 7.3). scheme in Section 5.3, since now mt and b can be made closer. The price of doing this is twofold. First, we need to7.2 Hash Values distribute hash values for each trunk independently. As aThe second part is the cost of distributing the hash values result, the one-time communication cost due to the hashof the data blocks before the actual delivery of the data. values increases by a factor of m=mt . On the other hand,Since each hash value is an element in Z p , the total size of during the verification, we need to compute H2 as in (6) Zthe hash values is njpj ¼ n. This part of the cost is exactly m=mt times for a batch, instead of only one time. Asproportional to the number of data blocks, and it is a one- we analyzed earlier, when we require small reoccurringtime cost for any node in the network for the entire content communication overhead, n=m should be small. Hence,distribution session. The ratio of the cost over the total size such an increase may be offset by the effect of the enhancedof content is =ðmÞ, which is typically very small. For batch verification.instance, ¼ jpj is typically at most 2 kilobytes, and m ¼mjqj is roughly the size of one data block, which is typicallyat least 32 kilobytes. In this case, the overhead ratio is about 8 EXPERIMENTS0.78 percent. When the data blocks are of a few megabytes 8.1 Computation Overheadin size, as suggested in some of the previous work, such As we mentioned in Section 6, the computation costoverhead would be quite insignificant. involves both the computation of random combinations
  9. 9. LI ET AL.: ON THE SECURITY AND EFFICIENCY OF CONTENT DISTRIBUTION VIA NETWORK CODING 219Fig. 2. Computation efficiency of H1 . Fig. 3. Computation efficiency of H2 .and the hash values. When sparse random linear coding is batch verification is at least 30 kilobytes per second. Withapplied, for each combined block we only need to compute batch verification and b ¼ 20, the throughput is increasedthe combination of blocks, which makes the computation to 600 kilobytes per second, and if b ¼ 100, the throughputof combinations much more efficient than that of the hash of computing H1 alone can reach about 3 megabytes pervalues. Hence, we only analyze the computation cost of the second. With precomputations, the throughput can behash function. further improved by a small factor. Since we obtain the efficiency by using small prime bases If we use smaller security parameters and , the overallin the hash function, the computation of H1 is much more throughput can be increased. It is interesting to see thatefficient than H2 as in (6), when m ¼ n. However, in typical reducing from 1,024 to 768 does not significantly increasecases, m may be much larger than n (say, n 0:01m); hence, the throughput, but further reduction to 512 shows more thanthe overall computation cost of H1 could be more than that 30 percent of speedup (from 30 to 40 kilobytes per second).of H2 . We further evaluate the computation efficiency of H2 during verification. The results are shown in Fig. 3. We can We implement the proposed hash function and conductexperiments to evaluate the efficiency of the computation of easily see that the time it takes to compute H2 increasesH1 and H2 with various parameter combinations. We use linearly with the number of blocks (i.e., n). Furthermore,GNU C/C++ compiler with open source Crypto++ library, the efficiency of computing H2 is quite predictable in theand the experiments are done on a laptop computer with an sense that when we increase the parameters (n, , or ), theIntel Core 2 Duo T7300 CPU. Only one of the two cores is increment of the time consumption is proportional to nutilized during the experiments. and the difference in (or , since we have made = In Fig. 2, we show the throughput of the computation constant in our experiments).of H1 , which is computed as the size of a data block Interestingly, Fig. 3 can also be used as a reference to thedivided by the average time it takes to compute a hash computation efficiency of the original hash functionvalue for the block. The values are taken as the average of proposed in [14]. The throughput under different security25 random instances. In general, we can see that the parameters is the gradient of different curves in the figure.throughput increases when the number of subblocks per For example, for jpj ¼ 512 and jqj ¼ 400, when n ¼ 500, theblock (i.e., m) is increased. Nevertheless, such increment time it takes to compute H2 is about two seconds. Thiswill not be very obvious after m exceeds a certain value. translates to a throughput of about 12.5 kilobytes per We can also see that even with relatively small values of second, which is much lower than 40 kilobytes per secondand (say, ¼ 512 and ¼ 400), the computation of H1 as in our proposed scheme. For jpj ¼ 1024 and jqj ¼ 800,cannot be very efficient (about 40 kilobytes per second) similar calculations show that the throughput is greatly reduced to about 5 kilobytes per second, whereas in ourcompared with common hash functions such as SHA-1. With scheme the throughput is only reduced to 30 kilobytes percarefully designed precomputation methods (such as that in second. Hence, we can see that our scheme has the[14]), the throughput can be increased by a small constant advantage that the efficiency is not reduced as much whenfactor r at the price of a storage requirement that is 2r times the parameters are increased for stronger security. Thethat without precomputation. Furthermore, by performing computational advantage of our scheme is mainly due tobatch verification, and ignoring the cost to compute the the use of deterministically chosen small primes as thelinear combination during batch verification, the throughput bases for exponentiations, which is the rationale behind thecan be increased by a factor of b, which is the number of design of the VSH scheme [17].blocks in a batch. However, in typical applications, it is not From Figs. 2 and 3, we can compute the final throughputdesirable to have a too large b, since otherwise a node needs of the hash verification with given parameters. For example,to wait for too long to receive a batch. using numerical examples in Section 7.3, if the original From these parameters, we can compute the through- content size is 100 megabytes, m ¼ 10 Â 210 , n ¼ 100,put from the graph. For example, when we choose ¼ jpj ¼ ¼ 1;024, and jqj ¼ ¼ 800, the throughput of com-1;024 and ¼ 800, and m ! 200, the throughput without puting H1 is at least 30 kilobytes per second, and the time it
  10. 10. 220 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 2, MARCH/APRIL 2012 TABLE 2 9 CONCLUSIONS Reconstruction Probabilities Researchers have shown successful application of network coding in wireless networks [23], [24] to improve system throughput, or P2P networks to improve overall system efficiency. In this paper, we investigate the security and efficiency issues in large content distribution based on network coding. We consider the problem of on-the-fly verification of the integrity of the data in transit. Although a previous scheme based on homomorphic hash functions is applicable, it was mainly designed for server side coding only, and will betakes to compute H2 is about 2 seconds. Hence, the overall much less efficient when it is applied on random networktime it takes to verify a packet is roughly 36 seconds, which coding. We propose a new on-the-fly verification schemetranslates to an overall throughput of about 28 kilobytes per based on a faster homomorphic hash function, and provedsecond. If b ¼ 20, this throughput would be increased by a its security.factor of 20, which gives 560 kilobytes per second. For the We also consider the computation and communicationsame and , when the size of the original data is 1 gigabytes cost incurred during the content distribution process. Wewith n ¼ 320 and m ¼ 33;555 (which gives a block size of identify various sources of the cost, and investigate ways toabout 3.2 megabytes), similar calculation shows that the eliminate or reduce the cost. In particular, we propose aoverall throughput without batch verification is also about sparse variant of the classical random linear network28 kilobytes per second. coding, where only a small constant number of blocks are combined each time. Furthermore, we discuss some possible8.2 Effectiveness of Sparse Random Linear enhancements under certain conditions of the parameters, Network Coding and ways to trade off among different cost.To evaluate the effectiveness of the sparse random linear Experiments are conducted to examine the efficiency ofnetwork coding as described in Section 6, we conduct the the proposed hash function, as well as the effectiveness of thefollowing experiments: proposed sparse random linear network coding. The results First, we build a random directed graph with N nodes, show that the new hash function is able to achieve reasonablewhere we select a random root node R. We randomly select speed, and the sparse variant performs just as well as thethe edges such that each node has at least k incoming edges. random network coding using typical parameters. http://ieeexploreprojects.blogspot.comIn this way, we can be sure that the min-cut from R to anyother node is at least k. Furthermore, this is similar to bit- ACKNOWLEDGMENTStorrent-like peer-to-peer networks, where each peer ran- The work of John C.S. Lui was supported in part by thedomly selects a number of nodes as its upstream nodes. RGC Project 415309. Next, we assume that R has n blocks of data to bedistributed to all other nodes in the network. The distributionis done one step at a time. In each step, for all the nodes that REFERENCEShave received data from all of their upstream nodes generate [1] S. Acedanski, S. Deb, M. Medard, and R. Koetter, “How Good Is Random Linear Coding Based Distributed Networkedtheir own combinations and deliver them to their down- Storage,” Proc. Workshop Network Coding, Theory and Applications,stream nodes. This process is repeated until no further Apr. is possible. [2] P.A. Chou, Y. Wu, and K. Jain, “Practical Network Coding,” Proc. Allerton Conf. Comm., Control, and Computing, Oct. 2003. Finally, we examine the data received by each node, and [3] C. Gkantsidis and P.R. Rodriguez, “Network Coding for Largedetermine if it is sufficient for the node to reconstruct the Scale Content Distribution,” Proc. IEEE INFOCOM, pp. 2235-2245,original data. The effectiveness of the coding scheme is 2005. [4] M. Wang, Z. Li, and B. Li, “A High-Throughput Overlay Multicastmeasured by the percentage of receiving nodes that can Infrastructure with Network Coding,” Proc. Int’l Workshop Qualitysuccessfully reconstruct the data at the end of the of Service (IWQoS), 2005.experiments. [5] Y. Zhu, B. Li, and J. Guo, “Multicast with Network Coding in Application-Layer Overlay Networks,” IEEE J. Selected Areas in In our experiments, we choose n ¼ 120, and vary k and N Comm., vol. 22, no. 1, pp. 107-120, Jan. examine the reconstruction probability. We choose q ¼ 251, [6] R. Ahlswede, N. Cai, S.-Y.R. Li, and R.W. Yeung, “Networkwhich is relatively small. Generally speaking, a larger q Information Flow,” IEEE Trans. Information Theory, vol. 46, no. 4, pp. 1204-1216, July 2000.would give larger reconstruction probability. However, as we [7] S.R. Li, R.W. Yeung, and N. Cai, “Linear Network Coding,” IEEEcan see later, such small q already yields high reconstruction Trans. Information Theory, vol. 49, no. 2, pp. 371-381, Feb. 2003.probabilities. Furthermore, we choose ¼ 2, which means we [8] R. Koetter and M. Medard, “An Algebraic Approach to Network ´only need to compute the combination of two randomly Coding,” IEEE/ACM Trans. Networking, vol. 11, no. 5, pp. 782-795, Oct. 2003.selected packets each time. The results of our experiments are [9] S. Jaggi, P. Sanders, P.A. Chou, M. Effros, S. Egner, K. Jain, andsummarized in Table 2. L.M. Tolhuizen, “Polynomial Time Algorithms for Multicast As we can see from Table 2, the reconstruction probabil- Network Code Construction,” IEEE Trans. Information Theory,ities are very close to 1, which shows that the sparse variant vol. 51, no. 6, pp. 1973-1982, June 2005. [10] C. Gkantsidis and P. Rodriguez, “Cooperative Security forof the random linear coding performs just as good as the Network Coding File Distribution,” technical report, Microsoftoriginal random linear coding even for very small . Research, 2004.
  11. 11. LI ET AL.: ON THE SECURITY AND EFFICIENCY OF CONTENT DISTRIBUTION VIA NETWORK CODING 221 ´[11] T. Ho, B. Leong, R. Koetter, M. Medard, M. Effros, and D.R. Qiming Li received the BEng degree in com- Karger, “Byzantine Modification Detection in Multicast Networks puter engineering and the PhD degree from the Using Randomized Network Coding,” Proc. IEEE Int’l Symp. Department of Computer Science, School of Information Theory, 2004. Computing, National University of Singapore[12] C. Gkantsidis, J. Miller, and P. Rodriguez, “Anatomy of a P2P (NUS). From 2006 to 2007, he was a postdoc Content Distribution System with Network Coding,” Proc. Int’l research fellow in Polytechnic University with Workshop Peer-to-Peer Systems, Feb. 2006. professor Nasir Memon. His research interests ´[13] T. Ho, R. Koetter, M. Medard, D.R. Karger, and M. Effros, include cryptography, multimedia security, net- “The Benefits of Coding over Routing in a Randomized work security, and algorithms. Setting,” Proc. IEEE Int’l Symp. Information Theory, 2003. `[14] M.N. Krohn, M.J. Freedman, and D. Mazieres, “On-the-Fly Verification of Rateless Erasure Codes for Efficient Content Distribution,” Proc. IEEE Symp. Security and Privacy, pp. 226-240, John C.S. Lui received the PhD degree in May 2004. computer science from UCLA. He is currently the[15] M. Bellare, O. Goldreich, and S. Goldwasser, “Incremental chairman of the Department of Computer Cryptography: The Case of Hashing and Signing,” Proc. CRYPTO, Science and Engineering, The Chinese Univer- 1994. sity of Hong Kong. His current research interests[16] C. Gkantsidis and P. Rodriguez, “Cooperative Security for include data networks, system and applied Network Coding File Distribution,” Proc. IEEE INFOCOM, security, multimedia systems, online social net- pp. 1-13, Apr. 2006. works, and cloud computing. He received var-[17] S. Contini, A.K. Lenstra, and R. Steinfeld, “VSH, an Efficient and ious departmental teaching awards and the Provable Collision-Resistant Hash Function,” Proc. EUROCRYPT, CUHK Vice-Chancellor’s Exemplary Teaching pp. 165-182, 2006. Award, as well as the corecipient of the Best Student Paper Awards in[18] R. Koetter and M. Medard, “Beyond Routing: An Algebraic the IFIP WG 7.3 Performance 2005 and the IEEE/IFIP Network ´ Approach to Network Coding,” Proc. IEEE INFOCOM, pp. 122- Operations and Management (NOMS) Conference. He is an associate 130, 2002. editor of the Performance Evaluation Journal, IEEE TC, IEEE TPDS, and[19] J. Edmonds, “Minimum Partition of a Matroid into Independent IEEE/ACM Transactions on Networking. He is a fellow of the ACM and Sets,” J. Research of the Nat’l Bureau of Standards, vol. 869, pp. 67-72, the IEEE. 1965.[20] B. Fan, J.C.S. Lui, and D.-M. Chiu, “The Design Trade-Offs of Dah-Ming Chiu received the BSc degree in BitTorrent-Like File Sharing Protocols,” IEEE/ACM Trans. Net- electrical engineering from Imperial College, working, vol. 17, no. 2, pp. 365-376, Apr. 2009. University of London, and the PhD degree from[21] R.T.B. Ma, S.C.M. Lee, J.C.S. Lui, and D.K.Y. Yau, “Incentive and Harvard University in 1975 and 1980, respec- Service Differentiation in P2P Networks: A Game Theoretic tively. From 1979 to 1980, he was a member of Approach,” IEEE/ACM Trans. Networking, vol. 14, no. 5, pp. 978- Technical Staff with Bell Labs. From 1980 to 991, Oct. 2006. 1996, he was a principal engineer, and later a[22] M. Bellare, J. Garay, and T. Rabin, “Fast Batch Verification for consulting engineer at Digital Equipment Cor- Modular Exponentiation and Digital Signatures,” Proc. EURO- poration. From 1996 to 2002, he was with Sun CRYPT, 1998. Microsystems Research Labs. Currently, he is a[23] J. Le, J.C. Lui, and D.-M. Chiu, “On the Performance Bounds of professor and chairman in the Department of Information Engineering, Practical Wireless Network Coding,” IEEE Trans. Mobile Comput- The Chinese University of Hong Kong. He is known for his contribution in ing, vol. 9, no. 8 pp. 1134-1146, Aug. 2010. studying network congestion control as a resource allocation problem,[24] J. Le, J.C.S. Lui, and D.-M. Chiu, “DCAR: Distributed Coding- the fairness index, and analyzing a distributed algorithm (AIMD) that Aware Routing in Wireless Networks,” IEEE Trans. Mobile became the basis for the congestion control algorithm in the Internet. His Computing, vol. 9, no. 4, pp. 596-608, Apr. 2010. current research interests include economic issues in networking, P2P networks, network traffic monitoring and analysis, and resource allocation and congestion control for the Internet with expanding services. Two recent papers he coauthored with students have won best student paper awards from the IFIP Performance Conference and the IEEE NOMS Conference. Recently, he has served on the TPC of IEEE Infocom, IWQoS, and various other conferences. He is a member of the editorial board of the IEEE/ACM Transactions on Networking, and the International Journal of Communication Systems (Wiley). He is a fellow of the IEEE. . For more information on this or any other computing topic, please visit our Digital Library at