A secure erasure code based cloud storage system with secure data forwarding.bak
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 6, JUNE 2012 995 A Secure Erasure Code-Based Cloud Storage System with Secure Data Forwarding Hsiao-Ying Lin, Member, IEEE, and Wen-Guey Tzeng, Member, IEEE Abstract—A cloud storage system, consisting of a collection of storage servers, provides long-term storage services over the Internet. Storing data in a third party’s cloud system causes serious concern over data confidentiality. General encryption schemes protect data confidentiality, but also limit the functionality of the storage system because a few operations are supported over encrypted data. Constructing a secure storage system that supports multiple functions is challenging when the storage system is distributed and has no central authority. We propose a threshold proxy re-encryption scheme and integrate it with a decentralized erasure code such that a secure distributed storage system is formulated. The distributed storage system not only supports secure and robust data storage and retrieval, but also lets a user forward his data in the storage servers to another user without retrieving the data back. The main technical contribution is that the proxy re-encryption scheme supports encoding operations over encrypted messages as well as forwarding operations over encoded and encrypted messages. Our method fully integrates encrypting, encoding, and forwarding. We analyze and suggest suitable parameters for the number of copies of a message dispatched to storage servers and the number of storage servers queried by a key server. These parameters allow more flexible adjustment between the number of storage servers and robustness. Index Terms—Decentralized erasure code, proxy re-encryption, threshold cryptography, secure storage system. Ç1 INTRODUCTIONA S high-speed networks and ubiquitous Internet access to an erasure error of the codeword symbol. As long as the become available in recent years, many services are number of failure servers is under the tolerance threshold ofprovided on the Internet such that users can use them from the erasure code, the message can be recovered from theanywhere at any time. For example, the email service is codeword symbols stored in the available storage servers byprobably the most popular one. Cloud computing is a the decoding process. This provides a tradeoff between the http://ieeexploreprojects.blogspot.comconcept that treats the resources on the Internet as a unified storage size and the tolerance threshold of failure servers. Aentity, a cloud. Users just use services without being decentralized erasure code is an erasure code that indepen-concerned about how computation is done and storage is dently computes each codeword symbol for a message. Thus,managed. In this paper, we focus on designing a cloud the encoding process for a message can be split into n parallelstorage system for robustness, confidentiality, and func- tasks of generating codeword symbols. A decentralizedtionality. A cloud storage system is considered as a large- erasure code is suitable for use in a distributed storagescale distributed storage system that consists of many system. After the message symbols are sent to storageindependent storage servers. servers, each storage server independently computes a code- Data robustness is a major requirement for storage word symbol for the received message symbols and stores it. This finishes the encoding and storing process. The recoverysystems. There have been many proposals of storing data process is the same.over storage servers , , , , . One way to provide Storing data in a third party’s cloud system causes seriousdata robustness is to replicate a message such that each concern on data confidentiality. In order to provide strongstorage server stores a copy of the message. It is very robust confidentiality for messages in storage servers, a user canbecause the message can be retrieved as long as one storage encrypt messages by a cryptographic method before apply-server survives. Another way is to encode a message of k ing an erasure code method to encode and store messages.symbols into a codeword of n symbols by erasure coding. To When he wants to use a message, he needs to retrieve thestore a message, each of its codeword symbols is stored in a codeword symbols from storage servers, decode them, anddifferent storage server. A storage server failure corresponds then decrypt them by using cryptographic keys. There are three problems in the above straightforward integration of encryption and encoding. First, the user has to do most. H.-Y. Lin is with the Intelligent Information and Communications computation and the communication traffic between the user Research Center, Department of Computer Science, National Chiao Tung and storage servers is high. Second, the user has to manage University, No. 1001, University Road, Hsinchu City 30010, Taiwan. E-mail: firstname.lastname@example.org. his cryptographic keys. If the user’s device of storing the keys. W.-G. Tzeng is with the Department of Computer Science, National Chiao is lost or compromised, the security is broken. Finally, Tung University, No. 1001, University Road, Hsinchu City 30010, besides data storing and retrieving, it is hard for storage Taiwan. E-mail: email@example.com. servers to directly support other functions. For example,Manuscript received 21 Mar. 2011; revised 12 Sept. 2011; accepted 18 Sept. storage servers cannot directly forward a user’s messages to2011; published online 30 Sept. 2011. another one. The owner of messages has to retrieve, decode,Recommended for acceptance by J. Weissman.For information on obtaining reprints of this article, please send e-mail to: decrypt and then forward them to another firstname.lastname@example.org, and reference IEEECS Log Number tpds-2011-03-0162. In this paper, we address the problem of forwarding dataDigital Object Identifier no. 10.1109/TPDS.2011.252. to another user by storage servers directly under the 1045-9219/12/$31.00 ß 2012 IEEE Published by the IEEE Computer Society
996 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 6, JUNE 2012command of the data owner. We consider the system model storage devices over the network such that a user can accessthat consists of distributed storage servers and key servers. the storage devices via network connection. Afterward,Since storing cryptographic keys in a single device is risky, many improvements on scalability, robustness, efficiency,a user distributes his cryptographic key to key servers that and security were proposed , , .shall perform cryptographic functions on behalf of the user. A decentralized architecture for storage systems offersThese key servers are highly protected by security mechan- good scalability, because a storage server can join or leaveisms. To well fit the distributed structure of systems, we without control of a central authority. To provide robust-require that servers independently perform all operations. ness against server failures, a simple method is to makeWith this consideration, we propose a new threshold proxy replicas of each message and store them in different servers.re-encryption scheme and integrate it with a secure However, this method is expensive as z replicas result in zdecentralized code to form a secure distributed storage times of expansion.system. The encryption scheme supports encoding opera- One way to reduce the expansion rate is to use erasuretions over encrypted messages and forwarding operations codes to encode messages , , , , . A messageover encrypted and encoded messages. The tight integra- is encoded as a codeword, which is a vector of symbols, andtion of encoding, encryption, and forwarding makes the each storage server stores a codeword symbol. A storagestorage system efficiently meet the requirements of datarobustness, data confidentiality, and data forwarding. server failure is modeled as an erasure error of the storedAccomplishing the integration with consideration of a codeword symbol. Random linear codes support distributeddistributed structure is challenging. Our system meets the encoding, that is, each codeword symbol is independentlyrequirements that storage servers independently perform computed. To store a message of k blocks, each storage serverencoding and re-encryption and key servers independently linearly combines the blocks with randomly chosen coeffi-perform partial decryption. Moreover, we consider the cients and stores the codeword symbol and coefficients. Tosystem in a more general setting than previous works. This retrieve the message, a user queries k storage servers forsetting allows more flexible adjustment between the the stored codeword symbols and coefficients and solves thenumber of storage servers and robustness. linear system. Dimakis et al.  considered the case that Our contributions. Assume that there are n distributed n ¼ ak for a fixed constant a. They showed that distributingstorage servers and m key servers in the cloud storage each block of a message to v randomly chosen storage serverssystem. A message is divided into k blocks and represented is enough to have a probability 1 À k=p À oð1Þ of a successfulas a vector of k symbols. Our contributions are as follows: data retrieval, where v ¼ b ln k, b > 5a, and p is the order of the used group. The sparsity parameter v ¼ b ln k is the 1. We construct a secure cloud storage system that number of storage servers which a block is sent to. The larger http://ieeexploreprojects.blogspot.com supports the function of secure data forwarding by v is, the communication cost is higher and the successful using a threshold proxy re-encryption scheme. The retrieval probability is higher. The system has a light data encryption scheme supports decentralized erasure confidentiality because an attacker can compromise k storage codes over encrypted messages and forwarding servers to get the message. operations over encrypted and encoded messages. Lin and Tzeng  addressed robustness and confidenti- Our system is highly distributed where storage ality issues by presenting a secure decentralized erasure servers independently encode and forward mes- code for the networked storage system. In addition to sages and key servers independently perform partial decryption. storage servers, their system consists of key servers, which 2. We present a general setting for the parameters of our hold cryptographic key shares and work in a distributed secure cloud storage system. Our parameter setting of way. In their system, stored messages are encrypted and pﬃﬃﬃ previous one of n ¼ ak k, then encoded. To retrieve a message, key servers query n ¼ akc supersedes the pﬃﬃﬃ where c ! 1:5 and a > 2 . Our result n ¼ akc storage servers for the user. As long as the number of allows the number of storage servers be much greater available key servers is over a threshold t, the message can than the number of blocks of a message. In practical be successfully retrieved with an overwhelming probability. systems, the number of storage servers is much more One of their results pﬃﬃﬃ shows that when there areﬃﬃﬃn storage p than k. The sacrifice is to slightly increase the total servers with n ¼ ak k, the parameter v is b k ln k with copies of an encrypted message symbol sent to b > 5a, and each key server queries 2 storage servers for storage servers. Nevertheless, the storage size in each each retrieval request, the probability of a successful storage server does not increase because each storage retrieval is at least 1 À k=p À oð1Þ. server stores an encoded result (a codeword symbol), 2.2 Proxy Re-Encryption Schemes which is a combination of encrypted message Proxy re-encryption schemes are proposed by Mambo and symbols. Okamoto  and Blaze et al. . In a proxy re-encryption scheme, a proxy server can transfer a ciphertext under a2 RELATED WORKS public key PKA to a new one under another public key PKBWe briefly review distributed storage systems, proxy re- by using the re-encryption key RKA!B . The server does notencryption schemes, and integrity checking mechanisms. know the plaintext during transformation. Ateniese et al.  proposed some proxy re-encryption schemes and2.1 Distributed Storage Systems applied them to the sharing function of secure storageAt the early years, the Network-Attached Storage (NAS)  systems. In their work, messages are first encrypted by theand the Network File System (NFS)  provide extra owner and then stored in a storage server. When a user
LIN AND TZENG: A SECURE ERASURE CODE-BASED CLOUD STORAGE SYSTEM WITH SECURE DATA FORWARDING 997 key server KSi holds a key share SKA;i , 1 i m. The key is shared with a threshold t. In the data storage phase, user A encrypts his message M and dispatches it to storage servers. A message M is decomposed into k blocks m1 ; m2 ; . . . ; mk and has an identifier ID. User A encrypts each block mi into a ciphertext Ci and sends it to v randomly chosen storage servers. Upon receiving ciphertexts from a user, each storage server linearly combines them with randomly chosen coefficients into a codeword symbol and stores it. Note that a storage server may receive less than k message blocks and weFig. 1. A general system model of our work. assume that all storage servers know the value k in advance. In the data forwarding phase, user A forwards his encryptedwants to share his messages, he sends a re-encryption key to message with an identifier ID stored in storage servers to userthe storage server. The storage server re-encrypts the B such that B can decrypt the forwarded message by his secretencrypted messages for the authorized user. Thus, their key. To do so, A uses his secret key SK and B’s public key Asystem has data confidentiality and supports the data PK to compute a re-encryption key RKID and then sends B A!Bforwarding function. Our work further integrates encryp- RKID to all storage servers. Each storage server uses the re- A!Btion, re-encryption, and encoding such that storage robust- encryption key to re-encrypt its codeword symbol for laterness is strengthened. retrieval requests by B. The re-encrypted codeword symbol is Type-based proxy re-encryption schemes proposed by the combination of ciphertexts under B’s public key. In orderTang  provide a better granularity on the granted right of a to distinguish re-encrypted codeword symbols from intactre-encryption key. A user can decide which type of messages ones, we call them original codeword symbols and re-and with whom he wants to share in this kind of proxy re- encrypted codeword symbols, respectively.encryption schemes. Key-private proxy re-encryption In the data retrieval phase, user A requests to retrieve aschemes are proposed by Ateniese et al. . In a key-private message from storage servers. The message is either storedproxy re-encryption scheme, given a re-encryption key, a by him or forwarded to him. User A sends a retrieval requestproxy server cannot determine the identity of the recipient. to key servers. Upon receiving the retrieval request andThis kind of proxy re-encryption schemes provides higherprivacy guarantee against proxy servers. Although most executing a proper authentication process with user A, eachproxy re-encryption schemes use pairing operations, there key server KSi requests u randomly chosen storage servers http://ieeexploreprojects.blogspot.comexist proxy re-encryption schemes without pairing . to get codeword symbols and does partial decryption on the received codeword symbols by using the key share SKA;i .2.3 Integrity Checking Functionality Finally, user A combines the partially decrypted codewordAnother important functionality about cloud storage is the symbols to obtain the original message M.function of integrity checking. After a user stores data into System recovering. When a storage server fails, a new onethe storage system, he no longer possesses the data at hand. is added. The new storage server queries k available storageThe user may want to check whether the data are properly servers, linearly combines the received codeword symbols asstored in storage servers. The concept of provable data a new one and stores it. The system is then recovered.possession ,  and the notion of proof of storage ,,  are proposed. Later, public auditability of stored 3.2 Threat Modeldata is addressed in . Nevertheless all of them consider We consider data confidentiality for both data storage andthe messages in the cleartext form. data forwarding. In this threat model, an attacker wants to break data confidentiality of a target user. To do so, the attacker colludes with all storage servers, nontarget users,3 SCENARIO and up to ðt À 1Þ key servers. The attacker analyzes storedWe present the scenario of the storage system, the threat messages in storage servers, the secret keys of nontargetmodel that we consider for the confidentiality issue, and a users, and the shared keys stored in key servers. Note thatdiscussion for a straightforward solution. the storage servers store all re-encryption keys provided by users. The attacker may try to generate a new re-encryption3.1 System Model key from stored re-encryption keys. We formally model thisAs shown in Fig. 1, our system model consists of users, n attack by the standard chosen plaintext attack1 of the proxystorage servers SS1 ; SS2 ; . . . ; SSn , and m key servers KS1 ;KS2 ; . . . ; KSm . Storage servers provide storage services and 1. Systems against chosen ciphertext attacks are more secure thankey servers provide key management services. They work systems against the chosen plaintext attack. Here, we only consider the chosen plaintext attack because a homomorphic encryption scheme is notindependently. Our distributed storage system consists of secure against chosen ciphertext attacks. Consider a multiplicative homo-four phases: system setup, data storage, data forwarding, and morphic encryption scheme, where DðSK; EðP K; m1 Þ EðP K; m2 ÞÞ ¼ m1 Á m2 for the encryption function E, the decryption function D, a pairdata retrieval. These four phases are described as follows. of public key P K and secret key SK, an operation , and two messages m1 In the system setup phase, the system manager chooses and m2 . Given a challenge ciphertext C, where C ¼ EðP K; m1 Þ, the attacker 0system parameters and publishes them. Each user A is chooses m2 , computes EðP K; m2 Þ, and computes C ¼ C EðP K; m2 Þ. The attacker queries C 0 to the decryption oracle. The response m ¼ m1 Á m2 fromassigned a public-secret key pair ðPKA ; SKA Þ. User A the decryption oracle reveals the plaintext m1 to the attacker sincedistributes his secret key SKA to key servers such that each m1 ¼ m=m2 .
998 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 6, JUNE 2012 key to storage servers such that storage servers perform the re-encryption operation for him. Thus, the communication cost of the owner is independent of the length of forwarded message and the computation cost of re-encryption is taken care of by storage servers. Proxy re-encryption schemes significantly reduce the overhead of the data forwarding function in a secure storage system. 4 CONSTRUCTION OF SECURE CLOUD STORAGE SYSTEMSFig. 2. The security game for the chosen plaintext attack. Before presenting our storage system, we briefly introduce the algebraic setting, the hardness assumption, an erasurere-encryption scheme in a threshold version, as shown in code over exponents, and our approach.Fig. 2. Bilinear map. Let G 1 and G 2 be cyclic multiplicative G G The challenger C provides the system parameters. After groups2 with a prime order p and g 2 G 1 be a generator. A Gthe attacker A chooses a target user T , the challenger gives map e : G 1 Â G 1 ! G 2 is a bilinear map if it is efficiently ~ G G Ghim ðt À 1Þ key shares of the secret key SKT of the target computable and has the properties of bilinearity anduser T to model ðt À 1Þ compromised key servers. Then, the nondegeneracy: for any x; y 2 Z Ã ; eðgx ; gy Þ ¼ eðg; gÞxy and Zp ~ ~attacker can query secret keys of other users and all re- eðg; gÞ is not the identity element in G 2 . Let Genð1 Þ be an ~ Gencryption keys except those from T to other users. This algorithm generating ðg; e; G 1 ; G 2 ; pÞ, where is the length ~ G Gmodels compromised nontarget users and storage servers. of p. Let x 2R X denote that x is randomly chosen from theIn the challenge phase, the attacker chooses two messages set X.M0 and M1 with the identifiers ID0 and ID1 , respectively. Decisional bilinear Diffie-Hellman assumption. ThisThe challenger throws a random coin b and encrypts the assumption is that it is computationally infeasible tomessage Mb with T ’s public key PKT . After getting the distinguish the distributions (g, gx , gy , gz , eðg; gÞxyz ) and (g, ~ciphertext from the challenger, the attacker outputs a bit b0 gx , gy , gz , eðg; gÞr ), where x; y; z; r 2R Z Ã . Formally, for any ~ Zpfor guessing b. In this game, the attacker wins if and only if probabilistic polynomial time algorithm A, the following isb0 ¼ b. The advantage of the attacker is defined as negligible (in ):j1=2 À Pr½b0 ¼ bj. http://ieeexploreprojects.blogspot.com A cloud storage system modeled in the above is secure if j Pr½Aðg; gx ; gy ; gz ; Q b Þ ¼ b : x; y; z; r 2R Z Ã ; Q Zpno probabilistic polynomial time attacker wins the game Q ~ xyz Q ~ r Q 0 ¼ eðg; gÞ ; Q 1 ¼ eðg; gÞ ; b 2R f0; 1g À 1=2j:with a nonnegligible advantage. A secure cloud storagesystem implies that an unauthorized user or server cannot Erasure coding over exponents. We consider that theget the content of stored messages, and a storage server message domain is the cyclic multiplicative group G 2 Gcannot generate re-encryption keys by himself. If a storage described above. An encoder generates a generator matrixserver can generate a re-encryption key from the target user G ¼ ½gi;j for 1 i k; 1 j n as follows: for each row,to another user B, the attacker can win the security game by the encoder randomly selects an entry and randomly sets are-encrypting the ciphertext to B and decrypting the re- value from Z Ã to the entry. The encoder repeats this step v Zpencrypted ciphertext using the secret key SKB . Therefore, times with replacement for each row. An entry of a row canthis model addresses the security of data storage and data be selected multiple times but only set to one value. Theforwarding. values of the rest entries are set to 0. Let the message be ðm1 ; m2 ; . . . ; mk Þ 2 G k . The encoding process is to generate G23.3 A Straightforward Solution g g g ðw1 ; w2 ; . . . ; wn Þ 2 G n , w h e r e wj ¼ m11;j m22;j Á Á Á mkk;j f o r G2A straightforward solution to supporting the data forward- 1 j n. The first step of the decoding process is toing function in a distributed storage system is as follows: compute the inverse of a k Â k submatrix K of G. Let K bewhen the owner A wants to forward a message to user B, he ½g for 1 i; j k. Let K À1 ¼ ½di;j 1 i;j k . The final step of i;ji idownloads the encrypted message and decrypts it by using the decoding process is to compute m ¼ wd1;i wd2;i Á Á Á wdk;i for i j1 j2 jkhis secret key. He then encrypts the message by using B’s 1 i k. An example is shown in Fig. 3. User A stores twopublic key and uploads the new ciphertext. When B wants messages m and m into four storage servers. When the 1 2to retrieve the forwarded message from A, he downloads storage servers SS and SS are available and the k Â k 1 3the ciphertext and decrypts it by his secret key. The whole submatrix K is invertible, user A can decode m1 and m2data forwarding process needs three communication from the codeword symbols w1 ; w3 and the coefficientsrounds for A’s downloading and uploading and B’s ðg1;1 ; 0Þ; ð0; g2;3 Þ, which are stored in the storage servers SS1downloading. The communication cost is linear in the and SS3 .length of the forwarded message. The computation cost is Our approach. We use a threshold proxy re-encryptionthe decryption and encryption for the owner A, and the scheme with multiplicative homomorphic property. Andecryption for user B. encryption scheme is multiplicative homomorphic if it Proxy re-encryption schemes can significantly decreasecommunication and computation cost of the owner. In a 2. It can also be described as additive groups over points on an ellipticproxy re-encryption scheme, the owner sends a re-encryption curve.
LIN AND TZENG: A SECURE ERASURE CODE-BASED CLOUD STORAGE SYSTEM WITH SECURE DATA FORWARDING 999 fA;1 ðzÞ ¼ a1 þ v1 z þ v2 z2 þ Á Á Á þ vtÀ1 ztÀ1 ðmod pÞ; fA;2 ðzÞ ¼ aÀ1 þ v1 z þ v2 z2 þ Á Á Á þ vtÀ1 ztÀ1 ðmod pÞ; 2 where v1 ; v2 ; . . . ; vtÀ1 2R Z Ã . The key share of the Zp secret key SKA to the key server KSi is SKA;i ¼ ðfA;1 ðiÞ; fA;2 ðiÞÞ, where 1 i m. Data storage. When user A wants to store a message of k blocks m1 ; m2 ; . . . ; mk with the identifier ID, he computes the identity token ¼ hfða3 ;IDÞ and performs the encryption algorithm EncðÁÞ on and k blocks to get k original ciphertexts C1 ; C2 ; . . . ; Ck . An original ciphertext is indi-Fig. 3. A storage system with random linear coding over exponents. cated by a leading bit b ¼ 0. User A sends each ciphertext Ci to v randomly chosen storage servers. A storage serversupports a group operation on encrypted plaintexts receives a set of original ciphertexts with the same identitywithout decryption token from A. When a ciphertext Ci is not received, the storage server inserts Ci ¼ ð0; 1; ; 1Þ to the set. The special DðSK; EðP K; m1 Þ EðP K; m2 ÞÞ ¼ m1 Á m2 ; format of ð0; 1; ; 1Þ is a mark for the absence of Ci . Thewhere E is the encryption function, D is the decryption storage server performs EncodeðÁÞ on the set of k ciphertextsfunction, and ðP K; SKÞ is a pair of public key and secret and stores the encoded result (codeword symbol).key. Given two coefficients g1 and g2 , two message symbolsm1 and m2 can be encoded to a codeword symbol mg1 mg2 in . Enc(PKA ; ; m1 ; m2 ; . . . ; mk ). For 1 i k, this algo- 1 2the encrypted form rithm computes C ¼ EðP K; m1 Þg1 EðP K; m2 Þg2 ¼ EðP K; mg1 Á mg2 Þ: Ci ¼ ð0; i ;
; i Þ ¼ ð0; gri ; ; mi eðga1 ; ri ÞÞ; ~ 1 2Thus, a multiplicative homomorphic encryption scheme where ri 2R Z Ã ; 1 i k and 0 is the leading bit Zpsupports the encoding operation over encrypted messages. indicating an original ciphertext.We then convert a proxy re-encryption scheme with multi- . Encode(C1 ; C2 ; . . . ; Ck ). For each ciphertext Ci , theplicative homomorphic property into a threshold version. A algorithm randomly selects a coefficient gi . If somesecret key is shared to key servers with a threshold value t ciphertext Ci is ð0; 1; ; 1Þ, the coefficient gi is set to 0. http://ieeexploreprojects.blogspot.com;
; i Þ. The encoding process is tovia the Shamir secret sharing scheme , where t ! k. In Let Ci ¼ ð0; iour system, to decrypt for a set of k message symbols, each compute an original codeword symbol C 0key server independently queries 2 storage servers and !partially decrypts two encrypted codeword symbols. As YÀ g Á YÀ g Á k k 0 C ¼ 0; i i ;
; i ilong as t key servers are available, k codeword symbols are i¼1 i¼1obtained from the partially decrypted ciphertexts. ! Pk Y g k Pk gi ri a1 gi ri4.1 A Secure Cloud Storage System with Secure ¼ 0; g i¼1 ; ; mi eðg ; Þ i¼1 i ~ i¼1 Forwarding 0 r0As described in Section 3.1, there are four phases of our ¼ ð0; g ; ; W eðg; Þa1 r Þ; ~storage system. Qk P where W ¼ i¼1 mgi and r0 ¼ k gi ri . The en- i i¼1 System setup. The algorithm SetUpð1 Þ generates the coded result is ðC 0 ; g1 ; g2 ; . . . ; gk Þ.system parameters . A user uses KeyGenðÞ to generate Data forwarding. User A wants to forward a message tohis public and secret key pair and ShareKeyGenðÁÞ to share another user B. He needs the first component a1 of hishis secret key to a set of m key servers with a threshold t, secret key. If A does not possess a1 , he queries key serverswhere k t m. The user locally stores the third compo-nent of his secret key. for key shares. When at least t key servers respond, A recovers the first component a1 of the secret key SKA via the . SetUp(1 ). Run Genð1 Þ to obtain ðg; h; e; G 1 ; G 2 ; pÞ, ~ G G KeyRecoverðÁÞ algorithm. Let the identifier of the message where e : G 1 Â G 1 ! G 2 is a bilinear map, g and h ~ G G G be ID. User A computes the re-encryption key RKID via A!B are generators of G 1 , and both G 1 and G 2 have the G G G the ReKeyGenðÁÞ algorithm and securely sends the re- prime order p. Set ¼ ðg; h; e; G 1 ; G 2 ; p; fÞ, where f : ~ G G encryption key to each storage server. By using RKID , a A!B Z Ã Â f0; 1gÃ ! Z Ã is a one-way hash function. Zp Zp storage server re-encrypts the original codeword symbol C 0 . KeyGen(). For a user A, the algorithm selects with the identifier ID into a re-encrypted codeword symbol a1 ; a2 ; a3 2R Z Ã and sets Zp C 00 via the ReEncðÁÞ algorithm such that C 00 is decryptable PKA ¼ ðga1 ; ha2 Þ; SKA ¼ ða1 ; a2 ; a3 Þ: by using B’s secret key. A re-encrypted codeword symbol is indicated by the leading bit b ¼ 1. Let the public key PKB of user B be ðgb1 ; hb2 Þ. . ShareKeyGen(SKA , t, m). This algorithm shares the secret key SKA of a user A to a set of m key servers . KeyRecover(SKA;i1 ; SKA;i2 ; . . . ; SKA;it ). Let T ¼ fi1 ; by using two polynomials fA;1 ðzÞ and fA;2 ðzÞ of i2 ; . . . ; it g. This algorithm recovers a1 via Lagrange degree ðt À 1Þ over the finite field GF(p) interpolation as follows:
1000 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 6, JUNE 2012 0 1 X Y Às0 A The algorithm combines the t values (
i1 ;j1 ; 0 a1 ¼ @fA;1 ðsÞ mod p: 0 0 a1 fA;1 ð0Þ s À s0
it ;jt ) to obtain ¼ via the La- s2T s0 2T=fsg grange interpolation over exponents Y Q Àj . ReKeyGen(PKA ; SKA ; ID; PKB ). This algorithm se- a1 ¼ ð
i;j Þ r2SJ ;r6¼j rÀj ¼ fA;1 ð0Þ: 0 lects e 2R Z Ã and computes Zp ði;jÞ2S RKID ¼ ððhb2 Þa1 ðfða3 ;IDÞþeÞ ; ha1 e Þ: A!B For each of the partially decrypted codeword symbols i;j , where i 2 SI , the algorithm computes 0 an encoded block . ReEnc(RKID ; C 0 ). Let C 0 ¼ ð0; ;
; Þ ¼ ð0; gr ; ; A!B 0 W eðga1 ; r ÞÞ for some r0 and some W , and RKID ¼ ~ A!B i;j wi eðga1 ; r Þ ~ 0 ðhb2 a1 ðfða3 ;IDÞþeÞ ; ha1 e Þ for some e. The re-encrypted wi ¼ ¼ ; ð1Þ eði;j ; ~ fA;1 ð0Þ Þ eðgr0 ; fA;1 ð0Þ Þ ~ codeword symbol is computed as follows: for some r0 , where fA;1 ð0Þ ¼ a1 . C 00 ¼ ð1; ; hb2 a1 ðfða3 ;IDÞþeÞ ; Á eð; ha1 e ÞÞ ~ g g g Observe that wi ¼ m11;i m22;i Á Á Á mkk;i for i 2 SI , and 0 0 ¼ ð1; gr ; hb2 a1 ðfða3 ;IDÞþeÞ ; W eðg; hÞa1 r ðfða3 ;IDÞþeÞ Þ: ~ there are k such equations. Consider the square matrix K ¼ ½gi;j , where 1 i k; j 2 SI . The decod-Note that the leading bit 1 indicates C 00 is a re-encrypted ing process is to compute K À1 and output the blocksciphertext. m1 ; m2 ; . . . ; mk . The algorithm fails when the square Data retrieval. There are two cases for the data retrieval matrix K is noninvertible. We shall analyze thephase. The first case is that a user A retrieves his own probability of K being noninvertible in Section 4.2.message. When user A wants to retrieve the message with the In the second case b ¼ 1 for re-encrypted code-identifier ID, he informs all key servers with the identity word symbols, user B wants to retrieve the messagetoken . A key server first retrieves original codeword forwarded to him. The algorithm does the followingsymbols from u randomly chosen storage servers and then computation to obtainperforms partial decryption ShareDecðÁÞ on every retrieved Y Q Àj original codeword symbol C 0 . The result of partial decryption h ðfða3 ;IDÞþeÞa1 ¼ 0 ð
i;j Þ r2SJ ;r6¼j rÀjis called a partially decrypted codeword symbol. The key ði;jÞ2Sserver sends the partially decrypted codeword symbols and ¼ hðfða3 ;IDÞþeÞa1 b2 fB;2 ð0Þ ; http://ieeexploreprojects.blogspot.comthe coefficients to user A. After user A collects replies from atleast t key servers and at least k of them are originally from where fB;2 ð0Þ ¼ bÀ1 . Again, for each of i;j , where 2distinct storage servers, he executes CombineðÁÞ on the t i 2 SI , the algorithm computes an encoded blockpartially decrypted codeword symbols to recover the blocks 0m1 ; m2 ; . . . ; mk . The second case is that a user B retrieves a i;j wi eðg; hÞa1 r ðfða3 ;IDÞþeÞ ~ wi ¼ ¼ ;message forwarded to him. User B informs all key servers eði;j ; hðfða3 ;IDÞþeÞa1 Þ ~ eðgr0 ; hðfða3 ;IDÞþeÞa1 Þ ~directly. The collection and combining parts are the same as ð2Þthe first case except that key servers retrieve re-encryptedcodeword symbols and perform partial decryption Share- for some e and r0 . The rest in the second case is theDecðÁÞ on re-encrypted codeword symbols. same as that in the first case. . ShareDec(SKj ; Xi ). Xi is a codeword symbol, where 4.2 Analysis Xi ¼ ðb; ;
; ) and b is the indicator for original We analyze storage and computation complexities, correct- and re-encrypted codeword symbols. SKj is a key ness, and security of our cloud storage system in this section. share, where SKj ¼ ðsk0 ; sk1 Þ. By using the key share Let the bit-length of an element in the group G 1 be l1 and G 2 G G SKj , the partially decrypted codeword symbol i;j of be l2 . Let coefficients gi;j be randomly chosen from f0; 1gl3 . Xi is generated as follows: Storage cost. To store a message of k blocks, a storage server SSj stores a codeword symbol ðb; j ; ; j Þ and the i;j ¼ ðb; ;
skb ; Þ: coefficient vector ðg1;j ; g2;j ; . . . ; gk;j Þ. They are total of ð1 þ 2l1 þ l2 þ kl3 Þ bits, where j ; 2 G 1 and j 2 G 2 . The G G . Combine(i1 ;j1 ; i2 ;j2 ; . . . ; it ;jt ). Let a partially de- average cost for a message bit stored in a storage server is crypted codeword symbol i;j be ðb; i;j ;
i;j ; i;j Þ.0 ð1 þ 2l1 þ l2 þ kl3 Þ=kl2 bits, which is dominated by l3 =l2 for a This algorithm combines t partially decrypted code- sufficiently large k. In practice, small coefficients, i.e., word symbols, where
it ;jt ¼ , l3 ( l2 , reduce the storage cost in each storage server. j1 6¼ j2 6¼ . . . 6¼ jt and there are at least k distinct Computation cost. We measure the computation cost by values in fi1 ; i2 ; . . . ; it g. Let SJ ¼ fj1 ; j2 ; . . . ; jt g and the number of pairing operations, modular exponentiations S ¼ fði1 ; j1 Þ; ði2 ; j2 Þ; . . . ; ðit ; jt Þg. Without loss of gen- in G 1 and G 2 , modular multiplications in G 1 and G 2 , and G G G G erality, let SI ¼ fi1 ; i2 ; . . . ; ik g be k distinct values in arithmetic operations over GF ðpÞ. These operations are fi1 ; i2 ; . . . ; it g. denoted as Pairing, Exp1 , Exp2 , Mult1 , Mult2 , and Fp , In the first case b ¼ 0 for original codeword respectively. The cost is summarized in Table 1. Computing symbols, user A wants to retrieve his own message. an Fp takes much less time than computing a Mult1 or a
LIN AND TZENG: A SECURE ERASURE CODE-BASED CLOUD STORAGE SYSTEM WITH SECURE DATA FORWARDING 1001 TABLE 1 operations over GF ðpÞ, and the decoding for each block The Computation Cost of Each Algorithm takes k Exp2 and ðk À 1Þ Mult2 . in Our Secure Cloud Storage System Correctness. There are two cases for correctness. The owner A correctly retrieves his message and user B correctly retrieves a message forwarded to him. The correctness of encryption and decryption for A can be seen in (1). The correctness of re-encryption and decryption for B can be seen in (2). As long as at least k storage servers are available, a user can retrieve data with an overwhelming probability. Thus, our storage system tolerates n À k server failures. The probability of a successful retrieval. A successful retrieval is an event that a user successfully retrieves all k blocks of a message no matter whether the message is owned by him or forwarded to him. The randomness comes from the random selection of storage servers in the data storage phase, the random coefficients chosen by storage servers, and the random selection of key servers in the data retrieval phase. The probability of a successful retrieval depends on (n; k; u; v) and all randomness. The methodology of analysis is similar to that in  and . However, we consider a different system model from the one in  and a more flexible parameter setting for n ¼ akc than the settings in  and . The difference between our system model and the one in  is that our system modelMult2 . The time of computing an Exp1 is 1:5dlog pe times as has key servers. In , a single user queries k distinctmuch as the time of computing a Mult1 , on average, (by storage servers to retrieve the data. On the other hand, eachusing the square-and-multiply algorithm). Similarly, the key server in our system independently queries u storagetime of computing a Exp2 is 1:5dlog pe times as much as the servers. The use of distributed key servers increases the leveltime of computing a Mult2 , on average. of key protection but makes the analysis harder. In the data storage phase, a user runs the EncðÁÞ The ratio n=k is considered as a fixed constant in . http://ieeexploreprojects.blogspot.comalgorithm and each storage server performs the EncodeðÁÞ In , the setting is extended to n ¼ akc . Our general- 3=2algorithm. In the EncðÁÞ algorithm, generating each i ization of parameter setting for n ¼ ak , where c ! 1:5,requires a Exp1 , and generating each i requires a Exp1 , a allows the number of storage servers be much greater thanPairing, and a Mult2 . Hence, for k blocks of a message, the the number of blocks of a message. It gives a bettercost is (k Pairing þ 2k Exp1 þ k Mult2 ). For the EncodeðÁÞ flexibility for adjustment between the number of storagealgorithm, each storage server encodes k ciphertexts at servers and robustness. This generalization is obtained by observing that Pr½E1 is better bounded by choosingmost. The cost is k Exp1 þ ðk À 1Þ Mult1 for computing c ! 1:5. The proof of Theorem 1 is given in Appendix A,and k Exp2 þ ðk À 1Þ Mult2 for computing . which can be found on the Computer Society Digital In the data forwarding phase, a user runs KeyRecoverðÁÞ Library at http://doi.ieeecomputersociety.org/10.1109/and ReKeyGenðÁÞ and each storage server performs TPDS.2011.252.ReEncðÁÞ. In the KeyRecoverðÁÞ algorithm, the computationcost is Oðt Þ Fp . In the ReKeyGenðÁÞ algorithm, the Theorem 1. Assume that there are k blocks of a message, n 2computation cost is a Exp1 . In the ReEncðÁÞ algorithm, the storage servers, and m key servers, where ﬃﬃﬃ ¼ akc , m ! t ! k, pncomputation cost is a Pairing and a Mult1 . c ! 1:5 and a is a constant with a 2. For v ¼ bkcÀ1 ln k In the data retrieval phase, each key server runs the and u ¼ 2 with b 5a, the probability of a successful retrievalShareDecðÁÞ algorithm and the user performs the is at least 1 À k=p À oð1Þ.CombineðÁÞ algorithm. In the ShareDecðÁÞ algorithm, eachkey server performs a Exp1 to get
skb for a codeword Security. The data confidentiality of our cloud storagesymbol. For a successful retrieval, t key servers would be system is guaranteed even if all storage servers, nontargetsufficient; hence, for this step, the total cost of t key servers users, and up to ðt À 1Þ key servers are compromised by theis t Exp1 . In the CombineðÁÞ algorithm, it needs the attacker. Recall the security game illustrated in Fig. 2. Thecomputation of the Lagrange interpolation over exponents proof for Theorem 2 is provided in Appendix B, available inin G 1 , the computation of the encoded blocks wj ’s from the the online supplementary material. G ~partially decrypted codeword symbols i;j ’s, and the Theorem 2. Our cloud storage system described in Section 4.1 isdecoding computation which needs to perform the matrix secure under the threat model in Section 3.2 if the decisionalinversion and recovery of blocks mi ’s from the encoded bilinear Diffie-Hellman assumption holds.blocks wj ’s. The Lagrange interpolation over exponents inG 1 needs Oðt2 Þ Fp , t Exp1 , and ðt À 1Þ Mult1 . Computing an Gencoded block wj needs one Pairing and one modular 5 DISCUSSION AND CONCLUSIONdivision, which takes 2 Mult2 . As for the decoding In this paper, we consider a cloud storage system consists ofcomputation, the matrix inversion takes Oðk3 Þ arithmetic storage servers and key servers. We integrate a newly
1002 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 6, JUNE 2012proposed threshold proxy re-encryption scheme and  R. Bhagwan, K. Tati, Y.-C. Cheng, S. Savage, and G.M. Voelker, “Total Recall: System Support for Automated Availabilityerasure codes over exponents. The threshold proxy re- Management,” Proc. First Symp. Networked Systems Design andencryption scheme supports encoding, forwarding, and Implementation (NSDI), pp. 337-350, 2004.partial decryption operations in a distributed way. To  A.G. Dimakis, V. Prabhakaran, and K. Ramchandran, “Ubiqui- tous Access to Distributed Data in Large-Scale Sensor Net-decrypt a message of k blocks that are encrypted and works through Decentralized Erasure Codes,” Proc. Fourth Int’lencoded to n codeword symbols, each key server only has Symp. Information Processing in Sensor Networks (IPSN), pp. 111-to partially decrypt two codeword symbols in our system. 117, 2005.By using the threshold proxy re-encryption scheme, we  A.G. Dimakis, V. Prabhakaran, and K. Ramchandran, “Decen- tralized Erasure Codes for Distributed Networked Storage,” IEEEpresent a secure cloud storage system that provides secure Trans. Information Theory, vol. 52, no. 6 pp. 2809-2816, June 2006.data storage and secure data forwarding functionality in a  M. Mambo and E. Okamoto, “Proxy Cryptosystems: Delegation ofdecentralized structure. Moreover, each storage server the Power to Decrypt Ciphertexts,” IEICE Trans. Fundamentals of Electronics, Comm. and Computer Sciences, vol. E80-A, no. 1, pp. 54-independently performs encoding and re-encryption and 63, 1997.each key server independently performs partial decryption.  M. Blaze, G. Bleumer, and M. Strauss, “Divertible Protocols and Our storage system and some newly proposed content Atomic Proxy Cryptography,” Proc. Int’l Conf. Theory and Applica-addressable file systems and storage system , ,  tion of Cryptographic Techniques (EUROCRYPT), pp. 127-144, 1998.  G. Ateniese, K. Fu, M. Green, and S. Hohenberger, “Improvedare highly compatible. Our storage servers act as storage Proxy Re-Encryption Schemes with Applications to Securenodes in a content addressable storage system for storing Distributed Storage,” ACM Trans. Information and System Security,content addressable blocks. Our key servers act as access vol. 9, no. 1, pp. 1-30, 2006.  Q. Tang, “Type-Based Proxy Re-Encryption and Its Construction,”nodes for providing a front-end layer such as a traditional Proc. Ninth Int’l Conf. Cryptology in India: Progress in Cryptologyfile system interface. Further study on detailed cooperation (INDOCRYPT), pp. 130-144, 2008.is required.  G. Ateniese, K. Benson, and S. Hohenberger, “Key-Private Proxy Re-Encryption,” Proc. Topics in Cryptology (CT-RSA), pp. 279-294, 2009.ACKNOWLEDGMENTS  J. Shao and Z. Cao, “CCA-Secure Proxy Re-Encryption without Pairings,” Proc. 12th Int’l Conf. Practice and Theory in Public KeyThe authors thank anonymous reviewers for their valu- Cryptography (PKC), pp. 357-376, 2009.able comments. The research was supported in part by  G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, “Provable Data Possession at Untrustedprojects ICTL-100-Q707, ATU-100-W958, NSC 98-2221-E- Stores,” Proc. 14th ACM Conf. Computer and Comm. Security (CCS),009-068-MY3, NSC 99-2218-E-009-017-, and NSC 99-2218- pp. 598-609, 2007.E-009-020.  G. Ateniese, R.D. Pietro, L.V. Mancini, and G. Tsudik, “Scalable and Efficient Provable Data Possession,” Proc. Fourth Int’l Conf. http://ieeexploreprojects.blogspot.comin Comm. Netowrks (SecureComm), pp. 1-10, Security and Privacy 2008.REFERENCES  H. Shacham and B. Waters, “Compact Proofs of Retrievability,” J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels, R. Proc. 14th Int’l Conf. Theory and Application of Cryptology and Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and Information Security (ASIACRYPT), pp. 90-107, 2008. B. Zhao, “Oceanstore: An Architecture for Global-Scale Persis-  G. Ateniese, S. Kamara, and J. Katz, “Proofs of Storage from tent Storage,” Proc. Ninth Int’l Conf. Architectural Support for Homomorphic Identification Protocols,” Proc. 15th Int’l Conf. Programming Languages and Operating Systems (ASPLOS), pp. 190- Theory and Application of Cryptology and Information Security 201, 2000. (ASIACRYPT), pp. 319-333, 2009. P. Druschel and A. Rowstron, “PAST: A Large-Scale, Persistent  K.D. Bowers, A. Juels, and A. Oprea, “HAIL: A High-Availability Peer-to-Peer Storage Utility,” Proc. Eighth Workshop Hot Topics in and Integrity Layer for Cloud Storage,” Proc. 16th ACM Conf. Operating System (HotOS VIII), pp. 75-80, 2001. Computer and Comm. Security (CCS), pp. 187-198, 2009. A. Adya, W.J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J.R.  C. Wang, Q. Wang, K. Ren, and W. Lou, “Privacy-Preserving Douceur, J. Howell, J.R. Lorch, M. Theimer, and R. Wattenhofer, Public Auditing for Data Storage Security in Cloud Computing,” “Farsite: Federated, Available, and Reliable Storage for an Proc. IEEE 29th Int’l Conf. Computer Comm. (INFOCOM), pp. 525- Incompletely Trusted Environment,” Proc. Fifth Symp. Operating 533, 2010. System Design and Implementation (OSDI), pp. 1-14, 2002.  A. Shamir, “How to Share a Secret,” ACM Comm., vol. 22, pp. 612- A. Haeberlen, A. Mislove, and P. Druschel, “Glacier: Highly 613, 1979. Durable, Decentralized Storage Despite Massive Correlated Fail-  C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian, P. ures,” Proc. Second Symp. Networked Systems Design and Implemen- Strzelczak, J. Szczepkowski, C. Ungureanu, and M. Welnicki, tation (NSDI), pp. 143-158, 2005. “Hydrastor: A Scalable Secondary Storage,” Proc. Seventh Conf. File Z. Wilcox-O’Hearn and B. Warner, “Tahoe: The Least-Authority and Storage Technologies (FAST), pp. 197-210, 2009. Filesystem,” Proc. Fourth ACM Int’l Workshop Storage Security and  C. Ungureanu, B. Atkin, A. Aranya, S. Gokhale, S. Rago, G. Survivability (StorageSS), pp. 21-26, 2008. Calkowski, C. Dubnicki, and A. Bohra, “Hydrafs: A High- H.-Y. Lin and W.-G. Tzeng, “A Secure Decentralized Erasure Code Throughput File System for the Hydrastor Content-Addressable for Distributed Network Storage,” IEEE Trans. Parallel and Storage System,” Proc. Eighth USENIX Conf. File and Storage Distributed Systems, vol. 21, no. 11, pp. 1586-1594, Nov. 2010. Technologies (FAST), p. 17, 2010. D.R. Brownbridge, L.F. Marshall, and B. Randell, “The Newcastle  W. Dong, F. Douglis, K. Li, H. Patterson, S. Reddy, and P. Shilane, Connection or Unixes of the World Unite!,” Software Practice and “Tradeoffs in Scalable Data Routing for Deduplication Clusters,” Experience, vol. 12, no. 12, pp. 1147-1162, 1982. Proc. Ninth USENIX Conf. File and Storage Technologies (FAST), p. 2, R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon, 2011. “Design and Implementation of the Sun Network Filesystem,” Proc. USENIX Assoc. Conf., 1985. M. Kallahalla, E. Riedel, R. Swaminathan, Q. Wang, and K. Fu, “Plutus: Scalable Secure File Sharing on Untrusted Storage,” Proc. Second USENIX Conf. File and Storage Technologies (FAST), pp. 29- 42, 2003. S.C. Rhea, P.R. Eaton, D. Geels, H. Weatherspoon, B.Y. Zhao, and J. Kubiatowicz, “Pond: The Oceanstore Prototype,” Proc. Second USENIX Conf. File and Storage Technologies (FAST), pp. 1-14, 2003.
LIN AND TZENG: A SECURE ERASURE CODE-BASED CLOUD STORAGE SYSTEM WITH SECURE DATA FORWARDING 1003 Hsiao-Ying Lin received the MS and PhD Wen-Guey Tzeng received the BS degree in degrees in computer science from National computer science and information engineering Chiao Tung University, Taiwan, in 2005 and from National Taiwan University, in 1985, and MS 2010, respectively. Currently, she is working as and PhD degrees in computer science from the an assistant research fellow in Intelligent In- State University of New York at Stony Brook, in formation and Communications Research Cen- 1987 and 1991, respectively. He joined the ter. Her current research interests include Department of Computer and Information applied cryptography and information security. Science (now, Department of Computer She is a member of the IEEE. Science), National Chiao Tung University, Tai- wan, in 1991. He now serves as a chairman of the department. His current research interests include cryptology, informa- tion security and network security. He is a member of the IEEE. . For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib. http://ieeexploreprojects.blogspot.com