This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

TABLE 1
Comparison of POR/PDP schemes for a file consisting of n blocks.

Scheme       Type   CSP Comp.   Client Comp.  Comm.       Frag.  Privacy  Multiple Clouds  Prob. of Detection
PDP [2]      HomT   O(t)        O(t)          O(1)        ✓      ♯        –                1 − (1−ρ)^t
SPDP [4]     MHT    O(t)        O(t)          O(t)        ✓      ✓        –                1 − (1−ρ)^{t·s}
DPDP-I [5]   MHT    O(t log n)  O(t log n)    O(t log n)  ✓      –        –                1 − (1−ρ)^t
DPDP-II [5]  MHT    O(t log n)  O(t log n)    O(t log n)  –      –        –                1 − (1−ρ)^{Ω(n)}
CPOR-I [6]   HomT   O(t)        O(t)          O(1)        –      ♯        –                1 − (1−ρ)^t
CPOR-II [6]  HomT   O(t+s)      O(t+s)        O(s)        ✓      ♯        –                1 − (1−ρ)^{t·s}
Our Scheme   HomR   O(t+c·s)    O(t+s)        O(s)        ✓      ✓        ✓                1 − ∏_{P_k∈𝒫}(1−ρ_k)^{r_k·t·s}

s is the number of sectors in each block, c is the number of CSPs in a multi-cloud, t is the number of sampled blocks, and ρ and ρ_k are the probabilities of block corruption in a cloud server and in the k-th cloud server of a multi-cloud 𝒫 = {P_k}, respectively; ♯ denotes a verification process realized by a trivial approach; and MHT, HomT, HomR denote Merkle Hash trees, homomorphic tags, and homomorphic responses, respectively.

and availability of stored data for detecting faults and automatic recovery. Moreover, this verification is necessary to provide reliability by automatically maintaining multiple copies of data and automatically redeploying processing logic in the event of failures.

Although existing schemes can make a false or true decision for data possession without downloading data at untrusted stores, they are not suitable for a distributed cloud storage environment, since they were not originally constructed on an interactive proof system. For example, the schemes based on a Merkle Hash tree (MHT), such as DPDP-I, DPDP-II [5] and SPDP [4] in Table 1, use an authenticated skip list to check the integrity of file blocks adjacently in space. Unfortunately, they did not provide any algorithms for constructing distributed Merkle trees, which are necessary for efficient verification in a multi-cloud environment. In addition, when a client asks for a file block, the server needs to send the file block along with a proof of the intactness of the block. However, this process incurs significant communication overhead in a multi-cloud environment, since the server in one cloud typically needs to generate such a proof with the help of other cloud storage services, where the adjacent blocks are stored. The other schemes, such as PDP [2], CPOR-I, and CPOR-II [6] in Table 1, are constructed on homomorphic verification tags, by which the server can generate tags for multiple file blocks in terms of a single response value. However, this does not mean that the responses from multiple clouds can also be combined into a single value on the client side. For lack of homomorphic responses, clients must invoke the PDP protocol repeatedly to check the integrity of file blocks stored in multiple cloud servers. Also, clients need to know the exact position of each file block in a multi-cloud environment. In addition, the verification process in such a case will lead to high communication overheads and computation costs at the client side as well. It is therefore of the utmost importance to design a cooperative PDP model that reduces the storage and network overheads and enhances the transparency of verification activities in cluster-based cloud storage systems. Moreover, such a cooperative PDP scheme should provide features for timely detecting abnormality and renewing multiple copies of data.

Even though existing PDP schemes have addressed various security properties, such as public verifiability [2], dynamics [5], scalability [4], and privacy preservation [7], we still need a careful consideration of some potential attacks, which fall into two major categories: the Data Leakage Attack, by which an adversary can easily obtain the stored data through the verification process after running or wiretapping sufficiently many verification communications (see Attacks 1 and 3 in Appendix A), and the Tag Forgery Attack, by which a dishonest CSP can deceive the clients (see Attacks 2 and 4 in Appendix A). These two attacks may cause potential risks of privacy leakage and ownership cheating. Also, these attacks can more easily compromise the security of a distributed cloud system than that of a single cloud system.

Although various security models have been proposed for existing PDP schemes [2], [7], [6], these models still cannot cover all security requirements, especially for provably secure privacy preservation and ownership authentication. To establish a highly effective security model, it is necessary to analyze the PDP scheme within the framework of a zero-knowledge proof system (ZKPS), because a PDP system is essentially an interactive proof system (IPS), which has been well studied in the cryptography community. In summary, a verification scheme for data integrity in distributed storage environments should have the following features:

∙ Usability aspect: a client should be able to use the integrity check in the way of collaboration services, and the scheme should conceal the details of the storage to reduce the burden on clients;
∙ Security aspect: the scheme should provide adequate security features to resist existing attacks, such as the data leakage attack and the tag forgery attack;
∙ Performance aspect: the scheme should have lower communication and computation overheads than non-cooperative solutions.
Related Works. To check the availability and integrity of outsourced data in cloud storages, researchers have proposed two basic approaches called Provable Data Possession (PDP) [2] and Proofs of Retrievability (POR) [3]. Ateniese et al. [2] first proposed the PDP model for ensuring possession of files on untrusted storages and provided an RSA-based scheme for the static case that achieves O(1) communication cost. They also proposed a publicly verifiable version, which allows anyone, not just the owner, to challenge the server for data possession. This property greatly extended the application areas of the PDP protocol due to the separation of data owners and users. However, these schemes are insecure against replay attacks in dynamic scenarios because of their dependence on the index of blocks. Moreover, they do not fit multi-cloud storage due to the loss of the homomorphism property in the verification process.

In order to support dynamic data operations, Ateniese et al. developed a dynamic PDP solution called Scalable PDP [4]. They proposed a lightweight PDP scheme based on a cryptographic hash function and symmetric key encryption, but the servers can deceive the owners by using previous metadata or responses, due to the lack of randomness in the challenges. The numbers of updates and challenges are limited and fixed in advance, and users cannot perform block insertions anywhere. Based on this work, Erway et al. [5] introduced two Dynamic PDP schemes with a hash function tree to realize O(log n) communication and computational costs for an n-block file. The basic scheme, called DPDP-I, retains the drawback of Scalable PDP, and in the 'blockless' scheme, called DPDP-II, the data blocks {m_{i_j}}_{j∈[1,t]} can be leaked by the response to a challenge, M = Σ_{j=1}^{t} a_j · m_{i_j}, where a_j is a random challenge value. Furthermore, these schemes are also not effective for a multi-cloud environment, because the verification path of a challenged block cannot be stored completely within one cloud [8].

Juels and Kaliski [3] presented a POR scheme, which relies largely on preprocessing steps that the client conducts before sending a file to a CSP. Unfortunately, these operations prevent any efficient extension for updating data. Shacham and Waters [6] proposed an improved version of this protocol called Compact POR, which uses the homomorphic property to aggregate a proof into an O(1) authenticator value with O(t) computation cost for t challenged blocks, but their solution is also static and cannot prevent the leakage of data blocks in the verification process. Wang et al. [7] presented a dynamic scheme with O(log n) cost by integrating the Compact POR scheme and a Merkle Hash Tree (MHT) into DPDP. Furthermore, several POR schemes and models have been proposed recently, including [9], [10]. In [9], Bowers et al. introduced a distributed cryptographic system that allows a set of servers to solve the PDP problem. This system is based on an integrity-protected error-correcting code (IP-ECC), which improves the security and efficiency of existing tools like POR. However, a file must be transformed into l distinct segments of the same length, which are distributed across l servers. Hence, this system is more suitable for RAID than for cloud storage.

Our Contributions. In this paper, we address the problem of provable data possession in distributed cloud environments from the following aspects: high security, transparent verification, and high performance. To achieve these goals, we first propose a verification framework for multi-cloud storage along with two fundamental techniques: the hash index hierarchy (HIH) and the homomorphic verifiable response (HVR).

We then demonstrate the possibility of constructing a cooperative PDP (CPDP) scheme without compromising data privacy, based on modern cryptographic techniques such as interactive proof systems (IPS). We further introduce an effective construction of the CPDP scheme using the above-mentioned structure. Moreover, we give a security analysis of our CPDP scheme in the IPS model. We prove that this construction is a multi-prover zero-knowledge proof system (MP-ZKPS) [11], which has the completeness, knowledge-soundness, and zero-knowledge properties. These properties ensure that the CPDP scheme achieves security against the data leakage attack and the tag forgery attack.

To improve the system performance of our scheme, we analyze the performance of probabilistic queries for detecting abnormal situations. This probabilistic method also has an inherent benefit in reducing computation and communication overheads. Then, we present an efficient method for selecting optimal parameter values to minimize the computation overheads of CSPs and of the clients' operations. In addition, we show that our scheme is suitable for existing distributed cloud storage systems. Finally, our experiments show that our solution introduces very limited computation and communication overheads.

Organization. The rest of this paper is organized as follows. In Section 2, we describe a formal definition of CPDP and the underlying techniques, which are utilized in the construction of our scheme. We introduce the details of the cooperative PDP scheme for multi-cloud storage in Section 3. We describe the security and performance evaluations of our scheme in Sections 4 and 5, respectively. We discuss the related work, and Section 6 concludes this paper.

2 STRUCTURE AND TECHNIQUES

In this section, we present our verification framework for multi-cloud storage and a formal definition of CPDP. We introduce two fundamental techniques for constructing our CPDP scheme: the hash index hierarchy (HIH), by which the responses to the clients' challenges computed from multiple CSPs can be combined into a single response as the final result; and the homomorphic verifiable response (HVR), which supports distributed cloud storage in a multi-cloud setting and implements an efficient construction of a collision-resistant hash function, which can be viewed as a random oracle in the verification protocol.

2.1 Verification Framework for Multi-Cloud

Although existing PDP schemes offer a publicly accessible remote interface for checking and managing tremendous amounts of data, the majority of them are incapable of satisfying the inherent requirements of multiple clouds in terms of communication and computation costs. To address this problem, we consider a multi-cloud storage service as illustrated in Figure 1. In this architecture, a data storage service involves three different entities: Clients, who have a large amount of data to be stored in multiple clouds and have the permissions to access and manipulate the stored data; Cloud Service Providers (CSPs), who work together to provide data storage services and have sufficient storage and computation resources; and a Trusted Third Party (TTP), who is trusted to store verification parameters and to offer public query services for these parameters.

Fig. 1. Verification architecture for data integrity.

In this architecture, we consider the existence of multiple CSPs that cooperatively store and maintain the clients' data. Moreover, a cooperative PDP is used to verify the integrity and availability of the stored data across all CSPs. The verification procedure is described as follows: first, a client (data owner) uses the secret key to pre-process a file, which consists of a collection of n blocks, generates a set of public verification information that is stored in the TTP, transmits the file and some verification tags to the CSPs, and may delete its local copy; then, by using a verification protocol, the clients can issue a challenge to one CSP to check the integrity and availability of the outsourced data with respect to the public information stored in the TTP.

We neither assume that the CSP is trusted to guarantee the security of the stored data, nor assume that the data owner has the ability to collect evidence of the CSP's faults after errors have been found. To achieve this goal, a TTP server is constructed as a core trust base on the cloud for the sake of security. We assume the TTP is reliable and independent, with the following functions [12]: to set up and maintain the CPDP cryptosystem; to generate and store the data owner's public key; and to store the public parameters used to execute the verification protocol in the CPDP scheme. Note that the TTP is not directly involved in the CPDP scheme, in order to reduce the complexity of the cryptosystem.

2.2 Definition of Cooperative PDP

In order to prove the integrity of data stored in a multi-cloud environment, we define a framework for CPDP based on an interactive proof system (IPS) and a multi-prover zero-knowledge proof system (MP-ZKPS), as follows:

Definition 1 (Cooperative-PDP): A cooperative provable data possession scheme 𝒮 = (KeyGen, TagGen, Proof) is a collection of two algorithms (KeyGen, TagGen) and an interactive proof system Proof, as follows:

KeyGen(1^κ): takes a security parameter κ as input, and returns a secret key sk or a public-secret key pair (pk, sk);

TagGen(sk, F, 𝒫): takes as inputs a secret key sk, a file F, and a set of cloud storage providers 𝒫 = {P_k}, and returns the triple (ζ, ψ, σ), where ζ is the secret in the tags, ψ = (u, ℋ) is a set of verification parameters u and an index hierarchy ℋ for F, and σ = {σ^(k)}_{P_k∈𝒫} denotes the set of all tags, σ^(k) being the tags of the fraction F^(k) of F stored in P_k;

Proof(𝒫, V): is a protocol of proof of data possession between the CSPs (𝒫 = {P_k}) and a verifier (V), that is,

  ⟨ Σ_{P_k∈𝒫} P_k(F^(k), σ^(k)) ←→ V ⟩(pk, ψ) = { 1, F = {F^(k)} is intact;
                                                 0, F = {F^(k)} is changed,

where each P_k takes as input a file F^(k) and a set of tags σ^(k), and a public key pk and a set of public parameters ψ are the common input between P_k and V. At the end of the protocol run, V returns a bit {0|1} denoting false or true, and Σ_{P_k∈𝒫} denotes the cooperative computing of the P_k ∈ 𝒫.

A trivial way to realize CPDP is to check the data stored in each cloud one by one, i.e.,

  ⋀_{P_k∈𝒫} ⟨P_k(F^(k), σ^(k)) ←→ V⟩(pk, ψ),

where ⋀ denotes the logical AND operation over the boolean outputs of all protocols ⟨P_k, V⟩ for all P_k ∈ 𝒫.
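As a minimal sketch of the interface in Definition 1 and of the trivial cloud-by-cloud realization, consider the following toy code. The keyed-hash "tags" and the secret-key recomputation are our own placeholders; the real scheme replaces them with homomorphic tags and an interactive, publicly verifiable protocol (Section 3).

```python
# Toy stand-in for Definition 1's interface (KeyGen, TagGen, trivial Proof).
# The trivial realization checks each cloud one by one and ANDs the boolean
# outcomes; it is NOT the cooperative protocol, merely the baseline it improves on.
import hashlib, secrets

def keygen(kappa=32):
    sk = secrets.token_bytes(kappa)              # secret key
    pk = hashlib.sha256(sk).hexdigest()          # placeholder public value
    return pk, sk

def taggen(sk, fractions):
    """fractions: {cloud_name: bytes fraction F^(k)} -> per-cloud tag sets sigma^(k)."""
    return {c: hashlib.sha256(sk + data).hexdigest() for c, data in fractions.items()}

def trivial_proof(fractions, tags, sk):
    """AND over all <P_k <-> V> checks; here a direct recomputation with sk."""
    return all(hashlib.sha256(sk + data).hexdigest() == tags.get(c)
               for c, data in fractions.items())
```

Checking cloud by cloud like this costs one protocol run per CSP and exposes which CSP holds which fraction; the organizer-based extension below removes both drawbacks.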
However, checking each cloud separately would cause significant communication and computation overheads for the verifier, as well as a loss of location transparency. Such a primitive approach obviously diminishes the advantages of cloud storage: scaling arbitrarily up and down on demand [13]. To solve this problem, we extend the above definition by adding an organizer (O), which is one of the CSPs and directly contacts the verifier, as follows:

  ⟨ Σ_{P_k∈𝒫} P_k(F^(k), σ^(k)) ←→ O ←→ V ⟩(pk, ψ),

where the action of the organizer is to initiate and organize the verification process. This definition is consistent with the aforementioned architecture: e.g., a client (or an authorized application) is considered as V, the CSPs are 𝒫 = {P_i}_{i∈[1,c]}, and the Zoho cloud is the organizer in Figure 1. Often, the organizer is an independent server or a certain CSP in 𝒫. The advantage of this new multi-prover proof system is that, from the clients' point of view, there is no difference between the multi-prover verification process and a single-prover verification process in the way of collaboration. Also, this kind of transparent verification is able to conceal the details of data storage to reduce the burden on clients. For the sake of clarity, we list the signals used in this paper in Table 2.

TABLE 2
The signals and their explanations.

Sig.  Explanation
n     the number of blocks in a file;
s     the number of sectors in each block;
t     the number of index-coefficient pairs in a query;
c     the number of clouds used to store a file;
F     the file, with n × s sectors, i.e., F = {m_{i,j}}_{i∈[1,n], j∈[1,s]};
σ     the set of tags, i.e., σ = {σ_i}_{i∈[1,n]};
Q     the set of index-coefficient pairs, i.e., Q = {(i, v_i)};
θ     the response for the challenge Q.

2.3 Hash Index Hierarchy for CPDP

To support distributed cloud storage, we illustrate a representative architecture used in our cooperative PDP scheme in Figure 2. Our architecture has a hierarchical structure which resembles a natural representation of file storage. This hierarchical structure ℋ consists of three layers that represent the relationships among all blocks for the stored resources. They are described as follows:

1) Express Layer: offers an abstract representation of the stored resources;
2) Service Layer: offers and manages cloud storage services; and
3) Storage Layer: realizes data storage on many physical devices.

Fig. 2. Index-hash hierarchy of CPDP model.

We make use of this simple hierarchy to organize data blocks from multiple CSP services into a large-size file by shading their differences among these cloud storage systems. For example, in Figure 2 the resources in the Express Layer are split and stored into three CSPs, indicated by different colors, in the Service Layer. In turn, each CSP fragments and stores the assigned data into the storage servers of the Storage Layer. We also make use of colors to distinguish different CSPs. Moreover, we follow the logical order of the data blocks to organize the Storage Layer. This architecture also provides special functions for data storage and management; e.g., there may exist overlaps among data blocks (as shown in the dashed boxes in Figure 2) and discontinuous blocks, but these functions may increase the complexity of storage management.

In the Storage Layer, we define a common fragment structure that provides probabilistic verification of data integrity for outsourced storage. The fragment structure is a data structure that maintains a set of block-tag pairs, allowing searches, checks and updates in O(1) time. An instance of this structure is shown in the Storage Layer of Figure 2: an outsourced file F is split into n blocks {m_1, m_2, ⋯, m_n}, and each block m_i is split into s sectors {m_{i,1}, m_{i,2}, ⋯, m_{i,s}}. The fragment structure consists of n block-tag pairs (m_i, σ_i), where σ_i is a signature tag of block m_i generated from a set of secrets τ = (τ_1, τ_2, ⋯, τ_s). In order to check the data integrity, the fragment structure implements probabilistic verification as follows: given a randomly chosen challenge (or query) Q = {(i, v_i)}_{i∈_R I}, where I is a subset of the block indices and v_i is a random coefficient, there exists an efficient algorithm to produce a constant-size response (μ_1, μ_2, ⋯, μ_s, σ'), where μ_i comes from all {m_{k,i}, v_k}_{k∈I} and σ' is from all {σ_k, v_k}_{k∈I}.
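The fragment structure described above can be sketched directly: split the file into n blocks of s sectors, keep block-tag pairs in a structure with O(1) lookup, and answer a sampled challenge Q = {(i, v_i)} with a response whose size does not depend on |Q|. The integer "tags" below are our own stand-ins for the signature tags σ_i:

```python
# Sketch of the fragment structure: n blocks x s sectors, O(1) block-tag lookup,
# and a constant-size response to a sampled challenge (toy integer tags).
def fragment(data: bytes, s: int):
    padded = data + b"\x00" * (-len(data) % s)          # pad to a multiple of s
    blocks = [list(padded[i:i + s]) for i in range(0, len(padded), s)]
    # dict keyed by block index -> O(1) search, check, and update of block-tag pairs
    return {i: {"sectors": blk, "tag": sum(blk) + i} for i, blk in enumerate(blocks)}

def respond(store, Q):
    """Q = [(i, v_i)]: response (mu_1..mu_s, sigma') whose size is independent of |Q|."""
    s = len(store[0]["sectors"])
    mu = [sum(v * store[i]["sectors"][j] for i, v in Q) for j in range(s)]
    sigma = sum(v * store[i]["tag"] for i, v in Q)      # toy aggregation of tags
    return mu, sigma

store = fragment(b"abcdefgh", s=4)                      # 2 blocks x 4 sectors
mu, sigma = respond(store, [(0, 2), (1, 3)])
```

Whatever the size of the sampled subset I, the response is always s sector aggregates plus one tag aggregate, which is the property the scheme's communication bounds rely on.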
Given a collision-resistant hash function H_k(⋅), we make use of this architecture to construct a hash index hierarchy ℋ (viewed as a random oracle), which replaces the common hash function of prior PDP schemes, as follows:

1) Express layer: given s random values {τ_i}_{i=1}^{s} and the file name F_n, sets ξ^(1) = H_{Σ_{i=1}^{s} τ_i}(F_n) and makes it public for verification, but keeps {τ_i}_{i=1}^{s} secret;
2) Service layer: given ξ^(1) and the cloud name C_k, sets ξ_k^(2) = H_{ξ^(1)}(C_k);
3) Storage layer: given ξ_k^(2), a block number i, and its index record χ_i = "B_i||V_i||R_i", sets ξ_{i,k}^(3) = H_{ξ_k^(2)}(χ_i), where B_i is the sequence number of the block, V_i is its updated version number, and R_i is a random integer to avoid collisions.

As a virtualization approach, we introduce a simple index-hash table χ = {χ_i} to record the changes of file blocks, as well as to generate the hash value of each block in the verification process. The structure of χ is similar to the structure of the file block allocation table in file systems. The index-hash table consists of serial number, block number, version number, random integer, and so on. Different from a common index table, we ensure that all records in our index table differ from one another, to prevent forgery of data blocks and tags. By using this structure, especially the index records {χ_i}, our CPDP scheme can also support dynamic data operations [8].

The proposed structure can be readily incorporated into MAC-based, ECC, or RSA schemes [2], [6]. These schemes, built from collision-resistant signatures (see Section 3.1) and the random oracle model, have the shortest query and response with public verifiability. They share several common characteristics for the implementation of the CPDP framework in multiple clouds: 1) a file is split into n × s sectors and each block (s sectors) corresponds to a tag, so that the storage of signature tags can be reduced by increasing s; 2) a verifier can verify the integrity of a file by a random sampling approach, which is of the utmost importance for large files; 3) these schemes rely on homomorphic properties to aggregate data and tags into a constant-size response, which minimizes the overhead of network communication; and 4) the hierarchical structure provides a virtualization approach to conceal the storage details of multiple CSPs.

2.4 Homomorphic Verifiable Response for CPDP

A homomorphism is a map f : ℙ → ℚ between two groups such that f(g_1 ⊕ g_2) = f(g_1) ⊗ f(g_2) for all g_1, g_2 ∈ ℙ, where ⊕ denotes the operation in ℙ and ⊗ denotes the operation in ℚ. This notion has been used to define Homomorphic Verifiable Tags (HVTs) in [2]: given two tag values σ_i and σ_j for two messages m_i and m_j, anyone can combine them into a value σ' corresponding to the sum of the messages m_i + m_j. When provable data possession is considered as a challenge-response protocol, we extend this notion to the concept of Homomorphic Verifiable Responses (HVR), which is used to integrate multiple responses from the different CSPs in the CPDP scheme, as follows:

Definition 2 (Homomorphic Verifiable Response): A response is called a homomorphic verifiable response in a PDP protocol if, given two responses θ_i and θ_j for two challenges Q_i and Q_j from two CSPs, there exists an efficient algorithm to combine them into a response θ corresponding to the sum of the challenges Q_i ∪ Q_j.

The homomorphic verifiable response is the key technique of CPDP, because it not only reduces the communication bandwidth but also conceals the location of outsourced data in the distributed cloud storage environment.

3 COOPERATIVE PDP SCHEME

In this section, we propose a CPDP scheme for multi-cloud systems based on the above-mentioned structure and techniques. This scheme is constructed on a collision-resistant hash, a bilinear map group, an aggregation algorithm, and homomorphic responses.

3.1 Notations and Preliminaries

Let ℍ = {H_k} be a family of hash functions H_k : {0,1}^n → {0,1}^* indexed by k ∈ 𝒦. We say that an algorithm 𝒜 has advantage ε in breaking the collision-resistance of ℍ if Pr[𝒜(k) = (m_0, m_1) : m_0 ≠ m_1, H_k(m_0) = H_k(m_1)] ≥ ε, where the probability is over the random choice of k ∈ 𝒦 and the random bits of 𝒜. This leads to the following definition:

Definition 3 (Collision-Resistant Hash): A hash family ℍ is (t, ε)-collision-resistant if no t-time adversary has advantage at least ε in breaking the collision-resistance of ℍ.

We set up our system using the bilinear pairings proposed by Boneh and Franklin [14]. Let 𝔾 and 𝔾_T be two multiplicative groups, using elliptic curve conventions, with a large prime order p. The function e is a computable bilinear map e : 𝔾 × 𝔾 → 𝔾_T with the following properties: for any G, H ∈ 𝔾 and all a, b ∈ ℤ_p, we have 1) Bilinearity: e([a]G, [b]H) = e(G, H)^{ab}; 2) Non-degeneracy: e(G, H) ≠ 1 unless G or H = 1; and 3) Computability: e(G, H) is efficiently computable.

Definition 4 (Bilinear Map Group System): A bilinear map group system is a tuple 𝕊 = ⟨p, 𝔾, 𝔾_T, e⟩ composed of the objects described above.

3.2 Our CPDP Scheme

In our scheme (see Fig. 3), the manager first runs algorithm KeyGen to obtain the public/private key pairs for the CSPs and users. Then, the clients generate the tags of outsourced data by using TagGen. At any time, the protocol Proof is performed as a 5-move interactive proof protocol between a verifier and more than one CSP, in which the CSPs need not interact with each other during the verification process; instead, an organizer is used to organize and manage all CSPs.
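Before giving the full construction in Fig. 3, the 5-move flow just outlined can be exercised end-to-end with plain modular arithmetic standing in for the bilinear-group operations; the routing of sub-challenges Q_k by block position and the organizer's masked aggregation are the points being illustrated, and all concrete values are our own:

```python
# Toy run of the 5-move protocol: verifier V -> organizer O -> provers P_k and
# back, with modular arithmetic replacing the bilinear-group operations.
# The masks lambda_{j,k} model the blinding each CSP applies in its response.
import random

p = 2**61 - 1                                        # stand-in prime modulus
s = 2                                                # sectors per block
m = {1: [5, 2], 2: [9, 1], 3: [4, 7]}                # m[i][j]: sector j of block i
placement = {1: "CSP1", 2: "CSP1", 3: "CSP2"}        # which CSP stores block i

gamma = random.randrange(1, p)                       # move 1: O's commitment secret
Q = [(1, 3), (2, 7), (3, 11)]                        # move 2: V's challenge {(i, v_i)}

Qk = {}                                              # move 3: O routes Q_k per CSP
for i, v in Q:
    Qk.setdefault(placement[i], []).append((i, v))

# move 4: each CSP masks its partial sums with random lambda_{j,k}
lam = {c: [random.randrange(p) for _ in range(s)] for c in Qk}
mu_k = {c: [(lam[c][j] + sum(v * m[i][j] for i, v in pairs)) % p for j in range(s)]
        for c, pairs in Qk.items()}

# move 5: O aggregates mu'_j = gamma * sum_k mu_{j,k} and forwards to V
mu = [(gamma * sum(mu_k[c][j] for c in mu_k)) % p for j in range(s)]

# sanity check: equals the single-server response on the whole challenge Q
lam_j = [sum(lam[c][j] for c in lam) for j in range(s)]
direct = [(gamma * (lam_j[j] + sum(v * m[i][j] for i, v in Q))) % p for j in range(s)]
assert mu == direct
```

The final assertion is the transparency property in miniature: the verifier cannot tell from μ' whether one server or several answered the challenge.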
KeyGen(1^κ): Let 𝕊 = (p, 𝔾, 𝔾_T, e) be a bilinear map group system with randomly selected generators g, h ∈ 𝔾, where 𝔾, 𝔾_T are two bilinear groups of a large prime order p, |p| = O(κ). Make a hash function H_k(⋅) public. For a CSP, choose a random number s ∈_R ℤ_p and compute S = g^s ∈ 𝔾; thus sk_p = s and pk_p = (g, S). For a user, choose two random numbers α, β ∈_R ℤ_p and set sk_u = (α, β) and pk_u = (g, h, H_1 = h^α, H_2 = h^β).

TagGen(sk, F, 𝒫): Split F into n × s sectors {m_{i,j}}_{i∈[1,n], j∈[1,s]} ∈ ℤ_p^{n×s}. Choose s random values τ_1, ⋯, τ_s ∈ ℤ_p as the secret of this file and compute u_i = g^{τ_i} ∈ 𝔾 for i ∈ [1, s]. Construct the index table χ = {χ_i}_{i=1}^{n} and fill out the record χ_i in χ for i ∈ [1, n],^a then calculate the tag for each block m_i as

  ξ^(1) ← H_{Σ_{i=1}^{s} τ_i}(F_n),   ξ_k^(2) ← H_{ξ^(1)}(C_k),
  ξ_{i,k}^(3) ← H_{ξ_k^(2)}(χ_i),     σ_{i,k} ← (ξ_{i,k}^(3))^α ⋅ (∏_{j=1}^{s} u_j^{m_{i,j}})^β,

where F_n is the file name and C_k is the CSP name of P_k ∈ 𝒫. Then store ψ = (u, ξ^(1), χ) in the TTP and send σ^(k) = {σ_{i,k}} to P_k ∈ 𝒫, where u = (u_1, ⋯, u_s). Finally, the data owner saves the secret ζ = (τ_1, ⋯, τ_s).

Proof(𝒫, V): This is a 5-move protocol among the Provers (𝒫 = {P_i}_{i∈[1,c]}), an organizer (O), and a Verifier (V), with the common input (pk, ψ) stored in the TTP, as follows:

1) Commitment (O → V): the organizer chooses a random γ ∈_R ℤ_p and sends H'_1 = H_1^γ to the verifier;
2) Challenge1 (O ← V): the verifier chooses a set of challenge index-coefficient pairs Q = {(i, v_i)}_{i∈I} and sends Q to the organizer, where I is a set of random indexes in [1, n] and v_i is a random integer in ℤ_p^*;
3) Challenge2 (𝒫 ← O): the organizer forwards Q_k = {(i, v_i)}_{m_i∈P_k} ⊆ Q to each P_k in 𝒫;
4) Response1 (𝒫 → O): P_k chooses a random r_k ∈ ℤ_p and s random values λ_{j,k} ∈ ℤ_p for j ∈ [1, s], and calculates the response

  σ'_k ← S^{r_k} ⋅ ∏_{(i,v_i)∈Q_k} σ_{i,k}^{v_i},   μ_{j,k} ← λ_{j,k} + Σ_{(i,v_i)∈Q_k} v_i ⋅ m_{i,j},   π_{j,k} ← e(u_j^{λ_{j,k}}, H_2),

where μ_k = {μ_{j,k}}_{j∈[1,s]} and π_k = ∏_{j=1}^{s} π_{j,k}. Let η_k ← g^{r_k} ∈ 𝔾; each P_k sends θ_k = (π_k, σ'_k, μ_k, η_k) to the organizer;
5) Response2 (O → V): after receiving all responses from {P_i}_{i∈[1,c]}, the organizer aggregates {θ_k}_{P_k∈𝒫} into a final response θ as

  σ' ← (∏_{P_k∈𝒫} σ'_k ⋅ η_k^{-s})^γ,   μ'_j ← γ ⋅ Σ_{P_k∈𝒫} μ_{j,k},   π' ← (∏_{P_k∈𝒫} π_k)^γ.   (1)

Let μ' = {μ'_j}_{j∈[1,s]}. The organizer sends θ = (π', σ', μ') to the verifier.

Verification: the verifier can now check whether the response was correctly formed by checking that

  π' ⋅ e(σ', h) ≟ e(∏_{(i,v_i)∈Q} H_{ξ_k^(2)}(χ_i)^{v_i}, H'_1) ⋅ e(∏_{j=1}^{s} u_j^{μ'_j}, H_2).   (2)

a. For χ_i = "B_i||V_i||R_i" in Section 2.3, we can set χ_i = (B_i = i, V_i = 1, R_i ∈_R {0,1}^*) at the initial stage of the CPDP scheme.

Fig. 3. Cooperative Provable Data Possession for Multi-Cloud Storage.

This protocol can be described as follows: 1) the organizer initiates the protocol and sends a commitment to the verifier; 2) the verifier returns a challenge set of random index-coefficient pairs Q to the organizer; 3) the organizer relays them to each P_i in 𝒫 according to the exact position of each data block; 4) each P_i returns its response to the challenge to the organizer; and 5) the organizer synthesizes a final response from the received responses and sends it to the verifier. This process guarantees that the verifier accesses files without knowing on which CSPs, or in what geographical locations, their files reside.

In contrast to a single-CSP environment, our scheme differs from the common PDP scheme in two aspects:

1) Tag aggregation algorithm: in the commitment stage, the organizer generates a random γ ∈_R ℤ_p and returns its commitment H'_1 to the verifier. This assures that neither the verifier nor the CSPs obtain the value of γ. Therefore, our approach guarantees that only the organizer can compute the final σ', by using γ and the σ'_k received from the CSPs.

After each σ'_k is computed, it needs to be transferred to the organizer in the "Response1" stage. In order to ensure the security of the transmission of data tags, our scheme employs a method similar to ElGamal encryption to encrypt the combination of tags ∏_{(i,v_i)∈Q_k} σ_{i,k}^{v_i}: that is, for sk = s ∈ ℤ_p and pk = (g, S = g^s) ∈ 𝔾^2, the cipher of a message m is 𝒞 = (𝒞_1 = g^r, 𝒞_2 = m ⋅ S^r), and its decryption is performed by m = 𝒞_2 ⋅ 𝒞_1^{-s}. Thus, since η_k^s = (g^{r_k})^s = S^{r_k}, we have the equation

  σ' = (∏_{P_k∈𝒫} σ'_k / η_k^s)^γ = (∏_{P_k∈𝒫} (S^{r_k} ⋅ ∏_{(i,v_i)∈Q_k} σ_i^{v_i}) / η_k^s)^γ
     = (∏_{P_k∈𝒫} ∏_{(i,v_i)∈Q_k} σ_i^{v_i})^γ = ∏_{(i,v_i)∈Q} σ_i^{v_i⋅γ}.
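The ElGamal-style blinding is easy to check in isolation: the factor S^{r_k} blinds each tag product, η_k = g^{r_k} travels alongside it, and only the holder of s can strip the blind via m = 𝒞_2 ⋅ 𝒞_1^{-s}. A toy version over ℤ_p^* follows (our parameters; the scheme itself works in a pairing-friendly group 𝔾):

```python
# Toy check of the ElGamal-style decryption m = C2 * C1^(-s) over Z_p^*.
# Parameters are illustrative, not the bilinear-group setting of the scheme.
import secrets

p = 2**127 - 1                                  # Mersenne prime modulus
g = 3
s = secrets.randbelow(p - 2) + 1                # CSP secret key s
S = pow(g, s, p)                                # public S = g^s

m = 123456789                                   # stands for prod sigma_i^{v_i}
r = secrets.randbelow(p - 2) + 1                # per-response randomness r_k
C1, C2 = pow(g, r, p), (m * pow(S, r, p)) % p   # cipher (g^r, m * S^r)

recovered = (C2 * pow(C1, -s, p)) % p           # strip the blind with s
assert recovered == m
```

Here C1 plays the role of η_k and the S^r factor in C2 the role of the S^{r_k} blind in σ'_k; the derivation of σ' above is this decryption applied factor by factor. (Note that `pow(C1, -s, p)` computes a modular inverse and needs Python 3.8 or later.)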
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

2) Homomorphic responses: Because of the homomorphic property, the responses computed from CSPs in a multi-cloud can be combined into a single final response as follows: given a set θ_k = (π_k, σ'_k, μ_k, η_k) received from each P_k, and letting λ_j = ∑_{P_k∈𝒫} λ_{j,k}, the organizer can compute

$$
\mu'_j = \gamma\cdot\sum_{P_k\in\mathcal{P}}\mu_{j,k}
       = \gamma\cdot\sum_{P_k\in\mathcal{P}}\Big(\lambda_{j,k}+\sum_{(i,v_i)\in Q_k}v_i\cdot m_{i,j}\Big)
       = \gamma\cdot\sum_{P_k\in\mathcal{P}}\lambda_{j,k}+\gamma\cdot\sum_{(i,v_i)\in Q}v_i\cdot m_{i,j}
       = \gamma\cdot\lambda_j+\gamma\cdot\sum_{(i,v_i)\in Q}v_i\cdot m_{i,j}. \qquad (3)
$$

The commitment of λ_j is also computed by

$$
\pi' = \Big(\prod_{P_k\in\mathcal{P}}\pi_k\Big)^{\gamma}
     = \Big(\prod_{P_k\in\mathcal{P}}\prod_{j=1}^{s}\pi_{j,k}\Big)^{\gamma}
     = \prod_{j=1}^{s}\Big(\prod_{P_k\in\mathcal{P}}e\big(u_j^{\lambda_{j,k}},H_2\big)\Big)^{\gamma}
     = \prod_{j=1}^{s}e\Big(u_j^{\sum_{P_k\in\mathcal{P}}\lambda_{j,k}},H_2^{\gamma}\Big)
     = \prod_{j=1}^{s}e\big(u_j^{\lambda_j},H'_2\big).
$$

It is obvious that the final response θ received by the verifier from multiple CSPs is the same as that from one simple CSP. This means that our CPDP scheme is able to provide transparent verification for the verifiers. Two response algorithms, Response1 and Response2, comprise an HVR: given two responses θ_i and θ_j for two challenges Q_i and Q_j from two CSPs, i.e., θ_i = Response1(Q_i, {m_k}_{k∈I_i}, {σ_k}_{k∈I_i}), there exists an efficient algorithm to combine them into a final response θ corresponding to the sum of the challenges Q_i ∪ Q_j, that is,

$$
\theta = \mathrm{Response1}\big(Q_i\cup Q_j,\ \{m_k\}_{k\in I_i\cup I_j},\ \{\sigma_k\}_{k\in I_i\cup I_j}\big) = \mathrm{Response2}(\theta_i,\theta_j).
$$

For multiple CSPs, the above equation can be extended to θ = Response2({θ_k}_{P_k∈𝒫}). More importantly, the HVR is a triple of values θ = (π, σ, μ) whose size is constant even for different challenges.

4 SECURITY ANALYSIS

We give a brief security analysis of our CPDP construction. This construction is directly derived from the multi-prover zero-knowledge proof system (MP-ZKPS), which satisfies the following properties for a given assertion L:
1) Completeness: whenever x ∈ L, there exists a strategy for the provers that convinces the verifier that this is the case;
2) Soundness: whenever x ∉ L, whatever strategy the provers employ, they will not convince the verifier that x ∈ L;
3) Zero-knowledge: no cheating verifier can learn anything other than the veracity of the statement.
According to existing IPS research [15], these properties can protect our construction from various attacks, such as the data leakage attack (privacy leakage) and the tag forgery attack (ownership cheating). In detail, the security of our scheme can be analyzed as follows:

4.1 Collision resistance of the index-hash hierarchy

In our CPDP scheme, the collision resistance of the index-hash hierarchy is the basis and prerequisite for the security of the whole scheme, which is described as being secure in the random oracle model. Even though the hash function is collision resistant, a successful hash collision can still be used to produce a forged tag if the same hash value is reused multiple times, e.g., when a legitimate client modifies the data or repeatedly inserts and deletes data blocks of the outsourced data. To avoid such collisions, the hash value ξ^{(3)}_{i,k}, which is used to generate the tag σ_i in the CPDP scheme, is computed from the set of values {τ_i}, F_n, C_k, {χ_i}. As long as there exists one bit of difference among these data, a hash collision is avoided. As a consequence, we have the following theorem (see Appendix B):

Theorem 1 (Collision Resistance): The index-hash hierarchy in the CPDP scheme is collision resistant, even if the client generates $\sqrt{2p\cdot\ln\frac{1}{1-\varepsilon}}$ files with the same file name and cloud name, and the client repeats $\sqrt{2^{L+1}\cdot\ln\frac{1}{1-\varepsilon}}$ times to modify, insert and delete data blocks, where the collision probability is at least ε, τ_i ∈ ℤ_p, and |R_i| = L for R_i ∈ χ_i.

4.2 Completeness property of verification

In our scheme, the completeness property implies the public verifiability property, which allows anyone, not just the client (the data owner), to challenge the cloud server for data integrity and data ownership without needing any secret information. First, for every available data-tag pair (F, σ) ∈ TagGen(sk, F) and a random challenge Q = (i, v_i)_{i∈I}, the verification protocol should complete with success probability 1, in accordance with Equation (3); that is,

$$
\Pr\Big[\Big\langle\sum_{P_k\in\mathcal{P}}P_k\big(F^{(k)},\sigma^{(k)}\big)\leftrightarrow O\leftrightarrow V\Big\rangle(pk,\psi)=1\Big]=1.
$$

In this process, anyone can obtain the owner's public key pk = (g, h, H_1 = h^α, H_2 = h^β) and the corresponding file parameter ψ = (u, ξ^{(1)}, χ) from the TTP to execute the verification protocol; hence this is a publicly verifiable protocol. Moreover, for different owners, the secrets α and β hidden in their public keys pk are also different, so a successful verification can only be carried out under the real owner's public key. In addition, the parameter ψ is used to store the file-related information, so an owner can employ a single public key to deal with a large number of outsourced files.

4.3 Zero-knowledge property of verification

The CPDP construction is in essence a Multi-Prover Zero-knowledge Proof (MP-ZKP) system [11], which can be considered an extension of the notion of an interactive proof system (IPS).
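The organizer's aggregation in Equation (3) is plain integer arithmetic (distributing γ over the per-CSP sums), so it can be checked with toy values; the challenge split, coefficients, and blinding integers λ_{j,k} below are all made up, for a single fixed sector index j.

```python
# Toy check of Equation (3): the organizer combines blinded per-CSP responses
# mu_{j,k} into mu'_j (one fixed sector index j; all numbers illustrative).
gamma = 7                                  # organizer's random gamma
m = {0: 42, 1: 17, 2: 8, 4: 29}            # sector values m_{i,j} of challenged blocks
Q = {0: [(0, 3), (2, 5)],                  # challenge split over CSPs: k -> [(i, v_i)]
     1: [(1, 2), (4, 4)]}
lam = {0: 11, 1: 23}                       # blinding integers lambda_{j,k}

# Each CSP P_k returns mu_{j,k} = lambda_{j,k} + sum of v_i * m_{i,j}.
mu = {k: lam[k] + sum(v * m[i] for i, v in Q[k]) for k in Q}

# Organizer: mu'_j = gamma * sum_k mu_{j,k}.
mu_prime = gamma * sum(mu.values())

# Equation (3): this equals gamma*lambda_j + gamma * (sum over the merged
# challenge Q), i.e. exactly what one CSP holding the whole file would produce.
lam_j = sum(lam.values())
merged_Q = [pair for pairs in Q.values() for pair in pairs]
assert mu_prime == gamma * lam_j + gamma * sum(v * m[i] for i, v in merged_Q)
```

The blinding integers λ_{j,k} also illustrate the zero-knowledge discussion: without them, the verifier could read the data-dependent sum ∑ v_i·m_{i,j} directly off μ_{j,k}.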
In particular, a valid final response (π', σ', μ') passes the verification equation

$$
\begin{aligned}
\pi'\cdot e(\sigma',h)
&= \prod_{j=1}^{s}e\big(u_j^{\lambda_j},H'_2\big)\cdot e\Big(\prod_{(i,v_i)\in Q}\sigma_i^{v_i\cdot\gamma},\,h\Big)\\
&= \prod_{j=1}^{s}e\big(u_j^{\lambda_j},H'_2\big)\cdot e\Big(\prod_{(i,v_i)\in Q}\Big((\xi_{i,k}^{(3)})^{\alpha}\cdot\big(\textstyle\prod_{j=1}^{s}u_j^{m_{i,j}}\big)^{\beta}\Big)^{v_i\cdot\gamma},\,h\Big)\\
&= \prod_{j=1}^{s}e\big(u_j^{\gamma\cdot\lambda_j},H_2\big)\cdot e\Big(\prod_{(i,v_i)\in Q}(\xi_i^{(3)})^{v_i},\,h\Big)^{\alpha\gamma}\cdot e\Big(\prod_{j=1}^{s}u_j^{\sum_{(i,v_i)\in Q}\gamma\, m_{i,j}v_i},\,h^{\beta}\Big)\\
&= e\Big(\prod_{(i,v_i)\in Q}(\xi_i^{(3)})^{v_i},\,H'_1\Big)\cdot\prod_{j=1}^{s}e\big(u_j^{\mu'_j},H_2\big).
\end{aligned}
$$

Roughly speaking, in the scenario of MP-ZKP, a polynomial-time-bounded verifier interacts with several provers whose computational powers are unlimited. According to a simulator model, in which every cheating verifier has a simulator that can produce a transcript that "looks like" an interaction between an honest prover and the cheating verifier, we can prove that our CPDP construction has the zero-knowledge property (see Appendix C):

Theorem 2 (Zero-Knowledge Property): The verification protocol Proof(𝒫, V) in the CPDP scheme is a computational zero-knowledge system under a simulator model; that is, for every probabilistic polynomial-time interactive machine V*, there exists a probabilistic polynomial-time algorithm S* such that the ensembles View(⟨∑_{P_k∈𝒫} P_k(F^{(k)}, σ^{(k)}) ↔ O ↔ V*⟩(pk, ψ)) and S*(pk, ψ) are computationally indistinguishable.

Zero-knowledge is a property that achieves the CSPs' robustness against attempts to gain knowledge by interacting with them. For our construction, we make use of the zero-knowledge property to preserve the privacy of data blocks and signature tags. First, randomness is adopted in the CSPs' responses in order to resist data leakage attacks (see Attacks 1 and 3 in Appendix A). That is, the random integer λ_{j,k} is introduced into the response μ_{j,k}, i.e., μ_{j,k} = λ_{j,k} + ∑_{(i,v_i)∈Q_k} v_i·m_{i,j}. This means that a cheating verifier cannot obtain m_{i,j} from μ_{j,k}, because he does not know the random integer λ_{j,k}. At the same time, a random integer γ is also introduced to randomize the verification tag σ, i.e., σ' ← (∏_{P_k∈𝒫} σ'_k · η_k^{-s})^γ. Thus, the tag σ is not revealed to a cheating verifier, owing to this randomness.

4.4 Knowledge soundness of verification

For every data-tag pair (F*, σ*) ∉ TagGen(sk, F), in order to prove the nonexistence of fraudulent 𝒫* and O*, we require that the scheme satisfy the knowledge soundness property, that is,

$$
\Pr\Big[\Big\langle\sum_{P_k\in\mathcal{P}^*}P_k\big(F^{(k)*},\sigma^{(k)*}\big)\leftrightarrow O^*\leftrightarrow V\Big\rangle(pk,\psi)=1\Big]\le\epsilon,
$$

where ε is a negligible error. We prove that our scheme has the knowledge soundness property by using reduction to absurdity¹: we make use of 𝒫* to construct a knowledge extractor ℳ [7], [13], which gets the common input (pk, ψ) and rewindable black-box access to the prover P*, and then attempts to break the computational Diffie-Hellman (CDH) problem in 𝔾: given G, G_1 = G^a, G_2 = G^b ∈_R 𝔾, output G^{ab} ∈ 𝔾. But this is unacceptable, because the CDH problem is widely regarded as intractable in polynomial time. Thus, the opposite direction of the theorem also follows. We have the following theorem (see Appendix D):

Theorem 3 (Knowledge Soundness Property): Our scheme has (t, ε') knowledge soundness in the random oracle and rewindable knowledge extractor model, assuming that the (t, ε)-computational Diffie-Hellman (CDH) assumption holds in the group 𝔾 for ε' ≥ ε.

Essentially, soundness means that it is infeasible to fool the verifier into accepting false statements. Often, soundness can also be regarded as a stricter notion of unforgeability for file tags that prevents ownership cheating. It means that the CSPs, even if they collude, can neither tamper with the data nor forge the data tags as long as the soundness property holds. Thus, Theorem 3 shows that the CPDP scheme can resist tag forgery attacks (see Attacks 2 and 4 in Appendix A) and thereby prevent ownership cheating against the CSPs.

¹ Reduction to absurdity is a proof method in which a proposition is proved to be true by proving that it is impossible for it to be false.

5 PERFORMANCE EVALUATION

In this section, in order to detect abnormality in a low-overhead and timely manner, we analyze and optimize the performance of the CPDP scheme from two aspects: evaluation of probabilistic queries and optimization of the length of blocks. To validate the effects of the scheme, we introduce a prototype of a CPDP-based audit system and present the experimental results.

5.1 Performance Analysis of the CPDP Scheme

We present the computation cost of our CPDP scheme in Table 3. We use [E] to denote the computation cost of an exponentiation in 𝔾, namely g^x, where x is a positive integer in ℤ_p and g ∈ 𝔾 or 𝔾_T. We neglect the computation cost of algebraic operations and simple modular arithmetic operations, because they run fast enough [16].
The most complex operation is the computation of a bilinear map e(·, ·) between two elliptic curve points (denoted as [B]).

TABLE 3
Comparison of computation overheads between our CPDP scheme and the non-cooperative (trivial) scheme.

            CPDP Scheme               Trivial Scheme
KeyGen      3[E]                      2[E]
TagGen      (2n + s)[E]               (2n + s)[E]
Proof(𝒫)    c[B] + (t + cs + 1)[E]    c[B] + (t + cs − c)[E]
Proof(V)    3[B] + (t + s)[E]         3c[B] + (t + cs)[E]

Then, we analyze the storage and communication costs of our scheme. We define the bilinear pairing to take the form e : E(F_{p^m}) × E(F_{p^{km}}) → F*_{p^{km}} (the definition given here is from [17], [18]), where p is a prime, m is a positive integer, and k is the embedding degree (or security multiplier). In this case, we utilize an asymmetric pairing e : 𝔾_1 × 𝔾_2 → 𝔾_T to replace the symmetric pairing in the original schemes. From Table 3, it is easy to see that the client's computation overheads are entirely independent of the number of CSPs. Further, our scheme has better performance than the non-cooperative approach, because the total computation overhead decreases by 3(c − 1) bilinear map operations, where c is the number of clouds in the multi-cloud. The reason is that, before the responses are sent to the verifier from the c clouds, the organizer aggregates them into a single response by using the aggregation algorithm, so the verifier only needs to verify this one response to obtain the final result.

TABLE 4
Comparison of communication overheads between our CPDP and the non-cooperative (trivial) scheme.

             CPDP Scheme           Trivial Scheme
Commitment   l_2                   c·l_2
Challenge1   2t·l_0                2t·l_0
Challenge2   2t·l_0/c              —
Response1    s·l_0 + 2l_1 + l_T    (s·l_0 + l_1 + l_T)·c
Response2    s·l_0 + l_1 + l_T     —

Without loss of generality, let the security parameter κ be 80 bits; then we need elliptic curve domain parameters over F_p with |p| = 160 bits and m = 1 in our experiments. This means that the length of an integer in ℤ_p is l_0 = 2κ bits. Similarly, we have l_1 = 4κ bits in 𝔾_1, l_2 = 24κ bits in 𝔾_2, and l_T = 24κ bits in 𝔾_T for the embedding degree k = 6. The storage and communication costs of our scheme are shown in Table 4. The storage overhead of a file with size(f) = 1 MB is store(f) = n·s·l_0 + n·l_1 = 1.04 MB for n = 10³ and s = 50. The storage overhead of its index table χ is n·l_0 = 20 KB. We define the overhead rate as λ = store(f)/size(f) − 1 = l_1/(s·l_0); it should be kept as low as possible in order to minimize the storage burden on the cloud storage providers. It is obvious that a higher s means much lower extra storage. Furthermore, in the verification protocol, the communication overhead of a challenge is 2t·l_0 = 40·t bytes in terms of the number of challenged blocks t, but a response (Response1 or Response2) has a constant-size communication overhead of s·l_0 + l_1 + l_T ≈ 1.3 KB for different file sizes. This also implies that the client's communication overheads are of a fixed size, entirely independent of the number of CSPs.

5.2 Probabilistic Verification

We recall the probabilistic verification of the common PDP scheme (which involves only one CSP), in which the verification process achieves the detection of CSP misbehavior in a random sampling mode in order to reduce the workload on the server. The detection probability P of disrupted blocks is an important parameter for guaranteeing that these blocks can be detected in time. Assume the CSP modifies e blocks out of the n-block file; then the probability of disrupted blocks is ρ_b = e/n. Let t be the number of queried blocks for a challenge in the verification protocol. We have the detection probability²

$$
P(\rho_b,t) \ \ge\ 1-\Big(\frac{n-e}{n}\Big)^{t} \ =\ 1-(1-\rho_b)^{t},
$$

where P(ρ_b, t) denotes that the probability P is a function of ρ_b and t. Hence, the number of queried blocks is t ≈ log(1−P)/log(1−ρ_b) ≈ P·n/e for a sufficiently large n and t ≪ n.³ This means that the number of queried blocks t is directly proportional to the total number of file blocks n for constant P and e. Therefore, for a uniform random verification in a PDP scheme with the fragment structure, given a file with sz = n·s sectors and sector-corruption probability ρ, the detection probability of the verification protocol satisfies P ≥ 1 − (1−ρ)^{sz·ω}, where ω denotes the sampling probability in the verification protocol. We can obtain this result as follows: because ρ_b ≥ 1 − (1−ρ)^s is the probability of block corruption with s sectors in the common PDP scheme, the verifier can detect block errors with probability P ≥ 1 − (1−ρ_b)^t ≥ 1 − ((1−ρ)^s)^{n·ω} = 1 − (1−ρ)^{sz·ω} for a challenge with t = n·ω index-coefficient pairs. In the same way, given a multi-cloud 𝒫 = {P_i}_{i∈[1,c]}, the detection probability of the CPDP scheme satisfies

$$
P\big(sz,\{\rho_k,r_k\}_{P_k\in\mathcal{P}},\omega\big) \ \ge\ 1-\prod_{P_k\in\mathcal{P}}\big((1-\rho_k)^{s}\big)^{n\cdot r_k\cdot\omega} \ =\ 1-\prod_{P_k\in\mathcal{P}}(1-\rho_k)^{sz\cdot r_k\cdot\omega},
$$

where r_k denotes the proportion of data blocks in the k-th CSP, ρ_k denotes the probability of file corruption in the k-th CSP, and n·r_k·ω denotes the number of blocks queried by the verifier in the k-th CSP.

² Exactly, we have P = 1 − (1 − e/n)·(1 − e/(n−1)) ⋯ (1 − e/(n−t+1)). Since 1 − e/(n−i) ≤ 1 − e/n for i ∈ [0, t−1], we have P = 1 − ∏_{i=0}^{t−1}(1 − e/(n−i)) ≥ 1 − ∏_{i=0}^{t−1}(1 − e/n) = 1 − (1 − e/n)^t.
³ In terms of (1 − e/n)^t ≈ 1 − e·t/n, we have P ≈ 1 − (1 − e·t/n) = e·t/n.
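The sampling bounds above are easy to evaluate numerically. The helpers below are a sketch (the function names are ours, not the paper's); the last assertion reproduces the required number of sector queries sz·ω for P = 0.99 under the multi-cloud setting ρ_k = {0.01, 0.02, 0.001}, r_k = {0.5, 0.3, 0.2} used later in the evaluation.

```python
import math

def detect_prob(rho_b, t):
    # P(rho_b, t) >= 1 - (1 - rho_b)^t: chance that t sampled blocks
    # hit at least one disrupted block (block-corruption rate rho_b).
    return 1 - (1 - rho_b) ** t

def blocks_needed(rho_b, P):
    # t ~ log(1 - P) / log(1 - rho_b): sample size for target probability P.
    return math.ceil(math.log(1 - P) / math.log(1 - rho_b))

# With 1% of blocks disrupted, about 459 sampled blocks give 99% detection.
t = blocks_needed(0.01, 0.99)
assert t == 459 and detect_prob(0.01, t) >= 0.99

def multicloud_sz_omega(P, clouds):
    # Solving 1 - prod_k (1 - rho_k)^(sz * r_k * omega) = P for sz * omega
    # gives sz * omega = log(1 - P) / sum_k r_k * log(1 - rho_k).
    return math.log(1 - P) / sum(r * math.log(1 - rho) for r, rho in clouds)

szw = multicloud_sz_omega(0.99, [(0.5, 0.01), (0.3, 0.02), (0.2, 0.001)])
assert abs(szw - 408.04) < 0.01   # roughly 408 sector queries for P = 0.99
```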
TABLE 5
The influence of s and t (shown as s/t) under different corruption probabilities ρ and detection probabilities P.

 P      ρ = {0.1, 0.2, 0.01}   {0.01, 0.02, 0.001}   {0.001, 0.002, 0.0001}   {0.0001, 0.0002, 0.00001}
        r = {0.5, 0.3, 0.2}    {0.5, 0.3, 0.2}       {0.5, 0.3, 0.2}          {0.5, 0.3, 0.2}
 0.8    3/4                    7/20                  23/62                    71/202
 0.85   3/5                    8/21                  26/65                    79/214
 0.9    3/6                    10/20                 28/73                    87/236
 0.95   3/8                    11/29                 31/86                    100/267
 0.99   4/10                   13/31                 39/105                   119/345
 0.999  5/11                   16/38                 48/128                   146/433

Furthermore, we observe the ratio w of queried blocks to the total number of file blocks under different detection probabilities. Based on the above analysis, it is easy to find that this ratio satisfies

$$
w \approx \frac{\log(1-P)}{sz\cdot\sum_{P_k\in\mathcal{P}}r_k\cdot\log(1-\rho_k)}.
$$

When the probability ρ_k is constant, the verifier can detect server misbehavior with a given probability P by asking for proofs of t ≈ log(1−P)/(s·log(1−ρ)) blocks in a PDP scheme, or

$$
t \approx \frac{\log(1-P)}{s\cdot\sum_{P_k\in\mathcal{P}}r_k\cdot\log(1-\rho_k)}
$$

blocks in CPDP, where t = n·w = sz·w/s. Note that the value of t depends on the total number of file blocks n [2], because t increases with decreasing ρ_k and log(1−ρ_k) < 0 for a constant number of disrupted blocks e and a larger n.

Another advantage of probabilistic verification based on random sampling is that it is easy to identify tampered or forged data blocks or tags. The identification procedure is straightforward: when the verification fails, we choose a partial subset of the challenge indexes as a new challenge set and continue to execute the verification protocol on it. This search process is repeated until the bad block is found; its complexity is O(log n).

Fig. 4. The relationship between computational cost and the number of sectors in each block (curves for detection probabilities P = 0.800 to 0.999).

5.3 Parameter Optimization

In the fragment structure, the number of sectors per block, s, is an important parameter affecting the performance of storage services and audit services. Hence, in this section we propose an optimization algorithm for the value of s. Our results show that the optimal value not only minimizes the computation and communication overheads, but also reduces the extra storage needed to hold the verification tags in the CSPs.

Assume that ρ denotes the probability of sector corruption. In the fragment structure, the choice of s is extremely important for improving the performance of the CPDP scheme. Given the detection probability P and the probabilities of sector corruption ρ_k for the multiple clouds 𝒫 = {P_k}, the optimal value of s can be computed by

$$
\min_{s\in\mathbb{N}}\left\{\frac{a}{s}\cdot\frac{\log(1-P)}{\sum_{P_k\in\mathcal{P}}r_k\cdot\log(1-\rho_k)}+b\cdot s+c\right\},
$$

where a·t + b·s + c denotes the computational cost of the verification protocol in the PDP scheme, a, b, c ∈ ℝ, and c is a constant. This conclusion can be obtained from the following process: let sz = n·s = size(f)/l_0. According to the above-mentioned results, the sampling probability satisfies w ≥ log(1−P)/(sz·∑_{P_k∈𝒫} r_k·log(1−ρ_k)) = log(1−P)/(n·s·∑_{P_k∈𝒫} r_k·log(1−ρ_k)). In order to minimize the computational cost, we have

$$
\min_{s\in\mathbb{N}}\{a\cdot t+b\cdot s+c\} \ =\ \min_{s\in\mathbb{N}}\{a\cdot n\cdot w+b\cdot s+c\}
\ \ge\ \min_{s\in\mathbb{N}}\left\{\frac{a}{s}\cdot\frac{\log(1-P)}{\sum_{P_k\in\mathcal{P}}r_k\cdot\log(1-\rho_k)}+b\cdot s+c\right\},
$$

where r_k denotes the proportion of data blocks in the k-th CSP and ρ_k denotes the probability of file corruption in the k-th CSP. Since the a/s term is monotonically decreasing and b·s is monotonically increasing for s > 0, there exists an optimal value of s ∈ ℕ for the above expression. From this conclusion, the optimal value of s is unrelated to any particular file if the probability ρ is a constant value.

For instance, assume a multi-cloud storage involving three CSPs 𝒫 = {P_1, P_2, P_3}, where the probability of sector corruption is the constant vector {ρ_1, ρ_2, ρ_3} = {0.01, 0.02, 0.001}. We set the detection probability P in the range from 0.8 to 1, e.g., P ∈ {0.8, 0.85, 0.9, 0.95, 0.99, 0.999}. For a file, the proportions of data blocks in the three CSPs are 50%, 30%, and 20%, respectively; that is, r_1 = 0.5, r_2 = 0.3, and r_3 = 0.2.
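The optimization of s can also be sketched numerically. Below, K stands for the factor log(1−P)/∑_k r_k·log(1−ρ_k), and the cost weights a and b are hypothetical machine-dependent constants (the analysis leaves them abstract); the brute-force integer argmin of a·K/s + b·s agrees with the stationary point √(a·K/b) of the continuous relaxation.

```python
import math

def optimal_s(a, b, P, clouds, s_max=1000):
    # Brute-force argmin of a*t + b*s with t = K/s, where
    # K = log(1-P) / sum_k r_k*log(1-rho_k); a, b are hypothetical cost weights.
    K = math.log(1 - P) / sum(r * math.log(1 - rho) for r, rho in clouds)
    best = min(range(1, s_max + 1), key=lambda s: a * K / s + b * s)
    return best, K

clouds = [(0.5, 0.01), (0.3, 0.02), (0.2, 0.001)]   # (r_k, rho_k) as in the example
s_opt, K = optimal_s(a=3, b=1, P=0.99, clouds=clouds)

# a*K/s decreases and b*s increases in s, so the minimum sits at the
# stationary point s* = sqrt(a*K/b) of the continuous relaxation.
assert abs(s_opt - math.sqrt(3 * K)) <= 1
```

Raising P (or lowering the ρ_k) increases K and therefore pushes the optimal s upward, consistent with the trend in Table 5.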
In terms of Table 3, the computational cost of the CSPs can be simplified to t + 3s + 9. We can then observe the computational cost under different s and P in Figure 4. When s is smaller than the optimal value, the computational cost decreases evidently with the increase of s, and it rises again once s exceeds the optimal value.

TABLE 6
The influence of the parameters under different detection probabilities P (𝒫 = {ρ_1, ρ_2, ρ_3} = {0.01, 0.02, 0.001}, {r_1, r_2, r_3} = {0.5, 0.3, 0.2}).

 P       0.8      0.85     0.9      0.95     0.99     0.999
 sz·w    142.60   168.09   204.02   265.43   408.04   612.06
 s       7        8        10       11       13       16
 t       20       21       20       29       31       38

More accurately, we show the influence of the parameters sz·w, s, and t under different detection probabilities in Table 6. It is easy to see that the computational cost rises with the increase of P. Moreover, we can determine the sampling number of a challenge from the following conclusion: given the detection probability P, the probability of sector corruption ρ, and the number of sectors per block s, the sampling number of the verification protocol is a constant, t = n·w ≥ log(1−P)/(s·∑_{P_k∈𝒫} r_k·log(1−ρ_k)), for different files.

Finally, we observe the change of s under different ρ and P. The results are shown in Table 5. It is obvious that the optimal value of s rises with the increase of P and with the decrease of ρ. We choose the optimal value of s on the basis of practical settings and system requirements. For the NTFS format, we suggest a value of s of 200 and a block size of 4 KB, which equals the default cluster size in NTFS when the file size is less than 16 TB. In this case, the value of s ensures that the extra storage does not exceed 1% on the storage servers.

5.4 CPDP for Integrity Audit Services

Based on our CPDP scheme, we introduce an audit system architecture for outsourced data in multiple clouds by replacing the TTP with a third-party auditor (TPA) in Figure 1. This architecture can be constructed on a virtualization infrastructure of cloud-based storage services [1]. In Figure 5, we show an example of applying our CPDP scheme in the Hadoop distributed file system (HDFS)⁴, which is a distributed, scalable, and portable file system [19].

Fig. 5. Applying the CPDP scheme in the Hadoop distributed file system (HDFS).

HDFS's architecture is composed of a NameNode and DataNodes, where the NameNode maps a file name to a set of block indexes and the DataNodes actually store the data blocks. To support our CPDP scheme, the index-hash hierarchy and the metadata of the NameNode should be integrated together to provide an enquiry service for the hash value ξ^{(3)}_{i,k} or the index-hash record χ_i. Based on the hash value, the clients can implement the verification protocol via CPDP services. Hence, it is easy to replace the checksum methods with the CPDP scheme for anomaly detection in current HDFS.

To validate the effectiveness and efficiency of our proposed approach for audit services, we have implemented a prototype of an audit system. We simulated the audit service and the storage service by using two local IBM servers with two Intel Core 2 processors at 2.16 GHz and 500 MB RAM running Windows Server 2003. These servers were connected via 250 MB/s of network bandwidth. Using the GMP and PBC libraries, we have implemented a cryptographic library upon which our scheme can be constructed. This C library contains approximately 5,200 lines of code and has been tested on both Windows and Linux platforms. The elliptic curve utilized in the experiment is an MNT curve, with a base field size of 160 bits and embedding degree 6. The security level is chosen to be 80 bits, which means |p| = 160.

⁴ Hadoop can enable applications to work with thousands of nodes and petabytes of data, and it has been adopted by current mainstream cloud platforms from Apache, Google, Yahoo, Amazon, IBM and Sun.
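The element sizes fixed in Section 5.1 (l_0 = 2κ, l_1 = 4κ, l_T = 24κ bits for κ = 80) make two of the storage claims above easy to check: the constant response size of about 1.3 KB for s = 50, and the "extra storage below 1%" for the suggested NTFS setting s = 200. A quick sketch:

```python
kappa = 80                # security parameter in bits
l0 = 2 * kappa // 8       # bytes per Z_p element: 160 bits = 20 bytes
l1 = 4 * kappa // 8       # bytes per G1 element: 320 bits = 40 bytes
l_T = 24 * kappa // 8     # bytes per G_T element: 1920 bits = 240 bytes

# Constant-size response: s*l0 + l1 + l_T ~ 1.3 KB for s = 50 sectors per block.
resp_bytes = 50 * l0 + l1 + l_T
assert resp_bytes == 1280

# Overhead rate lambda = store(f)/size(f) - 1 = l1/(s*l0); the suggested
# NTFS setting s = 200 keeps tag storage within 1%.
overhead = l1 / (200 * l0)
assert overhead <= 0.01
```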
Fig. 6. Experimental results under different file sizes, sampling ratios, and sector numbers (left: computation and communication costs versus file size, from 10 KB with s = 20 to 10 MB with s = 250, for sampling ratios of 10%–50%; right: costs of commitment, challenge1, challenge2, response1, response2, verification, and total time versus the ratio of queried blocks, for three CSPs with r = (50%, 30%, 20%), a 10 MB file, and 250 sectors per block).

Firstly, we quantify the performance of our audit scheme under different parameters, such as the file size sz, the sampling ratio w, and the sector number per block s. Our analysis shows that the value of s should grow with the increase of sz in order to reduce the computation and communication costs. Thus, our experiments were carried out as follows: the stored files were chosen from 10 KB to 10 MB; the sector numbers were changed from 20 to 250 according to the file sizes; and the sampling ratios were changed from 10% to 50%. The experimental results are shown on the left side of Figure 6. These results indicate that the computation and communication costs (including I/O costs) grow with the file size and the sampling ratio.

Next, we compare the performance of each activity in our verification protocol. We have shown the theoretical results in Table 4: the overheads of "commitment" and "challenge" resemble one another, and the overheads of "response" and "verification" resemble one another as well. To validate the theoretical results, we changed the sampling ratio w from 10% to 50% for a 10 MB file with 250 sectors per block in a multi-cloud 𝒫 = {P_1, P_2, P_3}, in which the proportions of data blocks are 50%, 30%, and 20% in the three CSPs, respectively. On the right side of Figure 6, our experimental results show that the computation and communication costs of "commitment" and "challenge" change only slightly with the sampling ratio, whereas those of "response" and "verification" grow with the increase of the sampling ratio. Here, "challenge" and "response" are each divided into two sub-processes: "challenge1" and "challenge2", as well as "response1" and "response2", respectively. Furthermore, the proportions of data blocks in each CSP have a greater influence on the computation costs of the "challenge" and "response" processes. In summary, our scheme has better performance than the non-cooperative approach.

6 CONCLUSIONS

In this paper, we presented the construction of an efficient PDP scheme for distributed cloud storage. Based on homomorphic verifiable responses and a hash index hierarchy, we proposed a cooperative PDP scheme that supports dynamic scalability over multiple storage servers. We also showed that our scheme provides all the security properties required of a zero-knowledge interactive proof system, so that it can resist various attacks even when deployed as a public audit service in clouds. Furthermore, we optimized the probabilistic query and periodic verification to improve the audit performance. Our experiments clearly demonstrated that our approach introduces only a small amount of computation and communication overhead. Therefore, our solution can be treated as a new candidate for data integrity verification in outsourced data storage systems.

As part of future work, we will extend this work to explore more effective CPDP constructions. First, from our experiments we found that the performance of the CPDP scheme, especially for large files, is affected by the bilinear mapping operations due to their high complexity. To address this problem, RSA-based constructions may be a better choice, but this remains a challenging task because the existing RSA-based schemes have too many restrictions on performance and security [2]. Next, from a practical point of view, we still need to address several issues concerning the smooth integration of our CPDP scheme with existing systems: for example, how to match the index-hash hierarchy to HDFS's two-layer name space, how to match the index structure to the cluster-network model, and how to dynamically update the CPDP parameters according to HDFS's specific requirements. Finally, the generation of tags whose length is independent of the size of the data blocks is still a challenging problem. We will explore this issue to provide support for variable-length block verification.

ACKNOWLEDGMENTS

The work of Y. Zhu and M. Yu was supported by the National Natural Science Foundation of China (Projects No. 61170264 and No. 10990011). The work of Gail-Joon Ahn and Hongxin Hu was partially supported by grants from the US National Science Foundation (NSF-IIS-0900970 and NSF-CNS-0831360) and the Department of Energy (DE-SC0004308).
REFERENCES

[1] B. Sotomayor, R. S. Montero, I. M. Llorente, and I. T. Foster, "Virtual infrastructure management in private and hybrid clouds," IEEE Internet Computing, vol. 13, no. 5, pp. 14–22, 2009.
[2] G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, L. Kissner, Z. N. J. Peterson, and D. X. Song, "Provable data possession at untrusted stores," in ACM Conference on Computer and Communications Security (CCS), 2007, pp. 598–609.
[3] A. Juels and B. S. Kaliski Jr., "PORs: Proofs of retrievability for large files," in ACM Conference on Computer and Communications Security (CCS), 2007, pp. 584–597.
[4] G. Ateniese, R. Di Pietro, L. V. Mancini, and G. Tsudik, "Scalable and efficient provable data possession," in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks (SecureComm), 2008, pp. 1–10.
[5] C. C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, "Dynamic provable data possession," in ACM Conference on Computer and Communications Security (CCS), 2009, pp. 213–222.
[6] H. Shacham and B. Waters, "Compact proofs of retrievability," in ASIACRYPT, ser. Lecture Notes in Computer Science, vol. 5350, Springer, 2008, pp. 90–107.
[7] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, "Enabling public verifiability and data dynamics for storage security in cloud computing," in ESORICS, ser. Lecture Notes in Computer Science, vol. 5789, Springer, 2009, pp. 355–370.
[8] Y. Zhu, H. Wang, Z. Hu, G.-J. Ahn, H. Hu, and S. S. Yau, "Dynamic audit services for integrity verification of outsourced storages in clouds," in Proceedings of the ACM Symposium on Applied Computing (SAC), 2011, pp. 1550–1557.
[9] K. D. Bowers, A. Juels, and A. Oprea, "HAIL: A high-availability and integrity layer for cloud storage," in ACM Conference on Computer and Communications Security (CCS), 2009, pp. 187–198.
[10] Y. Dodis, S. P. Vadhan, and D. Wichs, "Proofs of retrievability via hardness amplification," in TCC, ser. Lecture Notes in Computer Science, vol. 5444, Springer, 2009, pp. 109–127.
[11] L. Fortnow, J. Rompel, and M. Sipser, "On the power of multi-prover interactive protocols," in Theoretical Computer Science, 1988, pp. 156–161.
[12] Y. Zhu, H. Hu, G.-J. Ahn, Y. Han, and S. Chen, "Collaborative integrity verification in hybrid clouds," in Proceedings of the 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), Orlando, Florida, USA, October 15–18, 2011, pp. 197–206.
[13] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the clouds: A Berkeley view of cloud computing," Tech. Rep., EECS Department, University of California, Berkeley, Feb. 2009.
[14] D. Boneh and M. Franklin, "Identity-based encryption from the Weil pairing," in Advances in Cryptology (CRYPTO 2001), ser. Lecture Notes in Computer Science, vol. 2139, 2001, pp. 213–229.
[15] O. Goldreich, Foundations of Cryptography: Basic Tools. Cambridge University Press, 2001.
[16] P. S. L. M. Barreto, S. D. Galbraith, C. Ó hÉigeartaigh, and M. Scott, "Efficient pairing computation on supersingular abelian varieties," Designs, Codes and Cryptography, vol. 42, no. 3, pp. 239–271, 2007.
[17] J.-L. Beuchat, N. Brisebarre, J. Detrey, and E. Okamoto, "Arithmetic operators for pairing-based cryptography," in CHES, ser. Lecture Notes in Computer Science, vol. 4727, Springer, 2007, pp. 239–255.
[18] H. Hu, L. Hu, and D. Feng, "On a class of pseudorandom sequences from elliptic curves over finite fields," IEEE Transactions on Information Theory, vol. 53, no. 7, pp. 2598–2605, 2007.
[19] A. Bialecki, M. Cafarella, D. Cutting, and O. O'Malley, "Hadoop: A framework for running applications on large clusters built of commodity hardware," Tech. Rep., 2005. [Online]. Available: http://lucene.apache.org/hadoop/
[20] E. Al-Shaer, S. Jha, and A. D. Keromytis, Eds., Proceedings of the 2009 ACM Conference on Computer and Communications Security (CCS 2009), Chicago, Illinois, USA, November 9–13, 2009. ACM, 2009.

Yan Zhu received the Ph.D. degree in computer science from Harbin Engineering University, China, in 2005. He has been an associate professor of computer science in the Institute of Computer Science and Technology at Peking University since 2007. He worked at the Department of Computer Science and Engineering, Arizona State University, as a visiting associate professor from 2008 to 2009. His research interests include cryptography and network security.

Hongxin Hu is currently working toward the Ph.D. degree in the School of Computing, Informatics, and Decision Systems Engineering, Ira A. Fulton Schools of Engineering, Arizona State University. He is also a member of the Security Engineering for Future Computing Laboratory, Arizona State University. His current research interests include access control models and mechanisms, security and privacy in social networks, security in distributed and cloud computing, network and system security, and secure software engineering.

Gail-Joon Ahn is an Associate Professor in the School of Computing, Informatics, and Decision Systems Engineering, Ira A. Fulton Schools of Engineering, and the Director of the Security Engineering for Future Computing Laboratory, Arizona State University. His research interests include information and systems security, vulnerability and risk management, access control, and security architecture for distributed systems; his research has been supported by the U.S. National Science Foundation, National Security Agency, U.S. Department of Defense, U.S. Department of Energy, Bank of America, Hewlett-Packard, Microsoft, and the Robert Wood Johnson Foundation. Dr. Ahn is a recipient of the U.S. Department of Energy CAREER Award and the Educator of the Year Award from the Federal Information Systems Security Educators' Association. He was an Associate Professor at the College of Computing and Informatics, and the Founding Director of the Center for Digital Identity and Cyber Defense Research and the Laboratory of Information Integration, Security, and Privacy, University of North Carolina at Charlotte. He received the Ph.D. degree in information technology from George Mason University, Fairfax, VA, in 2000.

Mengyang Yu received his B.S. degree from the School of Mathematical Sciences, Peking University, in 2010. He is currently an M.S. candidate at Peking University. His research interests include cryptography and computer security.