Cooperative provable data possession for integrity integrity verification in multi cloud storage.bak
Transcript

  • 1. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

    Cooperative Provable Data Possession for Integrity Verification in Multi-Cloud Storage

    Yan Zhu, Hongxin Hu, Gail-Joon Ahn, Senior Member, IEEE, Mengyang Yu

    Abstract: Provable data possession (PDP) is a technique for ensuring the integrity of data in storage outsourcing. In this paper, we address the construction of an efficient PDP scheme for distributed cloud storage to support the scalability of service and data migration, in which we consider the existence of multiple cloud service providers that cooperatively store and maintain the clients’ data. We present a cooperative PDP (CPDP) scheme based on homomorphic verifiable response and hash index hierarchy. We prove the security of our scheme based on a multi-prover zero-knowledge proof system, which satisfies the completeness, knowledge soundness, and zero-knowledge properties. In addition, we articulate performance optimization mechanisms for our scheme, and in particular present an efficient method for selecting optimal parameter values to minimize the computation costs of clients and storage service providers. Our experiments show that our solution introduces lower computation and communication overheads in comparison with non-cooperative approaches.

    Index Terms: Storage Security, Provable Data Possession, Interactive Protocol, Zero-knowledge, Multiple Cloud, Cooperative

    1 INTRODUCTION

    In recent years, cloud storage service has become a faster profit growth point by providing a comparably low-cost, scalable, position-independent platform for clients’ data. Since the cloud computing environment is constructed based on open architectures and interfaces, it has the capability to incorporate multiple internal and/or external cloud services together to provide high interoperability. We call such a distributed cloud environment a multi-cloud (or hybrid cloud). Often, by using virtual infrastructure management (VIM) [1], a multi-cloud allows clients to easily access their resources remotely through interfaces such as the Web services provided by Amazon EC2.

    There exist various tools and technologies for multi-cloud, such as Platform VM Orchestrator, VMware vSphere, and Ovirt. These tools help cloud providers construct a distributed cloud storage platform (DCSP) for managing clients’ data. However, if such an important platform is vulnerable to security attacks, it would bring irretrievable losses to the clients. For example, the confidential data in an enterprise may be illegally accessed through a remote interface provided by a multi-cloud, or relevant data and archives may be lost or tampered with when they are stored into an uncertain storage pool outside the enterprise. Therefore, it is indispensable for cloud service providers (CSPs) to provide security techniques for managing their storage services.

    Provable data possession (PDP) [2] (or proofs of retrievability (POR) [3]) is such a probabilistic proof technique for a storage provider to prove the integrity and ownership of clients’ data without downloading data. Proof-checking without downloading is especially important for large files and folders (typically including many clients’ files), where one needs to check whether the data have been tampered with or deleted without downloading the latest version of the data. Thus, it is able to replace traditional hash and signature functions in storage outsourcing. Various PDP schemes have been recently proposed, such as Scalable PDP [4] and Dynamic PDP [5]. However, these schemes mainly focus on PDP issues at untrusted servers in a single cloud storage provider and are not suitable for a multi-cloud environment (see the comparison of POR/PDP schemes in Table 1).

    Motivation. To provide a low-cost, scalable, location-independent platform for managing clients’ data, current cloud storage systems adopt several new distributed file systems, for example, Apache Hadoop Distributed File System (HDFS), Google File System (GFS), Amazon S3 File System, CloudStore, etc. These file systems share some similar features: a single metadata server provides centralized management by a global namespace; files are split into blocks or chunks and stored on block servers; and the systems are comprised of interconnected clusters of block servers. Those features enable cloud service providers to store and process large amounts of data. However, it is crucial to offer an efficient verification on the integrity

    ∙ A preliminary version of this paper appeared under the title "Efficient Provable Data Possession for Hybrid Clouds" in Proc. of the 17th ACM Conference on Computer and Communications Security (CCS), Chicago, IL, USA, 2010, pp. 881-883.
    ∙ Y. Zhu is with the Institute of Computer Science and Technology, Peking University, Beijing 100871, China, and the Beijing Key Laboratory of Internet Security Technology, Peking University, Beijing 100871, China. E-mail: {yan.zhu,huzexing}@pku.edu.cn.
    ∙ H. Hu and G.-J. Ahn are with the Arizona State University, Tempe, Arizona, 85287. E-mail: {hxhu,gahn}@asu.edu.
    ∙ M. Yu is with the School of Mathematics Science, Peking University, Beijing 100871, China. E-mail: myyu@pku.edu.cn.

    Digital Object Identifier 10.1109/TPDS.2012.66. 1045-9219/12/$31.00 © 2012 IEEE
  • 2. TABLE 1. Comparison of POR/PDP schemes for a file consisting of $n$ blocks.

    Scheme | Type | CSP Comp. | Client Comp. | Comm. | Frag. / Privacy / Multiple Clouds | Prob. of Detection
    PDP [2] | HomT | $O(t)$ | $O(t)$ | $O(1)$ | ✓ ♯ | $1 - (1-\rho)^{t}$
    SPDP [4] | MHT | $O(t)$ | $O(t)$ | $O(t)$ | ✓ ✓ | $1 - (1-\rho)^{t \cdot s}$
    DPDP-I [5] | MHT | $O(t \log n)$ | $O(t \log n)$ | $O(t \log n)$ | ✓ | $1 - (1-\rho)^{t}$
    DPDP-II [5] | MHT | $O(t \log n)$ | $O(t \log n)$ | $O(t \log n)$ | | $1 - (1-\rho)^{\Omega(n)}$
    CPOR-I [6] | HomT | $O(t)$ | $O(t)$ | $O(1)$ | ♯ | $1 - (1-\rho)^{t}$
    CPOR-II [6] | HomT | $O(t+s)$ | $O(t+s)$ | $O(s)$ | ✓ ♯ | $1 - (1-\rho)^{t \cdot s}$
    Our Scheme | HomR | $O(t + c \cdot s)$ | $O(t+s)$ | $O(s)$ | ✓ ✓ ✓ | $1 - \prod_{P_k \in \mathcal{P}} (1-\rho_k)^{r_k \cdot t \cdot s}$

    $s$ is the number of sectors in each block, $c$ is the number of CSPs in a multi-cloud, $t$ is the number of sampling blocks, $\rho$ and $\rho_k$ are the probability of block corruption in a cloud server and in the $k$-th cloud server of a multi-cloud $\mathcal{P} = \{P_k\}$, respectively, ♯ denotes the verification process in a trivial approach, and MHT, HomT, and HomR denote Merkle Hash tree, homomorphic tags, and homomorphic responses, respectively.

    and availability of stored data for detecting faults and automatic recovery. Moreover, this verification is necessary to provide reliability by automatically maintaining multiple copies of data and automatically redeploying processing logic in the event of failures.

    Although existing schemes can make a false or true decision for data possession without downloading data at untrusted stores, they are not suitable for a distributed cloud storage environment since they were not originally constructed on an interactive proof system. For example, the schemes based on Merkle Hash tree (MHT), such as DPDP-I, DPDP-II [5] and SPDP [4] in Table 1, use an authenticated skip list to check the integrity of file blocks adjacently in space. Unfortunately, they did not provide any algorithms for constructing distributed Merkle trees that are necessary for efficient verification in a multi-cloud environment. In addition, when a client asks for a file block, the server needs to send the file block along with a proof for the intactness of the block. However, this process incurs significant communication overhead in a multi-cloud environment, since the server in one cloud typically needs to generate such a proof with the help of other cloud storage services, where the adjacent blocks are stored. The other schemes, such as PDP [2], CPOR-I, and CPOR-II [6] in Table 1, are constructed on homomorphic verification tags, by which the server can generate tags for multiple file blocks in terms of a single response value. However, that does not mean the responses from multiple clouds can also be combined into a single value on the client side. For lack of homomorphic responses, clients must invoke the PDP protocol repeatedly to check the integrity of file blocks stored in multiple cloud servers. Also, clients need to know the exact position of each file block in a multi-cloud environment. In addition, the verification process in such a case will lead to high communication overheads and computation costs at client sides as well. Therefore, it is essential to design a cooperative PDP model to reduce the storage and network overheads and enhance the transparency of verification activities in cluster-based cloud storage systems. Moreover, such a cooperative PDP scheme should provide features for timely detecting abnormality and renewing multiple copies of data.

    Even though existing PDP schemes have addressed various security properties, such as public verifiability [2], dynamics [5], scalability [4], and privacy preservation [7], we still need a careful consideration of some potential attacks, including two major categories: Data Leakage Attack, by which an adversary can easily obtain the stored data through the verification process after running or wiretapping sufficient verification communications (see Attacks 1 and 3 in Appendix A), and Tag Forgery Attack, by which a dishonest CSP can deceive the clients (see Attacks 2 and 4 in Appendix A). These two attacks may cause potential risks for privacy leakage and ownership cheating. Also, these attacks can more easily compromise the security of a distributed cloud system than that of a single cloud system.

    Although various security models have been proposed for existing PDP schemes [2], [7], [6], these models still cannot cover all security requirements, especially for provably secure privacy preservation and ownership authentication. To establish a highly effective security model, it is necessary to analyze the PDP scheme within the framework of a zero-knowledge proof system (ZKPS), because a PDP system is essentially an interactive proof system (IPS), which has been well studied in the cryptography community. In summary, a verification scheme for data integrity in distributed storage environments should have the following features:

    ∙ Usability aspect: A client should utilize the integrity check in the way of collaboration services. The scheme should conceal the details of the storage to reduce the burden on clients;
    ∙ Security aspect: The scheme should provide adequate security features to resist some existing attacks, such as data leakage attack and tag forgery attack;
    ∙ Performance aspect: The scheme should have lower communication and computation overheads than a non-cooperative solution.
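    The detection probability listed for the cooperative scheme in Table 1 can be evaluated numerically. The following is only a minimal sketch of that formula, not the authors' code: the text above does not define $r_k$, so it is assumed here to be the share of the sampled blocks that falls on the $k$-th cloud, and the function name and the sample values are hypothetical.

```python
from math import prod


def detection_probability(rho, r, t, s):
    """Return 1 - prod_k (1 - rho_k) ** (r_k * t * s), as in the last row of Table 1.

    rho[k]: probability of block corruption in cloud k.
    r[k]:   ASSUMED share of the sampled blocks located in cloud k.
    t, s:   number of sampled blocks and number of sectors per block.
    """
    if len(rho) != len(r):
        raise ValueError("rho and r need one entry per cloud")
    # Probability that every sampled sector in every cloud looks intact.
    miss = prod((1.0 - rho_k) ** (r_k * t * s) for rho_k, r_k in zip(rho, r))
    return 1.0 - miss


# Hypothetical deployment: three clouds with different corruption rates.
p_detect = detection_probability(rho=[0.01, 0.02, 0.001], r=[0.5, 0.3, 0.2], t=100, s=1)
print(f"detection probability: {p_detect:.4f}")
```

    With a single cloud, $r_1 = 1$ and $s = 1$, the expression reduces to $1 - (1-\rho)^{t}$, the value listed for PDP [2] in the same table.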
  • 3. Related Works. To check the availability and integrity of outsourced data in cloud storages, researchers have proposed two basic approaches called Provable Data Possession (PDP) [2] and Proofs of Retrievability (POR) [3]. Ateniese et al. [2] first proposed the PDP model for ensuring possession of files on untrusted storages and provided an RSA-based scheme for a static case that achieves $O(1)$ communication cost. They also proposed a publicly verifiable version, which allows anyone, not just the owner, to challenge the server for data possession. This property greatly extended the application areas of the PDP protocol due to the separation of data owners and the users. However, these schemes are insecure against replay attacks in dynamic scenarios because of the dependencies on the index of blocks. Moreover, they do not fit multi-cloud storage due to the loss of the homomorphism property in the verification process.

    In order to support dynamic data operations, Ateniese et al. developed a dynamic PDP solution called Scalable PDP [4]. They proposed a lightweight PDP scheme based on a cryptographic hash function and symmetric key encryption, but the servers can deceive the owners by using previous metadata or responses due to the lack of randomness in the challenges. The numbers of updates and challenges are limited and fixed in advance, and users cannot perform block insertions anywhere. Based on this work, Erway et al. [5] introduced two Dynamic PDP schemes with a hash function tree to realize $O(\log n)$ communication and computational costs for an $n$-block file. The basic scheme, called DPDP-I, retains the drawback of Scalable PDP, and in the ‘blockless’ scheme, called DPDP-II, the data blocks $\{m_{i_j}\}_{j \in [1,t]}$ can be leaked by the response of a challenge, $M = \sum_{j=1}^{t} a_j m_{i_j}$, where $a_j$ is a random challenge value. Furthermore, these schemes are also not effective for a multi-cloud environment because the verification path of the challenge block cannot be stored completely in a cloud [8].

    Juels and Kaliski [3] presented a POR scheme, which relies largely on preprocessing steps that the client conducts before sending a file to a CSP. Unfortunately, these operations prevent any efficient extension for updating data. Shacham and Waters [6] proposed an improved version of this protocol called Compact POR, which uses a homomorphic property to aggregate a proof into an $O(1)$ authenticator value with $O(t)$ computation cost for $t$ challenge blocks, but their solution is also static and could not prevent the leakage of data blocks in the verification process. Wang et al. [7] presented a dynamic scheme with $O(\log n)$ cost by integrating the Compact POR scheme and Merkle Hash Tree (MHT) into the DPDP. Furthermore, several POR schemes and models have been recently proposed, including [9], [10]. In [9] Bowers et al. introduced a distributed cryptographic system that allows a set of servers to solve the PDP problem. This system is based on an integrity-protected error-correcting code (IP-ECC), which improves the security and efficiency of existing tools, like POR. However, a file must be transformed into $l$ distinct segments with the same length, which are distributed across $l$ servers. Hence, this system is more suitable for RAID than for cloud storage.

    Our Contributions. In this paper, we address the problem of provable data possession in distributed cloud environments from the following aspects: high security, transparent verification, and high performance. To achieve these goals, we first propose a verification framework for multi-cloud storage along with two fundamental techniques: hash index hierarchy (HIH) and homomorphic verifiable response (HVR).

    We then demonstrate the possibility of constructing a cooperative PDP (CPDP) scheme without compromising data privacy based on modern cryptographic techniques, such as the interactive proof system (IPS). We further introduce an effective construction of the CPDP scheme using the above-mentioned structure. Moreover, we give a security analysis of our CPDP scheme from the IPS model. We prove that this construction is a multi-prover zero-knowledge proof system (MP-ZKPS) [11], which has the completeness, knowledge soundness, and zero-knowledge properties. These properties ensure that the CPDP scheme is secure against data leakage attack and tag forgery attack.

    To improve the system performance with respect to our scheme, we analyze the performance of probabilistic queries for detecting abnormal situations. This probabilistic method also has an inherent benefit in reducing computation and communication overheads. Then, we present an efficient method for the selection of optimal parameter values to minimize the computation overheads of CSPs and the clients’ operations. In addition, we show that our scheme is suitable for existing distributed cloud storage systems. Finally, our experiments show that our solution introduces very limited computation and communication overheads.

    Organization. The rest of this paper is organized as follows. In Section 2, we describe a formal definition of CPDP and the underlying techniques, which are utilized in the construction of our scheme. We introduce the details of the cooperative PDP scheme for multi-cloud storage in Section 3. We describe the security and performance evaluation of our scheme in Sections 4 and 5, respectively. We discuss the related work in Section and Section 6 concludes this paper.

    2 STRUCTURE AND TECHNIQUES

    In this section, we present our verification framework for multi-cloud storage and a formal definition of CPDP. We introduce two fundamental techniques for constructing our CPDP scheme: hash index hierarchy (HIH) on which the responses of the clients’ challenges computed from multiple CSPs can be combined

  • 4. into a single response as the final result; and homomorphic verifiable response (HVR) which supports distributed cloud storage in a multi-cloud storage and implements an efficient construction of collision-resistant hash function, which can be viewed as a random oracle model in the verification protocol.

    2.1 Verification Framework for Multi-Cloud

    Although existing PDP schemes offer a publicly accessible remote interface for checking and managing the tremendous amount of data, the majority of existing PDP schemes are unable to satisfy the inherent requirements of multiple clouds in terms of communication and computation costs. To address this problem, we consider a multi-cloud storage service as illustrated in Figure 1. In this architecture, a data storage service involves three different entities: Clients, who have a large amount of data to be stored in multiple clouds and have the permissions to access and manipulate stored data; Cloud Service Providers (CSPs), who work together to provide data storage services and have enough storage and computation resources; and Trusted Third Party (TTP), who is trusted to store verification parameters and offer public query services for these parameters.

    Fig. 1. Verification architecture for data integrity.

    In this architecture, we consider the existence of multiple CSPs to cooperatively store and maintain the clients’ data. Moreover, a cooperative PDP is used to verify the integrity and availability of their stored data in all CSPs. The verification procedure is described as follows: Firstly, a client (data owner) uses the secret key to pre-process a file which consists of a collection of $n$ blocks, generates a set of public verification information that is stored in TTP, transmits the file and some verification tags to CSPs, and may delete its local copy; Then, by using a verification protocol, the clients can issue a challenge for one CSP to check the integrity and availability of outsourced data with respect to public information stored in TTP.

    We neither assume that the CSP can be trusted to guarantee the security of the stored data, nor assume that the data owner has the ability to collect the evidence of the CSP’s fault after errors have been found. To achieve this goal, a TTP server is constructed as a core trust base on the cloud for the sake of security. We assume the TTP is reliable and independent through the following functions [12]: to set up and maintain the CPDP cryptosystem; to generate and store the data owner’s public key; and to store the public parameters used to execute the verification protocol in the CPDP scheme. Note that the TTP is not directly involved in the CPDP scheme in order to reduce the complexity of the cryptosystem.

    2.2 Definition of Cooperative PDP

    In order to prove the integrity of data stored in a multi-cloud environment, we define a framework for CPDP based on interactive proof system (IPS) and multi-prover zero-knowledge proof system (MP-ZKPS), as follows:

    Definition 1 (Cooperative-PDP): A cooperative provable data possession $\mathcal{S} = (KeyGen, TagGen, Proof)$ is a collection of two algorithms $(KeyGen, TagGen)$ and an interactive proof system $Proof$, as follows:

    $KeyGen(1^{\kappa})$: takes a security parameter $\kappa$ as input, and returns a secret key $sk$ or a public-secret key pair $(pk, sk)$;

    $TagGen(sk, F, \mathcal{P})$: takes as inputs a secret key $sk$, a file $F$, and a set of cloud storage providers $\mathcal{P} = \{P_k\}$, and returns the triple $(\zeta, \psi, \sigma)$, where $\zeta$ is the secret in tags, $\psi = (u, \mathcal{H})$ is a set of verification parameters $u$ and an index hierarchy $\mathcal{H}$ for $F$, $\sigma = \{\sigma^{(k)}\}_{P_k \in \mathcal{P}}$ denotes a set of all tags, and $\sigma^{(k)}$ is the tag of the fraction $F^{(k)}$ of $F$ in $P_k$;

    $Proof(\mathcal{P}, V)$: is a protocol of proof of data possession between CSPs ($\mathcal{P} = \{P_k\}$) and a verifier ($V$), that is,

    $$\Big\langle \sum_{P_k \in \mathcal{P}} P_k(F^{(k)}, \sigma^{(k)}) \longleftrightarrow V \Big\rangle (pk, \psi) = \begin{cases} 1 & F = \{F^{(k)}\} \text{ is intact} \\ 0 & F = \{F^{(k)}\} \text{ is changed} \end{cases}$$

    where each $P_k$ takes as input a file $F^{(k)}$ and a set of tags $\sigma^{(k)}$, and a public key $pk$ and a set of public parameters $\psi$ are the common input between $P$ and $V$. At the end of the protocol run, $V$ returns a bit $\{0 \mid 1\}$ denoting false and true. Here, $\sum_{P_k \in \mathcal{P}}$ denotes cooperative computing in $P_k \in \mathcal{P}$.

    A trivial way to realize the CPDP is to check the data stored in each cloud one by one, i.e.,

    $$\bigwedge_{P_k \in \mathcal{P}} \big\langle P_k(F^{(k)}, \sigma^{(k)}) \longleftrightarrow V \big\rangle (pk, \psi),$$

    where $\bigwedge$ denotes the logical AND operations among the boolean outputs of all protocols $\langle P_k, V \rangle$ for all
  • 5. $P_k \in \mathcal{P}$. However, it would cause significant communication and computation overheads for the verifier, as well as a loss of location transparency. Such a primitive approach obviously diminishes the advantages of cloud storage: scaling arbitrarily up and down on demand [13]. To solve this problem, we extend the above definition by adding an organizer ($O$), which is one of the CSPs that directly contacts the verifier, as follows:

    $$\Big\langle \sum_{P_k \in \mathcal{P}} P_k(F^{(k)}, \sigma^{(k)}) \longleftrightarrow O \longleftrightarrow V \Big\rangle (pk, \psi),$$

    where the action of the organizer is to initiate and organize the verification process. This definition is consistent with the aforementioned architecture, e.g., a client (or an authorized application) is considered as $V$, the CSPs are $\mathcal{P} = \{P_i\}_{i \in [1,c]}$, and the Zoho cloud is the organizer in Figure 1. Often, the organizer is an independent server or a certain CSP in $\mathcal{P}$. The advantage of this new multi-prover proof system is that, from the clients’ point of view, the multi-prover verification process does not differ from the single-prover verification process in the way of collaboration. Also, this kind of transparent verification is able to conceal the details of data storage to reduce the burden on clients. For the sake of clarity, we list the symbols used in Table 2.

    TABLE 2. The symbols and their explanation.

    Sig. | Explanation
    $n$ | the number of blocks in a file;
    $s$ | the number of sectors in each block;
    $t$ | the number of index-coefficient pairs in a query;
    $c$ | the number of clouds to store a file;
    $F$ | the file with $n \times s$ sectors, i.e., $F = \{m_{i,j}\}^{i \in [1,n]}_{j \in [1,s]}$;
    $\sigma$ | the set of tags, i.e., $\sigma = \{\sigma_i\}_{i \in [1,n]}$;
    $Q$ | the set of index-coefficient pairs, i.e., $Q = \{(i, v_i)\}$;
    $\theta$ | the response for the challenge $Q$.

    2.3 Hash Index Hierarchy for CPDP

    To support distributed cloud storage, we illustrate a representative architecture used in our cooperative PDP scheme as shown in Figure 2. Our architecture has a hierarchy structure which resembles a natural representation of file storage. This hierarchical structure $\mathcal{H}$ consists of three layers to represent relationships among all blocks for stored resources. They are described as follows:

    1) Express Layer: offers an abstract representation of the stored resources;
    2) Service Layer: offers and manages cloud storage services; and
    3) Storage Layer: realizes data storage on many physical devices.

    We make use of this simple hierarchy to organize data blocks from multiple CSP services into a large file by shielding the differences among these cloud storage systems. For example, in Figure 2 the resources in the Express Layer are split and stored into three CSPs, which are indicated by different colors, in the Service Layer. In turn, each CSP fragments and stores the assigned data into the storage servers in the Storage Layer. We also make use of colors to distinguish different CSPs. Moreover, we follow the logical order of the data blocks to organize the Storage Layer. This architecture also provides special functions for data storage and management, e.g., there may exist overlaps among data blocks (as shown in dashed boxes) and discontinuous blocks, but these functions may increase the complexity of storage management.

    Fig. 2. Index-hash hierarchy of CPDP model. (Panels: Storage Layer, Service Layer, Express Layer; the data are spread over CSP1, CSP2, and CSP3, with overlapping blocks marked.)

    In the storage layer, we define a common fragment structure that provides probabilistic verification of data integrity for outsourced storage. The fragment structure is a data structure that maintains a set of block-tag pairs, allowing searches, checks, and updates in $O(1)$ time. An instance of this structure is shown in the storage layer of Figure 2: an outsourced file $F$ is split into $n$ blocks $\{m_1, m_2, \cdots, m_n\}$, and each block $m_i$ is split into $s$ sectors $\{m_{i,1}, m_{i,2}, \cdots, m_{i,s}\}$. The fragment structure consists of $n$ block-tag pairs $(m_i, \sigma_i)$, where $\sigma_i$ is a signature tag of block $m_i$ generated by a set of secrets $\tau = (\tau_1, \tau_2, \cdots, \tau_s)$. In order to check the data integrity, the fragment structure implements probabilistic verification as follows: given a randomly chosen challenge (or query) $Q = \{(i, v_i)\}_{i \in_R I}$, where $I$ is a subset of the block indices and $v_i$ is a random coefficient, there exists an efficient algorithm to produce a constant-size response $(\mu_1, \mu_2, \cdots, \mu_s, \sigma')$, where $\mu_i$ comes from all $\{m_{k,i}, v_k\}_{k \in I}$ and $\sigma'$ is from all $\{\sigma_k, v_k\}_{k \in I}$.
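    To make the constant-size response concrete, the sketch below aggregates the challenged blocks into $s$ values. It is an illustration under stated assumptions rather than the scheme itself: the text above only says that $\mu_i$ "comes from" the pairs $\{m_{k,i}, v_k\}$, so the linear combination $\mu_j = \sum_{k \in I} v_k \cdot m_{k,j} \bmod p$ used here is assumed by analogy with the DPDP-II response quoted in the related work. The modulus, function names, and sample data are hypothetical, and the tag part $\sigma'$ is omitted.

```python
import random
import secrets

P = 2**61 - 1  # toy prime modulus standing in for the group order p (assumption)


def make_challenge(n, t):
    """Pick t distinct block indices out of n and a random nonzero coefficient for each."""
    rng = random.SystemRandom()
    return [(i, secrets.randbelow(P - 1) + 1) for i in rng.sample(range(n), t)]


def aggregate_sectors(blocks, challenge):
    """Fold the challenged blocks into s values: mu_j = sum_k v_k * blocks[k][j] mod P."""
    s = len(blocks[0])
    return [sum(v * blocks[i][j] for i, v in challenge) % P for j in range(s)]


# A file of n = 4 blocks with s = 3 sectors each (made-up values, 0-based indices).
blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
mu = aggregate_sectors(blocks, make_challenge(n=len(blocks), t=2))
print(len(mu))  # the response holds s values, however many blocks were challenged
```

    The response length depends only on $s$, not on the number of challenged blocks, which is what keeps the communication cost at $O(s)$ in Table 1.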
  • 6. Given a collision-resistant hash function $H_k(\cdot)$, we make use of this architecture to construct a Hash Index Hierarchy $\mathcal{H}$ (viewed as a random oracle), which is used to replace the common hash function in prior PDP schemes, as follows:

    1) Express layer: given $s$ random $\{\tau_i\}_{i=1}^{s}$ and the file name $F_n$, sets $\xi^{(1)} = H_{\sum_{i=1}^{s} \tau_i}(F_n)$ and makes it public for verification but keeps $\{\tau_i\}_{i=1}^{s}$ secret;
    2) Service layer: given $\xi^{(1)}$ and the cloud name $C_k$, sets $\xi^{(2)}_k = H_{\xi^{(1)}}(C_k)$;
    3) Storage layer: given $\xi^{(2)}_k$, a block number $i$, and its index record $\chi_i = $ “$B_i \| V_i \| R_i$”, sets $\xi^{(3)}_{i,k} = H_{\xi^{(2)}_k}(\chi_i)$, where $B_i$ is the sequence number of a block, $V_i$ is the updated version number, and $R_i$ is a random integer to avoid collision.

    As a virtualization approach, we introduce a simple index-hash table $\chi = \{\chi_i\}$ to record the changes of file blocks as well as to generate the hash value of each block in the verification process. The structure of $\chi$ is similar to the structure of the file block allocation table in file systems. The index-hash table consists of serial number, block number, version number, random integer, and so on. Different from the common index table, we ensure that all records in our index table differ from one another to prevent forgery of data blocks and tags. By using this structure, especially the index records $\{\chi_i\}$, our CPDP scheme can also support dynamic data operations [8].

    The proposed structure can be readily incorporated into MAC-based, ECC, or RSA schemes [2], [6]. These schemes, built from collision-resistance signatures (see Section 3.1) and the random oracle model, have the shortest query and response with public verifiability. They share several common characteristics for the implementation of the CPDP framework in multiple clouds: 1) a file is split into $n \times s$ sectors and each block ($s$ sectors) corresponds to a tag, so that the storage of signature tags can be reduced by increasing $s$; 2) a verifier can verify the integrity of a file by random sampling, which is of utmost importance for large files; 3) these schemes rely on homomorphic properties to aggregate data and tags into a constant-size response, which minimizes the overhead of network communication; and 4) the hierarchy structure provides a virtualization approach to conceal the storage details of multiple CSPs.

    2.4 Homomorphic Verifiable Response for CPDP

    A homomorphism is a map $f : \mathbb{P} \to \mathbb{Q}$ between two groups such that $f(g_1 \oplus g_2) = f(g_1) \otimes f(g_2)$ for all $g_1, g_2 \in \mathbb{P}$, where $\oplus$ denotes the operation in $\mathbb{P}$ and $\otimes$ denotes the operation in $\mathbb{Q}$. This notion has been used to define Homomorphic Verifiable Tags (HVTs) in [2]: given two values $\sigma_i$ and $\sigma_j$ for two messages $m_i$ and $m_j$, anyone can combine them into a value $\sigma'$ corresponding to the sum of the messages $m_i + m_j$. When provable data possession is considered as a challenge-response protocol, we extend this notion to the concept of Homomorphic Verifiable Responses (HVR), which is used to integrate multiple responses from the different CSPs in the CPDP scheme as follows:

    Definition 2 (Homomorphic Verifiable Response): A response is called a homomorphic verifiable response in a PDP protocol if, given two responses $\theta_i$ and $\theta_j$ for two challenges $Q_i$ and $Q_j$ from two CSPs, there exists an efficient algorithm to combine them into a response $\theta$ corresponding to the sum of the challenges $Q_i \cup Q_j$.

    Homomorphic verifiable response is the key technique of CPDP because it not only reduces the communication bandwidth, but also conceals the location of outsourced data in the distributed cloud storage environment.

    3 COOPERATIVE PDP SCHEME

    In this section, we propose a CPDP scheme for multi-cloud systems based on the above-mentioned structure and techniques. This scheme is constructed on collision-resistant hash, bilinear map group, aggregation algorithm, and homomorphic responses.

    3.1 Notations and Preliminaries

    Let $\mathbb{H} = \{H_k\}$ be a family of hash functions $H_k : \{0,1\}^n \to \{0,1\}^*$ indexed by $k \in \mathcal{K}$. We say that algorithm $\mathcal{A}$ has advantage $\epsilon$ in breaking collision-resistance of $\mathbb{H}$ if $\Pr[\mathcal{A}(k) = (m_0, m_1) : m_0 \neq m_1, H_k(m_0) = H_k(m_1)] \geq \epsilon$, where the probability is over the random choices of $k \in \mathcal{K}$ and the random bits of $\mathcal{A}$. Thus, we have the following definition.

    Definition 3 (Collision-Resistant Hash): A hash family $\mathbb{H}$ is $(t, \epsilon)$-collision-resistant if no $t$-time adversary has advantage at least $\epsilon$ in breaking collision-resistance of $\mathbb{H}$.

    We set up our system using bilinear pairings proposed by Boneh and Franklin [14]. Let $\mathbb{G}$ and $\mathbb{G}_T$ be two multiplicative groups using elliptic curve conventions with a large prime order $p$. The function $e$ is a computable bilinear map $e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ with the following properties: for any $G, H \in \mathbb{G}$ and all $a, b \in \mathbb{Z}_p$, we have 1) Bilinearity: $e([a]G, [b]H) = e(G, H)^{ab}$; 2) Non-degeneracy: $e(G, H) \neq 1$ unless $G$ or $H = 1$; and 3) Computability: $e(G, H)$ is efficiently computable.

    Definition 4 (Bilinear Map Group System): A bilinear map group system is a tuple $\mathbb{S} = \langle p, \mathbb{G}, \mathbb{G}_T, e \rangle$ composed of the objects as described above.

    3.2 Our CPDP Scheme

    In our scheme (see Fig. 3), the manager first runs algorithm $KeyGen$ to obtain the public/private key pairs for CSPs and users. Then, the clients generate the tags of outsourced data by using $TagGen$. Anytime, the protocol $Proof$ is performed by a 5-move interactive
  • 7. proof protocol between a verifier and more than one CSP, in which CSPs need not interact with each other during the verification process, but an organizer is used to organize and manage all CSPs.

This protocol can be described as follows: 1) the organizer initiates the protocol and sends a commitment to the verifier; 2) the verifier returns a challenge set of random index-coefficient pairs $Q$ to the organizer; 3) the organizer relays them to each $P_i$ in $\mathcal{P}$ according to the exact position of each data block; 4) each $P_i$ returns its response to the challenge to the organizer; and 5) the organizer synthesizes a final response from the received responses and sends it to the verifier. The above process guarantees that the verifier accesses files without knowing on which CSPs or in what geographical locations the files reside.

KeyGen($1^\kappa$): Let $\mathbb{S} = (p, \mathbb{G}, \mathbb{G}_T, e)$ be a bilinear map group system with randomly selected generators $g, h \in \mathbb{G}$, where $\mathbb{G}, \mathbb{G}_T$ are two bilinear groups of a large prime order $p$, $|p| = O(\kappa)$. Makes a hash function $H_k(\cdot)$ public. For a CSP, chooses a random number $s \in_R \mathbb{Z}_p$ and computes $S = g^s \in \mathbb{G}$. Thus, $sk_p = s$ and $pk_p = (g, S)$. For a user, chooses two random numbers $\alpha, \beta \in_R \mathbb{Z}_p$ and sets $sk_u = (\alpha, \beta)$ and $pk_u = (g, h, H_1 = h^\alpha, H_2 = h^\beta)$.

TagGen($sk, F, \mathcal{P}$): Splits $F$ into $n \times s$ sectors $\{m_{i,j}\}_{i \in [1,n], j \in [1,s]} \in \mathbb{Z}_p^{n \times s}$. Chooses $s$ random $\tau_1, \cdots, \tau_s \in \mathbb{Z}_p$ as the secret of this file and computes $u_i = g^{\tau_i} \in \mathbb{G}$ for $i \in [1, s]$. Constructs the index table $\chi = \{\chi_i\}_{i=1}^{n}$ and fills out the record $\chi_i$ (see note a) in $\chi$ for $i \in [1, n]$, then calculates the tag for each block $m_i$ as

\[
\xi^{(1)} \leftarrow H_{\sum_{i=1}^{s}\tau_i}(F_n), \quad
\xi^{(2)}_k \leftarrow H_{\xi^{(1)}}(C_k), \quad
\xi^{(3)}_{i,k} \leftarrow H_{\xi^{(2)}_k}(\chi_i), \quad
\sigma_{i,k} \leftarrow \big(\xi^{(3)}_{i,k}\big)^{\alpha} \cdot \Big(\prod_{j=1}^{s} u_j^{m_{i,j}}\Big)^{\beta},
\]

where $F_n$ is the file name and $C_k$ is the CSP name of $P_k \in \mathcal{P}$. Then stores $\psi = (u, \xi^{(1)}, \chi)$ into TTP, and $\sigma_k = \{\sigma_{i,j}\}_{\forall j = k}$ to $P_k \in \mathcal{P}$, where $u = (u_1, \cdots, u_s)$. Finally, the data owner saves the secret $\zeta = (\tau_1, \cdots, \tau_s)$.

Proof($\mathcal{P}, V$): This is a 5-move protocol among the Provers ($\mathcal{P} = \{P_i\}_{i \in [1,c]}$), an organizer ($O$), and a Verifier ($V$) with the common input $(pk, \psi)$, which is stored in TTP, as follows:

1) Commitment($O \to V$): the organizer chooses a random $\gamma \in_R \mathbb{Z}_p$ and sends $H_1' = H_1^{\gamma}$ to the verifier;
2) Challenge1($O \leftarrow V$): the verifier chooses a set of challenge index-coefficient pairs $Q = \{(i, v_i)\}_{i \in I}$ and sends $Q$ to the organizer, where $I$ is a set of random indexes in $[1, n]$ and $v_i$ is a random integer in $\mathbb{Z}_p^*$;
3) Challenge2($\mathcal{P} \leftarrow O$): the organizer forwards $Q_k = \{(i, v_i)\}_{m_i \in P_k} \subseteq Q$ to each $P_k$ in $\mathcal{P}$;
4) Response1($\mathcal{P} \to O$): $P_k$ chooses a random $r_k \in \mathbb{Z}_p$ and $s$ random $\lambda_{j,k} \in \mathbb{Z}_p$ for $j \in [1, s]$, and calculates a response

\[
\sigma'_k \leftarrow S^{r_k} \cdot \prod_{(i,v_i) \in Q_k} \sigma_i^{v_i}, \quad
\mu_{j,k} \leftarrow \lambda_{j,k} + \sum_{(i,v_i) \in Q_k} v_i \cdot m_{i,j}, \quad
\pi_{j,k} \leftarrow e\big(u_j^{\lambda_{j,k}}, H_2\big),
\]

where $\mu_k = \{\mu_{j,k}\}_{j \in [1,s]}$ and $\pi_k = \prod_{j=1}^{s} \pi_{j,k}$. Let $\eta_k \leftarrow g^{r_k} \in \mathbb{G}$; each $P_k$ sends $\theta_k = (\pi_k, \sigma'_k, \mu_k, \eta_k)$ to the organizer;
5) Response2($O \to V$): After receiving all responses from $\{P_i\}_{i \in [1,c]}$, the organizer aggregates $\{\theta_k\}_{P_k \in \mathcal{P}}$ into a final response $\theta$ as

\[
\sigma' \leftarrow \Big(\prod_{P_k \in \mathcal{P}} \sigma'_k \cdot \eta_k^{-s}\Big)^{\gamma}, \quad
\mu'_j \leftarrow \sum_{P_k \in \mathcal{P}} \gamma \cdot \mu_{j,k}, \quad
\pi' \leftarrow \Big(\prod_{P_k \in \mathcal{P}} \pi_k\Big)^{\gamma}. \tag{1}
\]

Let $\mu' = \{\mu'_j\}_{j \in [1,s]}$. The organizer sends $\theta = (\pi', \sigma', \mu')$ to the verifier.

Verification: Now the verifier can check whether the response was correctly formed by checking that

\[
\pi' \cdot e(\sigma', h) \overset{?}{=} e\Big(\prod_{(i,v_i) \in Q} H_{\xi^{(2)}_k}(\chi_i)^{v_i}, H_1'\Big) \cdot e\Big(\prod_{j=1}^{s} u_j^{\mu'_j}, H_2\Big). \tag{2}
\]

a. For $\chi_i$ = "$B_i, V_i, R_i$" in Section 2.3, we can set $\chi_i = (B_i = i, V_i = 1, R_i \in_R \{0,1\}^*)$ at the initial stage of the CPDP scheme.

Fig. 3. Cooperative Provable Data Possession for Multi-Cloud Storage.

In contrast to a single CSP environment, our scheme differs from the common PDP scheme in two aspects:

1) Tag aggregation algorithm: In the commitment stage, the organizer generates a random $\gamma \in_R \mathbb{Z}_p$ and returns its commitment $H_1'$ to the verifier. This assures that the verifier and CSPs do not obtain the value of $\gamma$. Therefore, our approach guarantees that only the organizer can compute the final $\sigma'$ by using $\gamma$ and the $\sigma'_k$ received from CSPs.

After $\sigma'_k$ is computed, we need to transfer it to the organizer in the "Response1" stage. In order to ensure the security of transmission of data tags, our scheme employs a new method, similar to ElGamal encryption, to encrypt the combination of tags $\prod_{(i,v_i) \in Q_k} \sigma_i^{v_i}$; that is, for $sk = s \in \mathbb{Z}_p$ and $pk = (g, S = g^s) \in \mathbb{G}^2$, the cipher of message $m$ is $\mathcal{C} = (\mathcal{C}_1 = g^r, \mathcal{C}_2 = m \cdot S^r)$ and its decryption is performed by $m = \mathcal{C}_2 \cdot \mathcal{C}_1^{-s}$. Since $S^{r_k} = g^{s \cdot r_k} = \eta_k^{s}$, we hold the equation

\[
\sigma' = \Big(\prod_{P_k \in \mathcal{P}} \frac{\sigma'_k}{\eta_k^{s}}\Big)^{\gamma}
= \Big(\prod_{P_k \in \mathcal{P}} \frac{S^{r_k} \cdot \prod_{(i,v_i) \in Q_k} \sigma_i^{v_i}}{\eta_k^{s}}\Big)^{\gamma}
= \Big(\prod_{P_k \in \mathcal{P}} \prod_{(i,v_i) \in Q_k} \sigma_i^{v_i}\Big)^{\gamma}
= \prod_{(i,v_i) \in Q} \sigma_i^{v_i \cdot \gamma}.
\]

2) Homomorphic responses: Because of the homomorphic property, the responses computed from CSPs
  • 8. in a multi-cloud can be combined into a single final response as follows: given a set of $\theta_k = (\pi_k, \sigma'_k, \mu_k, \eta_k)$ received from $P_k$, let $\lambda_j = \sum_{P_k \in \mathcal{P}} \lambda_{j,k}$; the organizer can compute

\[
\begin{aligned}
\mu'_j &= \sum_{P_k \in \mathcal{P}} \gamma \cdot \mu_{j,k}
= \sum_{P_k \in \mathcal{P}} \gamma \cdot \Big(\lambda_{j,k} + \sum_{(i,v_i) \in Q_k} v_i \cdot m_{i,j}\Big) \\
&= \sum_{P_k \in \mathcal{P}} \gamma \cdot \lambda_{j,k} + \sum_{P_k \in \mathcal{P}} \gamma \cdot \sum_{(i,v_i) \in Q_k} v_i \cdot m_{i,j} \\
&= \gamma \cdot \sum_{P_k \in \mathcal{P}} \lambda_{j,k} + \gamma \cdot \sum_{(i,v_i) \in Q} v_i \cdot m_{i,j} \\
&= \gamma \cdot \lambda_j + \gamma \cdot \sum_{(i,v_i) \in Q} v_i \cdot m_{i,j}.
\end{aligned}
\]

The commitment of $\lambda_j$ is also computed by

\[
\begin{aligned}
\pi' &= \Big(\prod_{P_k \in \mathcal{P}} \pi_k\Big)^{\gamma}
= \Big(\prod_{P_k \in \mathcal{P}} \prod_{j=1}^{s} \pi_{j,k}\Big)^{\gamma}
= \prod_{j=1}^{s} \prod_{P_k \in \mathcal{P}} e\big(u_j^{\lambda_{j,k}}, H_2\big)^{\gamma} \\
&= \prod_{j=1}^{s} e\big(u_j^{\sum_{P_k \in \mathcal{P}} \lambda_{j,k}}, H_2^{\gamma}\big)
= \prod_{j=1}^{s} e\big(u_j^{\lambda_j}, H_2'\big),
\end{aligned}
\]

where $H_2' = H_2^{\gamma}$.

It is obvious that the final response $\theta$ received by the verifier from multiple CSPs is the same as that from a single CSP. This means that our CPDP scheme is able to provide transparent verification for the verifiers. Two response algorithms, Response1 and Response2, comprise an HVR: given two responses $\theta_i$ and $\theta_j$ for two challenges $Q_i$ and $Q_j$ from two CSPs, i.e., $\theta_i = Response1(Q_i, \{m_k\}_{k \in I_i}, \{\sigma_k\}_{k \in I_i})$, there exists an efficient algorithm to combine them into a final response $\theta$ corresponding to the sum of the challenges $Q_i \cup Q_j$, that is,

\[
\theta = Response1\big(Q_i \cup Q_j, \{m_k\}_{k \in I_i \cup I_j}, \{\sigma_k\}_{k \in I_i \cup I_j}\big) = Response2(\theta_i, \theta_j).
\]

For multiple CSPs, the above equation can be extended to $\theta = Response2(\{\theta_k\}_{P_k \in \mathcal{P}})$. More importantly, the HVR is a tuple of values $\theta = (\pi, \sigma, \mu)$, which has a constant size even for different challenges.

4 SECURITY ANALYSIS

We give a brief security analysis of our CPDP construction. This construction is directly derived from a multi-prover zero-knowledge proof system (MP-ZKPS), which satisfies the following properties for a given assertion $L$:

1) Completeness: whenever $x \in L$, there exists a strategy for the provers that convinces the verifier that this is the case;
2) Soundness: whenever $x \notin L$, whatever strategy the provers employ, they will not convince the verifier that $x \in L$;
3) Zero-knowledge: no cheating verifier can learn anything other than the veracity of the statement.

According to existing IPS research [15], these properties can protect our construction from various attacks, such as data leakage attack (privacy leakage), tag forgery attack (ownership cheating), etc. In detail, the security of our scheme can be analyzed as follows:

4.1 Collision resistant for index-hash hierarchy

In our CPDP scheme, the collision resistance of the index-hash hierarchy is the basis and prerequisite for the security of the whole scheme, which is described as being secure in the random oracle model. Although the hash function is collision resistant, a successful hash collision can still be used to produce a forged tag when the same hash value is reused multiple times, e.g., when a legitimate client modifies the data or repeatedly inserts and deletes data blocks of outsourced data. To avoid the hash collision, the hash value $\xi^{(3)}_{i,k}$, which is used to generate the tag $\sigma_i$ in the CPDP scheme, is computed from the set of values $\{\tau_i\}, F_n, C_k, \{\chi_i\}$. As long as there exists a one-bit difference in these data, we can avoid the hash collision. As a consequence, we have the following theorem (see Appendix B):

Theorem 1 (Collision Resistant): The index-hash hierarchy in the CPDP scheme is collision resistant, even if the client generates $\sqrt{2p \cdot \ln\frac{1}{1-\varepsilon}}$ files with the same file name and cloud name, and the client repeats $\sqrt{2^{L+1} \cdot \ln\frac{1}{1-\varepsilon}}$ times to modify, insert and delete data blocks, where the collision probability is at least $\varepsilon$, $\tau_i \in \mathbb{Z}_p$, and $|R_i| = L$ for $R_i \in \chi_i$.

4.2 Completeness property of verification

In our scheme, the completeness property implies the public verifiability property, which allows anyone, not just the client (data owner), to challenge the cloud server for data integrity and data ownership without the need for any secret information. First, for every available data-tag pair $(F, \sigma) \in TagGen(sk, F)$ and a random challenge $Q = (i, v_i)_{i \in I}$, the verification protocol should be completed with success probability according to Equation (3), that is,

\[
\Pr\Big[\Big\langle \sum_{P_k \in \mathcal{P}} P_k(F^{(k)}, \sigma^{(k)}) \leftrightarrow O \leftrightarrow V \Big\rangle (pk, \psi) = 1\Big] = 1.
\]

In this process, anyone can obtain the owner's public key $pk = (g, h, H_1 = h^\alpha, H_2 = h^\beta)$ and the corresponding file parameter $\psi = (u, \xi^{(1)}, \chi)$ from TTP to execute the verification protocol; hence this is a publicly verifiable protocol. Moreover, for different owners, the secrets $\alpha$ and $\beta$ hidden in their public key $pk$ are also different, so a successful verification can only be achieved with the real owner's public key. In addition, the parameter $\psi$ is used to store the file-related information, so an owner can employ a unique public key to deal with a large number of outsourced files.

4.3 Zero-knowledge property of verification

The CPDP construction is in essence a Multi-Prover Zero-knowledge Proof (MP-ZKP) system [11], which can be considered as an extension of the notion of
  • 9. an interactive proof system (IPS). Roughly speaking, in the scenario of MP-ZKP, a polynomial-time bounded verifier interacts with several provers whose computational powers are unlimited. According to a Simulator model, in which every cheating verifier has a simulator that can produce a transcript that "looks like" an interaction between an honest prover and a cheating verifier, we can prove our CPDP construction has the zero-knowledge property (see Appendix C):

Theorem 2 (Zero-Knowledge Property): The verification protocol $Proof(\mathcal{P}, V)$ in the CPDP scheme is a computational zero-knowledge system under a simulator model, that is, for every probabilistic polynomial-time interactive machine $V^*$, there exists a probabilistic polynomial-time algorithm $S^*$ such that the ensembles $View(\langle \sum_{P_k \in \mathcal{P}} P_k(F^{(k)}, \sigma^{(k)}) \leftrightarrow O \leftrightarrow V^* \rangle (pk, \psi))$ and $S^*(pk, \psi)$ are computationally indistinguishable.

Zero-knowledge is a property that achieves the CSPs' robustness against attempts to gain knowledge by interacting with them. For our construction, we make use of the zero-knowledge property to preserve the privacy of data blocks and signature tags. Firstly, randomness is introduced into the CSPs' responses in order to resist data leakage attacks (see Attacks 1 and 3 in Appendix A). That is, the random integer $\lambda_{j,k}$ is introduced into the response $\mu_{j,k}$, i.e., $\mu_{j,k} = \lambda_{j,k} + \sum_{(i,v_i) \in Q_k} v_i \cdot m_{i,j}$. This means that the cheating verifier cannot obtain $m_{i,j}$ from $\mu_{j,k}$ because he does not know the random integer $\lambda_{j,k}$. At the same time, a random integer $\gamma$ is also introduced to randomize the verification tag $\sigma$, i.e., $\sigma' \leftarrow (\prod_{P_k \in \mathcal{P}} \sigma'_k \cdot \eta_k^{-s})^{\gamma}$. Thus, the tag $\sigma$ is not revealed to the cheating verifier, because of this randomness.

Equation (3), the completeness derivation referenced in Section 4.2:

\[
\begin{aligned}
\pi' \cdot e(\sigma', h)
&= \prod_{j=1}^{s} e\big(u_j^{\lambda_j}, H_2'\big) \cdot e\Big(\prod_{(i,v_i) \in Q} \sigma_i^{v_i \cdot \gamma}, h\Big) \\
&= \prod_{j=1}^{s} e\big(u_j^{\lambda_j}, H_2'\big) \cdot e\Big(\prod_{(i,v_i) \in Q} \Big(\big(\xi^{(3)}_{i,k}\big)^{\alpha} \cdot \Big(\prod_{j=1}^{s} u_j^{m_{i,j}}\Big)^{\beta}\Big)^{v_i \cdot \gamma}, h\Big) \\
&= \prod_{j=1}^{s} e\big(u_j^{\gamma \cdot \lambda_j}, H_2\big) \cdot e\Big(\prod_{(i,v_i) \in Q} \big(\xi^{(3)}_{i,k}\big)^{v_i}, h\Big)^{\alpha\gamma} \cdot e\Big(\prod_{j=1}^{s} u_j^{\gamma \sum_{(i,v_i) \in Q} m_{i,j} v_i}, h^{\beta}\Big) \\
&= e\Big(\prod_{(i,v_i) \in Q} \big(\xi^{(3)}_{i,k}\big)^{v_i}, H_1'\Big) \cdot \prod_{j=1}^{s} e\big(u_j^{\mu'_j}, H_2\big).
\end{aligned} \tag{3}
\]

4.4 Knowledge soundness of verification

For every data-tag pair $(F^*, \sigma^*) \notin TagGen(sk, F)$, in order to prove the nonexistence of fraudulent $\mathcal{P}^*$ and $O^*$, we require that the scheme satisfies the knowledge soundness property, that is,

\[
\Pr\Big[\Big\langle \sum_{P_k \in \mathcal{P}^*} P_k(F^{(k)*}, \sigma^{(k)*}) \leftrightarrow O^* \leftrightarrow V \Big\rangle (pk, \psi) = 1\Big] \leq \epsilon,
\]

where $\epsilon$ is a negligible error. We prove that our scheme has the knowledge soundness property by using reduction to absurdity (see note 1): we make use of $\mathcal{P}^*$ to construct a knowledge extractor $\mathcal{M}$ [7,13], which gets the common input $(pk, \psi)$ and rewindable black-box access to the prover $P^*$, and then attempts to break the computational Diffie-Hellman (CDH) problem in $\mathbb{G}$: given $G, G_1 = G^a, G_2 = G^b \in_R \mathbb{G}$, output $G^{ab} \in \mathbb{G}$. But this is unacceptable because the CDH problem is widely regarded as unsolvable in polynomial time. Thus, the opposite direction of the theorem also follows. We have the following theorem (see Appendix D):

Theorem 3 (Knowledge Soundness Property): Our scheme has $(t, \epsilon')$ knowledge soundness in the random oracle and rewindable knowledge extractor model, assuming the $(t, \epsilon)$-computational Diffie-Hellman (CDH) assumption holds in the group $\mathbb{G}$ for $\epsilon' \geq \epsilon$.

Essentially, soundness means that it is infeasible to fool the verifier into accepting false statements. Often, soundness can also be regarded as a stricter notion of unforgeability for file tags, to avoid ownership cheating. This means that the CSPs, even if collusion is attempted, cannot tamper with the data or forge the data tags if the soundness property holds. Thus, Theorem 3 denotes that the CPDP scheme can resist tag forgery attacks (see Attacks 2 and 4 in Appendix A) and so prevent ownership cheating by the CSPs.

1. It is a proof method in which a proposition is proved to be true by proving that it is impossible to be false.

5 PERFORMANCE EVALUATION

In this section, to detect abnormality in a low-overhead and timely manner, we analyze and optimize the performance of the CPDP scheme from two aspects: evaluation of probabilistic queries and optimization of the length of blocks. To validate the effects of the scheme, we introduce a prototype of a CPDP-based audit system and present the experimental results.

5.1 Performance Analysis for CPDP Scheme

We present the computation cost of our CPDP scheme in Table 3. We use $[E]$ to denote the computation cost of an exponent operation in $\mathbb{G}$, namely, $g^x$, where $x$ is a positive integer in $\mathbb{Z}_p$ and $g \in \mathbb{G}$ or $\mathbb{G}_T$. We neglect the computation cost of algebraic operations and
  • 10. simple modular arithmetic operations because they run fast enough [16]. The most complex operation is the computation of a bilinear map $e(\cdot, \cdot)$ between two elliptic points (denoted as $[B]$).

TABLE 3
Comparison of computation overheads between our CPDP scheme and non-cooperative (trivial) scheme.

                      | CPDP Scheme                | Trivial Scheme
KeyGen                | $3[E]$                     | $2[E]$
TagGen                | $(2n + s)[E]$              | $(2n + s)[E]$
Proof($\mathcal{P}$)  | $c[B] + (t + cs + 1)[E]$   | $c[B] + (t + cs - c)[E]$
Proof($V$)            | $3[B] + (t + s)[E]$        | $3c[B] + (t + cs)[E]$

Then, we analyze the storage and communication costs of our scheme. We define the bilinear pairing to take the form $e : E(\mathbb{F}_{p^m}) \times E(\mathbb{F}_{p^{km}}) \to \mathbb{F}^*_{p^{km}}$ (the definition given here is from [17], [18]), where $p$ is a prime, $m$ is a positive integer, and $k$ is the embedding degree (or security multiplier). In this case, we utilize an asymmetric pairing $e : \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$ to replace the symmetric pairing in the original schemes. From Table 3, it is easy to see that the client's computation overheads are independent of the number of CSPs. Further, our scheme performs better than the non-cooperative approach because the total computation overhead decreases by $3(c - 1)$ bilinear map operations, where $c$ is the number of clouds in a multi-cloud. The reason is that, before the responses are sent to the verifier from $c$ clouds, the organizer has aggregated these responses into one response by using the aggregation algorithm, so the verifier only needs to verify this response once to obtain the final result.

TABLE 4
Comparison of communication overheads between our CPDP and non-cooperative (trivial) scheme. In the trivial-scheme column, one entry spans both Challenge rows and one entry spans both Response rows.

             | CPDP Scheme             | Trivial Scheme
Commitment   | $l_2$                   | $c\,l_2$
Challenge1   | $2t\,l_0$               | $2t\,l_0$
Challenge2   | $2t\,l_0 / c$           |
Response1    | $s\,l_0 + 2l_1 + l_T$   | $(s\,l_0 + l_1 + l_T)\,c$
Response2    | $s\,l_0 + l_1 + l_T$    |

Without loss of generality, let the security parameter $\kappa$ be 80 bits; we need the elliptic curve domain parameters over $\mathbb{F}_p$ with $|p| = 160$ bits and $m = 1$ in our experiments. This means that the length of an integer is $l_0 = 2\kappa$ in $\mathbb{Z}_p$. Similarly, we have $l_1 = 4\kappa$ in $\mathbb{G}_1$, $l_2 = 24\kappa$ in $\mathbb{G}_2$, and $l_T = 24\kappa$ in $\mathbb{G}_T$ for the embedding degree $k = 6$. The storage and communication costs of our scheme are shown in Table 4. The storage overhead of a file with $size(f) = 1$M-bytes is $store(f) = n \cdot s \cdot l_0 + n \cdot l_1 = 1.04$M-bytes for $n = 10^3$ and $s = 50$. The storage overhead of its index table $\chi$ is $n \cdot l_0 = 20$K-bytes. We define the overhead rate as $\lambda = \frac{store(f)}{size(f)} - 1 = \frac{l_1}{s \cdot l_0}$, and it should be kept as low as possible in order to minimize the storage in cloud storage providers. It is obvious that a higher $s$ means much lower storage. Furthermore, in the verification protocol, the communication overhead of the challenge is $2t \cdot l_0 = 40 \cdot t$-Bytes in terms of the number of challenged blocks $t$, but its response (response1 or response2) has a constant-size communication overhead $s \cdot l_0 + l_1 + l_T \approx 1.3$K-bytes for different file sizes. It also implies that the client's communication overheads are of a fixed size, which is independent of the number of CSPs.

5.2 Probabilistic Verification

We recall the probabilistic verification of the common PDP scheme (which only involves one CSP), in which the verification process detects CSP server misbehavior in a random sampling mode in order to reduce the workload on the server. The detection probability $P$ of disrupted blocks is an important parameter to guarantee that these blocks can be detected in time. Assume the CSP modifies $e$ blocks out of the $n$-block file, that is, the probability of disrupted blocks is $\rho_b = \frac{e}{n}$. Let $t$ be the number of queried blocks for a challenge in the verification protocol. We have detection probability (see note 2)

\[
P(\rho_b, t) \geq 1 - \Big(\frac{n - e}{n}\Big)^{t} = 1 - (1 - \rho_b)^{t},
\]

where $P(\rho_b, t)$ denotes that the probability $P$ is a function over $\rho_b$ and $t$. Hence, the number of queried blocks is $t \approx \frac{\log(1 - P)}{\log(1 - \rho_b)} \approx \frac{P \cdot n}{e}$ for a sufficiently large $n$ and $t \ll n$ (see note 3). This means that the number of queried blocks $t$ is directly proportional to the total number of file blocks $n$ for constant $P$ and $e$. Therefore, for a uniform random verification in a PDP scheme with fragment structure, given a file with $sz = n \cdot s$ sectors and the probability of sector corruption $\rho$, the detection probability of the verification protocol has $P \geq 1 - (1 - \rho)^{sz \cdot \omega}$, where $\omega$ denotes the sampling probability in the verification protocol. We can obtain this result as follows: because $\rho_b \geq 1 - (1 - \rho)^{s}$ is the probability of block corruption with $s$ sectors in the common PDP scheme, the verifier can detect block errors with probability $P \geq 1 - (1 - \rho_b)^{t} \geq 1 - ((1 - \rho)^{s})^{n \cdot \omega} = 1 - (1 - \rho)^{sz \cdot \omega}$ for a challenge with $t = n \cdot \omega$ index-coefficient pairs.

2. Exactly, we have $P = 1 - (1 - \frac{e}{n}) \cdot (1 - \frac{e}{n-1}) \cdots (1 - \frac{e}{n-t+1})$. Since $1 - \frac{e}{n} \geq 1 - \frac{e}{n-i}$ for $i \in [0, t-1]$, we have $P = 1 - \prod_{i=0}^{t-1} (1 - \frac{e}{n-i}) \geq 1 - \prod_{i=0}^{t-1} (1 - \frac{e}{n}) = 1 - (1 - \frac{e}{n})^{t}$.

3. In terms of $(1 - \frac{e}{n})^{t} \approx 1 - \frac{e \cdot t}{n}$, we have $P \approx 1 - (1 - \frac{e \cdot t}{n}) = \frac{e \cdot t}{n}$.

In the same way, given a multi-cloud $\mathcal{P} = \{P_i\}_{i \in [1,c]}$, the detection probability of the CPDP scheme has

\[
P(sz, \{\rho_k, r_k\}_{P_k \in \mathcal{P}}, \omega)
\geq 1 - \prod_{P_k \in \mathcal{P}} \big((1 - \rho_k)^{s}\big)^{n \cdot r_k \cdot \omega}
= 1 - \prod_{P_k \in \mathcal{P}} (1 - \rho_k)^{sz \cdot r_k \cdot \omega},
\]

where $r_k$ denotes the proportion of data blocks in the $k$-th CSP, $\rho_k$ denotes the probability of file corruption
  • 11. in the $k$-th CSP, and $r_k \cdot \omega$ denotes the possible number of blocks queried by the verifier in the $k$-th CSP. Furthermore, we observe the ratio $w$ of queried blocks to total file blocks under different detection probabilities. Based on the above analysis, it is easy to find that this ratio satisfies

\[
w \approx \frac{\log(1 - P)}{sz \cdot \sum_{P_k \in \mathcal{P}} r_k \cdot \log(1 - \rho_k)}.
\]

When the probability $\rho_k$ is constant, the verifier can detect server misbehavior with a certain probability $P$ by asking for proof of a number of blocks $t \approx \frac{\log(1 - P)}{s \cdot \log(1 - \rho)}$ for PDP or

\[
t \approx \frac{\log(1 - P)}{s \cdot \sum_{P_k \in \mathcal{P}} r_k \cdot \log(1 - \rho_k)}
\]

for CPDP, where $t = n \cdot w = \frac{sz \cdot w}{s}$. Note that the value of $t$ depends on the total number of file blocks $n$ [2], because it increases as $\rho_k$ decreases, and $\log(1 - \rho_k) < 0$, for a constant number of disrupted blocks $e$ and a larger $n$.

TABLE 5
The influence of $s$, $t$ under the different corruption probabilities $\rho$ and the different detection probabilities $P$. Each entry is $s$/$t$.

$\rho$ ($\mathcal{P}$) | {0.1, 0.2, 0.01} | {0.01, 0.02, 0.001} | {0.001, 0.002, 0.0001} | {0.0001, 0.0002, 0.00001}
$r$                    | {0.5, 0.3, 0.2}  | {0.5, 0.3, 0.2}     | {0.5, 0.3, 0.2}        | {0.5, 0.3, 0.2}
$P = 0.8$              | 3/4              | 7/20                | 23/62                  | 71/202
$P = 0.85$             | 3/5              | 8/21                | 26/65                  | 79/214
$P = 0.9$              | 3/6              | 10/20               | 28/73                  | 87/236
$P = 0.95$             | 3/8              | 11/29               | 31/86                  | 100/267
$P = 0.99$             | 4/10             | 13/31               | 39/105                 | 119/345
$P = 0.999$            | 5/11             | 16/38               | 48/128                 | 146/433

[Figure: computational complexity (vertical axis) against the number of sectors in each block (horizontal axis), one curve per detection probability $P$ = 0.800, 0.850, 0.900, 0.950, 0.990, 0.999.]
Fig. 4. The relationship between computational cost and the number of sectors in each block.

Another advantage of probabilistic verification based on random sampling is that it is easy to identify tampered or forged data blocks or tags. The identification function is straightforward: when the verification fails, we can choose a partial set of the challenge indexes as a new challenge set and continue to execute the verification protocol. This search process can be repeated until the bad block is found. The complexity of such a search process is $O(\log n)$.

5.3 Parameter Optimization

In the fragment structure, the number of sectors per block $s$ is an important parameter that affects the performance of storage services and audit services. Hence, we propose an optimization algorithm for the value of $s$ in this section. Our results show that the optimal value can not only minimize the computation and communication overheads, but also reduce the size of the extra storage required to store the verification tags in CSPs.

Assume $\rho$ denotes the probability of sector corruption. In the fragment structure, the choice of $s$ is extremely important for improving the performance of the CPDP scheme. Given the detection probability $P$ and the probability of sector corruption $\rho$ for multiple clouds $\mathcal{P} = \{P_k\}$, the optimal value of $s$ can be computed by

\[
\min_{s \in \mathbb{N}} \Big\{ \frac{\log(1 - P)}{\sum_{P_k \in \mathcal{P}} r_k \cdot \log(1 - \rho_k)} \cdot \frac{a}{s} + b \cdot s + c \Big\},
\]

where $a \cdot t + b \cdot s + c$ denotes the computational cost of the verification protocol in the PDP scheme, $a, b, c \in \mathbb{R}$, and $c$ is a constant. This conclusion can be obtained as follows. Let $sz = n \cdot s = size(f)/l_0$. According to the above-mentioned results, the sampling probability satisfies $w \geq \frac{\log(1 - P)}{sz \cdot \sum_{P_k \in \mathcal{P}} r_k \cdot \log(1 - \rho_k)} = \frac{\log(1 - P)}{n \cdot s \cdot \sum_{P_k \in \mathcal{P}} r_k \cdot \log(1 - \rho_k)}$. In order to minimize the computational cost, we have

\[
\begin{aligned}
\min_{s \in \mathbb{N}} \{a \cdot t + b \cdot s + c\}
&= \min_{s \in \mathbb{N}} \{a \cdot n \cdot w + b \cdot s + c\} \\
&\geq \min_{s \in \mathbb{N}} \Big\{ \frac{\log(1 - P)}{\sum_{P_k \in \mathcal{P}} r_k \cdot \log(1 - \rho_k)} \cdot \frac{a}{s} + b \cdot s + c \Big\},
\end{aligned}
\]

where $r_k$ denotes the proportion of data blocks in the $k$-th CSP and $\rho_k$ denotes the probability of file corruption in the $k$-th CSP. Since $\frac{a}{s}$ is a monotone decreasing function and $b \cdot s$ is a monotone increasing function for $s > 0$, there exists an optimal value of $s \in \mathbb{N}$ in the above equation. From this conclusion, the optimal value of $s$ does not depend on a particular file if the probability $\rho$ is a constant value.

For instance, we assume a multi-cloud storage involves three CSPs $\mathcal{P} = \{P_1, P_2, P_3\}$ and the probability of sector corruption is a constant value $\{\rho_1, \rho_2, \rho_3\} = \{0.01, 0.02, 0.001\}$. We set the detection probability $P$ in the range from 0.8 to 1, e.g., $P = \{0.8, 0.85, 0.9, 0.95, 0.99, 0.999\}$. For a file, the
  • 12. proportion of data blocks is 50%, 30%, and 20% in three CSPs, respectively, that is, $r_1 = 0.5$, $r_2 = 0.3$, and $r_3 = 0.2$. In terms of Table 3, the computational cost of CSPs can be simplified to $t + 3s + 9$. Then, we can observe the computational cost under different $s$ and $P$ in Figure 4. When $s$ is less than the optimal value, the computational cost decreases evidently with the increase of $s$, and then it rises when $s$ is more than the optimal value.

TABLE 6
The influence of parameters under different detection probabilities $P$ ($\mathcal{P} = \{\rho_1, \rho_2, \rho_3\} = \{0.01, 0.02, 0.001\}$, $\{r_1, r_2, r_3\} = \{0.5, 0.3, 0.2\}$).

$P$          | 0.8    | 0.85   | 0.9    | 0.95   | 0.99   | 0.999
$sz \cdot w$ | 142.60 | 168.09 | 204.02 | 265.43 | 408.04 | 612.06
$s$          | 7      | 8      | 10     | 11     | 13     | 16
$t$          | 20     | 21     | 20     | 29     | 31     | 38

More precisely, we show the influence of the parameters $sz \cdot w$, $s$, and $t$ under different detection probabilities in Table 6. It is easy to see that the computational cost rises with the increase of $P$. Moreover, we can determine the sampling number of a challenge from the following conclusion: given the detection probability $P$, the probability of sector corruption $\rho$, and the number of sectors in each block $s$, the sampling number of the verification protocol is a constant $t = n \cdot w \geq \frac{\log(1 - P)}{s \cdot \sum_{P_k \in \mathcal{P}} r_k \cdot \log(1 - \rho_k)}$ for different files.

Finally, we observe the change of $s$ under different $\rho$ and $P$. The experimental results are shown in Table 5. It is obvious that the optimal value of $s$ rises with the increase of $P$ and with the decrease of $\rho$. We choose the optimal value of $s$ on the basis of practical settings and system requirements. For the NTFS format, we suggest that the value of $s$ is 200 and the size of a block is 4K-Bytes, which is the same as the default cluster size when the file size is less than 16TB in NTFS. In this case, the value of $s$ ensures that the extra storage does not exceed 1% in storage servers.

5.4 CPDP for Integrity Audit Services

Based on our CPDP scheme, we introduce an audit system architecture for outsourced data in multiple clouds by replacing the TTP with a third party auditor (TPA) in Figure 1. This architecture can be constructed into a visualization infrastructure of cloud-based storage service [1]. In Figure 5, we show an example of applying our CPDP scheme in the Hadoop distributed file system (HDFS) (see note 4), which is a distributed, scalable, and portable file system [19]. The HDFS architecture is composed of NameNode and DataNode, where NameNode maps a file name to a set of indexes of blocks and DataNode actually stores the data blocks. To support our CPDP scheme, the index-hash hierarchy and the metadata of NameNode should be integrated to provide an enquiry service for the hash value $\xi^{(3)}_{i,k}$ or index-hash record $\chi_i$. Based on the hash value, the clients can implement the verification protocol via CPDP services. Hence, it is easy to replace the checksum methods with the CPDP scheme for anomaly detection in current HDFS.

Fig. 5. Applying CPDP scheme in Hadoop distributed file system (HDFS).

4. Hadoop can enable applications to work with thousands of nodes and petabytes of data, and it has been adopted by currently mainstream cloud platforms from Apache, Google, Yahoo, Amazon, IBM and Sun.

To validate the effectiveness and efficiency of our proposed approach for audit services, we have implemented a prototype of an audit system. We simulated the audit service and the storage service by using two local IBM servers with two Intel Core 2 processors at 2.16 GHz and 500M RAM running Windows Server 2003. These servers were connected via 250 MB/sec of network bandwidth. Using the GMP and PBC libraries, we have implemented a cryptographic library upon which our scheme can be constructed. This C library contains approximately 5,200 lines of code and has been tested on both Windows and Linux platforms. The elliptic curve utilized in the experiment is an MNT curve, with a base field size of 160 bits and embedding degree 6. The security level is chosen to be 80 bits, which means $|p| = 160$.
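The sampling bound and the choice of $s$ discussed in Section 5.3 can be sketched numerically. The Python below is only an illustration under stated assumptions: the function names `required_sectors` and `best_s` are hypothetical, the cost model is the simplified $t + 3s + 9$ quoted in the text (so $a = 1$, $b = 3$, $c = 9$), and the search range for $s$ is arbitrary. With the Table 6 settings it yields the same $sz \cdot w$ values as the table (for example about 142.60 for $P = 0.8$); the integer $s$ it selects under this simplified continuous cost need not coincide with the $s$ and $t$ entries reported in the table.

```python
import math


def required_sectors(P, rhos, rs):
    """Return sz*w = log(1-P) / sum_k r_k*log(1-rho_k).

    P    -- target detection probability
    rhos -- per-CSP sector corruption probabilities (rho_k)
    rs   -- per-CSP proportions of data blocks (r_k), summing to 1
    """
    denom = sum(r * math.log(1.0 - rho) for rho, r in zip(rhos, rs))
    return math.log(1.0 - P) / denom


def best_s(P, rhos, rs, a=1.0, b=3.0, c=9.0, s_max=200):
    """Integer s in [1, s_max] minimising a*(sz*w)/s + b*s + c.

    a, b, c default to the simplified cost t + 3s + 9 from the text;
    s_max is an arbitrary search limit chosen for this sketch.
    """
    szw = required_sectors(P, rhos, rs)

    def cost(s):
        return a * szw / s + b * s + c

    s = min(range(1, s_max + 1), key=cost)
    return s, cost(s)


# Settings of Table 6: three CSPs.
rhos = [0.01, 0.02, 0.001]
rs = [0.5, 0.3, 0.2]

for P in (0.8, 0.85, 0.9, 0.95, 0.99, 0.999):
    szw = required_sectors(P, rhos, rs)
    s, cost = best_s(P, rhos, rs)
    print(f"P={P}: sz*w={szw:.2f}, s={s}, t~{szw / s:.1f}, cost~{cost:.1f}")
```

Because $a/s$ falls and $b \cdot s$ grows with $s$, the exhaustive search over a small integer range is enough here; a closed form near $\sqrt{a \cdot sz \cdot w / b}$ would give the same neighbourhood.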
Fig. 6. Experimental results under different file size, sampling ratio, and sector number. Left panel: computation and communication costs (s) versus the size of files (K-Bytes, from 10 to 10000, with s = 20, 50, 100, and 250), one curve per sampling ratio from 10% to 50%. Right panel: computation and communication costs (s) versus the ratio of queried blocks to total file blocks (0.1 to 0.5), one curve per protocol step (Commitment, Challenge1, Challenge2 (CSP2), Challenge2 (CSP3), Response1 (CSP1), Response1 (CSP2), Response1 (CSP3), Response2, Verification, Total Time), for three CSPs, r = (50%, 30%, 20%), a 10M-Byte file, and 250 sectors per block.

Firstly, we quantify the performance of our audit scheme under different parameters, such as file size 𝑠𝑧, sampling ratio 𝑤, sector number per block 𝑠, and so on. Our analysis shows that the value of 𝑠 should grow with the increase of 𝑠𝑧 in order to reduce computation and communication costs. Thus, our experiments were carried out as follows: the stored files were chosen from 10KB to 10MB; the sector numbers were changed from 20 to 250 in terms of file sizes; and the sampling ratios were changed from 10% to 50%. The experimental results are shown in the left side of Figure 6. These results indicate that the computation and communication costs (including I/O costs) grow with the increase of file size and sampling ratio.

Next, we compare the performance of each activity in our verification protocol. We have shown the theoretical results in Table 4: the overheads of “commitment” and “challenge” resemble one another, and the overheads of “response” and “verification” resemble one another as well. To validate the theoretical results, we changed the sampling ratio 𝑤 from 10% to 50% for a 10MB file and 250 sectors per block in a multi-cloud 𝒫 = {𝑃1, 𝑃2, 𝑃3}, in which the proportions of data blocks are 50%, 30%, and 20% in three CSPs, respectively. In the right side of Figure 6, our experimental results show that the computation and communication costs of “commitment” and “challenge” change only slightly with the sampling ratio, but those for “response” and “verification” grow with the increase of the sampling ratio. Here, “challenge” and “response” can be divided into two sub-processes: “challenge1” and “challenge2”, as well as “response1” and “response2”, respectively. Furthermore, the proportions of data blocks in each CSP have greater influence on the computation costs of “challenge” and “response” processes. In summary, our scheme has better performance than the non-cooperative approach.

6 CONCLUSIONS

In this paper, we presented the construction of an efficient PDP scheme for distributed cloud storage. Based on homomorphic verifiable response and hash index hierarchy, we have proposed a cooperative PDP scheme to support dynamic scalability on multiple storage servers. We also showed that our scheme provided all security properties required by zero-knowledge interactive proof system, so that it can resist various attacks even if it is deployed as a public audit service in clouds. Furthermore, we optimized the probabilistic query and periodic verification to improve the audit performance. Our experiments clearly demonstrated that our approaches only introduce a small amount of computation and communication overheads. Therefore, our solution can be treated as a new candidate for data integrity verification in outsourcing data storage systems.

As part of future work, we would extend our work to explore more effective CPDP constructions. First, from our experiments we found that the performance of CPDP scheme, especially for large files, is affected by the bilinear mapping operations due to their high complexity. To solve this problem, RSA-based constructions may be a better choice, but this is still a challenging task because the existing RSA-based schemes have too many restrictions on the performance and security [2]. Next, from a practical point of view, we still need to address some issues about integrating our CPDP scheme smoothly with existing systems, for example, how to match index-hash hierarchy with HDFS’s two-layer name space, how to match index structure with cluster-network model, and how to dynamically update the CPDP parameters according to HDFS’ specific requirements. Finally, it is still a challenging problem to generate tags whose length is independent of the size of data blocks. We would explore such an issue to provide the support of variable-length block verification.

ACKNOWLEDGMENTS

The work of Y. Zhu and M. Yu was supported by the National Natural Science Foundation of China (Project No.61170264 and No.10990011). This work of Gail-J.
Ahn and Hongxin Hu was partially supported by the grants from US National Science Foundation (NSF-IIS-0900970 and NSF-CNS-0831360) and Department of Energy (DE-SC0004308).

REFERENCES

[1] B. Sotomayor, R. S. Montero, I. M. Llorente, and I. T. Foster, “Virtual infrastructure management in private and hybrid clouds,” IEEE Internet Computing, vol. 13, no. 5, pp. 14–22, 2009.
[2] G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, L. Kissner, Z. N. J. Peterson, and D. X. Song, “Provable data possession at untrusted stores,” in ACM Conference on Computer and Communications Security, P. Ning, S. D. C. di Vimercati, and P. F. Syverson, Eds. ACM, 2007, pp. 598–609.
[3] A. Juels and B. S. K. Jr., “Pors: proofs of retrievability for large files,” in ACM Conference on Computer and Communications Security, P. Ning, S. D. C. di Vimercati, and P. F. Syverson, Eds. ACM, 2007, pp. 584–597.
[4] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, “Scalable and efficient provable data possession,” in Proceedings of the 4th international conference on Security and privacy in communication networks, SecureComm, 2008, pp. 1–10.
[5] C. C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, “Dynamic provable data possession,” in ACM Conference on Computer and Communications Security, E. Al-Shaer, S. Jha, and A. D. Keromytis, Eds. ACM, 2009, pp. 213–222.
[6] H. Shacham and B. Waters, “Compact proofs of retrievability,” in ASIACRYPT, ser. Lecture Notes in Computer Science, J. Pieprzyk, Ed., vol. 5350. Springer, 2008, pp. 90–107.
[7] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, “Enabling public verifiability and data dynamics for storage security in cloud computing,” in ESORICS, ser. Lecture Notes in Computer Science, M. Backes and P. Ning, Eds., vol. 5789. Springer, 2009, pp. 355–370.
[8] Y. Zhu, H. Wang, Z. Hu, G.-J. Ahn, H. Hu, and S. S. Yau, “Dynamic audit services for integrity verification of outsourced storages in clouds,” in SAC, W. C. Chu, W. E. Wong, M. J. Palakal, and C.-C. Hung, Eds. ACM, 2011, pp. 1550–1557.
[9] K. D. Bowers, A. Juels, and A. Oprea, “Hail: a high-availability and integrity layer for cloud storage,” in ACM Conference on Computer and Communications Security, E. Al-Shaer, S. Jha, and A. D. Keromytis, Eds. ACM, 2009, pp. 187–198.
[10] Y. Dodis, S. P. Vadhan, and D. Wichs, “Proofs of retrievability via hardness amplification,” in TCC, ser. Lecture Notes in Computer Science, O. Reingold, Ed., vol. 5444. Springer, 2009, pp. 109–127.
[11] L. Fortnow, J. Rompel, and M. Sipser, “On the power of multi-prover interactive protocols,” in Theoretical Computer Science, 1988, pp. 156–161.
[12] Y. Zhu, H. Hu, G.-J. Ahn, Y. Han, and S. Chen, “Collaborative integrity verification in hybrid clouds,” in IEEE Conference on the 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom, Orlando, Florida, USA, October 15-18, 2011, pp. 197–206.
[13] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the clouds: A berkeley view of cloud computing,” EECS Department, University of California, Berkeley, Tech. Rep., Feb 2009.
[14] D. Boneh and M. Franklin, “Identity-based encryption from the weil pairing,” in Advances in Cryptology (CRYPTO’2001), vol. 2139 of LNCS, 2001, pp. 213–229.
[15] O. Goldreich, Foundations of Cryptography: Basic Tools. Cambridge University Press, 2001.
[16] P. S. L. M. Barreto, S. D. Galbraith, C. O’Eigeartaigh, and M. Scott, “Efficient pairing computation on supersingular abelian varieties,” Des. Codes Cryptography, vol. 42, no. 3, pp. 239–271, 2007.
[17] J.-L. Beuchat, N. Brisebarre, J. Detrey, and E. Okamoto, “Arithmetic operators for pairing-based cryptography,” in CHES, ser. Lecture Notes in Computer Science, P. Paillier and I. Verbauwhede, Eds., vol. 4727. Springer, 2007, pp. 239–255.
[18] H. Hu, L. Hu, and D. Feng, “On a class of pseudorandom sequences from elliptic curves over finite fields,” IEEE Transactions on Information Theory, vol. 53, no. 7, pp. 2598–2605, 2007.
[19] A. Bialecki, M. Cafarella, D. Cutting, and O. O’Malley, “Hadoop: A framework for running applications on large clusters built of commodity hardware,” Tech. Rep., 2005. [Online]. Available: http://lucene.apache.org/hadoop/
[20] E. Al-Shaer, S. Jha, and A. D. Keromytis, Eds., Proceedings of the 2009 ACM Conference on Computer and Communications Security, CCS 2009, Chicago, Illinois, USA, November 9-13, 2009. ACM, 2009.

Yan Zhu received the Ph.D. degree in computer science from Harbin Engineering University, China, in 2005. He has been an associate professor of computer science in the Institute of Computer Science and Technology at Peking University since 2007. He worked at the Department of Computer Science and Engineering, Arizona State University as a visiting associate professor from 2008 to 2009. His research interests include cryptography and network security.

Hongxin Hu is currently working toward the Ph.D. degree from the School of Computing, Informatics, and Decision Systems Engineering, Ira A. Fulton School of Engineering, Arizona State University. He is also a member of the Security Engineering for Future Computing Laboratory, Arizona State University. His current research interests include access control models and mechanisms, security and privacy in social networks, and security in distributed and cloud computing, network and system security and secure software engineering.

Gail-Joon Ahn is an Associate Professor in the School of Computing, Informatics, and Decision Systems Engineering, Ira A. Fulton Schools of Engineering and the Director of Security Engineering for Future Computing Laboratory, Arizona State University. His research interests include information and systems security, vulnerability and risk management, access control, and security architecture for distributed systems, which has been supported by the U.S. National Science Foundation, National Security Agency, U.S. Department of Defense, U.S. Department of Energy, Bank of America, Hewlett Packard, Microsoft, and Robert Wood Johnson Foundation. Dr. Ahn is a recipient of the U.S. Department of Energy CAREER Award and the Educator of the Year Award from the Federal Information Systems Security Educators Association. He was an Associate Professor at the College of Computing and Informatics, and the Founding Director of the Center for Digital Identity and Cyber Defense Research and Laboratory of Information Integration, Security, and Privacy, University of North Carolina, Charlotte. He received the Ph.D. degree in information technology from George Mason University, Fairfax, VA, in 2000.

Mengyang Yu received his B.S. degree from the School of Mathematics Science, Peking University in 2010. He is currently an M.S. candidate at Peking University. His research interests include cryptography and computer security.
