PerfectDedup
Secure Data Deduplication
Pasquale PUZIO
pasquale@secludit.com
SecludIT & EURECOM
Refik Molva (EURECOM)
Melek Önen (EURECOM)
Sergio Loureiro (SecludIT)
10th DPM International Workshop on Data Privacy Management
Vienna, Austria, September 21st 2015
Agenda
• Problem Statement
– Data Deduplication for Cloud Storage
– Convergent Encryption
• Our solution
– Data Popularity
– Perfect Hashing
– PerfectDedup: Secure Popularity Detection
– Security
– Performance Evaluation
2
Deduplication
• Storing duplicate data only once
• Cross-user + Client-side + Block-level
3
Deduplication vs Encryption
… but it does not work on encrypted data!
[Figure: the same data D = "Hello World" is encrypted once with key K1 and once with key K2; the two ciphertexts differ.]
4
Convergent Encryption
• Data Encryption key derived from Data
K = hash(Data)
• Deterministic & Symmetric Encryption
[Figure: two copies of D = "Hello World" are each encrypted with the key H(D); the two ciphertexts are identical.]
5
Douceur, John R., et al. "Reclaiming space from duplicate files in a serverless distributed file system." Proceedings of the 22nd International Conference on Distributed Computing Systems. IEEE, 2002.
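The idea on this slide (key derived from the data, deterministic symmetric encryption) can be illustrated with a minimal sketch. The function names are hypothetical, and the XOR keystream built from SHA-256 is only a standard-library stand-in for a real deterministic cipher; it is an assumption made for illustration, not the scheme used by the authors, and it should not be used to protect real data.

```python
import hashlib
from typing import Tuple


def derive_key(data: bytes) -> bytes:
    """Convergent key: K = hash(Data)."""
    return hashlib.sha256(data).digest()


def keystream(key: bytes, length: int) -> bytes:
    """Toy deterministic keystream (SHA-256 in counter mode), illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(data: bytes) -> Tuple[bytes, bytes]:
    """Encrypt data under a key derived from the data itself."""
    key = derive_key(data)
    cipher = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    return key, cipher


def convergent_decrypt(key: bytes, cipher: bytes) -> bytes:
    """XOR with the same keystream recovers the plaintext."""
    return bytes(a ^ b for a, b in zip(cipher, keystream(key, len(cipher))))


# Two users encrypting the same data obtain the same ciphertext,
# so the storage provider can deduplicate without seeing the plaintext.
key_a, cipher_a = convergent_encrypt(b"Hello World")
key_b, cipher_b = convergent_encrypt(b"Hello World")
print(cipher_a == cipher_b)
```

Because the key depends only on the data, anyone who can guess the data can also recompute the ciphertext, which is the weakness the following slides address.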
Convergent Encryption
MISSING INFORMATION
How to achieve safe Convergent Encryption in the Cloud?
6
Drew Perttula, Brian Warner, and Zooko Wilcox-O'Hearn, 2008-03-20
https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html
Data Popularity
• Different protection based on data-segment popularity
• Popular data → Not confidential → To be deduplicated → Convergent Encryption
• Unpopular data → Confidential → To be protected → Semantically-Secure Encryption
7
Stanek, Jan, et al. "A secure data deduplication scheme for cloud storage." Financial Cryptography and Data Security. Springer Berlin Heidelberg, 2014. 99-118.
How to securely detect popularity?
[Figure: the CLIENT holds block B and asks the CSP "Is block B popular?"; the CSP answers YES / NO.]
• Block B must not be disclosed if it is unpopular (sensitive)
8
PHF-based Lookup
[Figure: PHF-based lookup of a block ID]
9
Belazzougui, Djamal, Fabiano C. Botelho, and Martin Dietzfelbinger. "Hash, displace, and compress." Algorithms-ESA 2009. Springer Berlin Heidelberg, 2009. 682-693.
PerfectDedup
• Based on «Secure» Perfect Hashing
– One-wayness
• Popular block IDs → Collision-free hash function (PHF)
• BENEFITS:
– Efficient (linear) generation of a new PHF (outsourced to the Cloud)
– Compact representation of PHF
– Very efficient (constant) evaluation on a block ID
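To make the lookup idea concrete, here is a minimal sketch of a perfect hash function over a set of popular block IDs, followed by a popularity check. It is written in the spirit of hash-and-displace but is not CMPH and not the authors' modified version; the names `build_phf`, `phf` and `is_popular`, the use of seeded SHA-256 as the one-way hash, and the choice of storing the ID itself in the table are all assumptions made for illustration.

```python
import hashlib
from typing import List, Sequence, Tuple


def h(seed: int, key: bytes, m: int) -> int:
    """Seeded one-way hash of a block ID into the range [0, m)."""
    digest = hashlib.sha256(seed.to_bytes(4, "big") + key).digest()
    return int.from_bytes(digest[:8], "big") % m


def build_phf(keys: Sequence[bytes]) -> Tuple[List[int], List[bytes]]:
    """Build a minimal perfect hash for distinct keys: (displacements, table)."""
    keys = list(dict.fromkeys(keys))  # drop duplicates, keep order
    if not keys:
        raise ValueError("at least one popular block ID is required")
    m = len(keys)
    r = max(1, m // 2)  # number of first-level buckets
    buckets: List[List[bytes]] = [[] for _ in range(r)]
    for k in keys:
        buckets[h(0, k, r)].append(k)
    disp = [0] * r
    table: List = [None] * m
    # Place the largest buckets first, searching a seed that avoids collisions.
    for b in sorted(range(r), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        d = 1
        while True:
            slots = [h(d, k, m) for k in buckets[b]]
            if len(set(slots)) == len(slots) and all(table[s] is None for s in slots):
                break
            d += 1
        disp[b] = d
        for k, s in zip(buckets[b], slots):
            table[s] = k
    return disp, table


def phf(disp: List[int], m: int, key: bytes) -> int:
    """Evaluate the PHF on any block ID with two hash computations."""
    return h(disp[h(0, key, len(disp))], key, m)


def is_popular(disp: List[int], table: List[bytes], block_id: bytes) -> bool:
    """A block is popular if the ID stored at index PHF(ID) equals its own ID."""
    return table[phf(disp, len(table), block_id)] == block_id


popular_ids = [f"block-{i}".encode() for i in range(20)]
disp, table = build_phf(popular_ids)
print(is_popular(disp, table, b"block-7"), is_popular(disp, table, b"secret-block"))
```

Every ID, popular or not, maps to some index of the table, so an index alone does not single out an unpopular ID; only popular IDs find themselves stored at their own index.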
10
Security
[Figure: UNPOPULAR and POPULAR block sets at the CSP. PHF(ID) = i, and index i → ID (the same ID).]
Block is popular
1-to-1 mapping
No confidentiality issue
11
Security
[Figure: UNPOPULAR and POPULAR block sets at the CSP. PHF(ID) = i, but index i → ID’ (a different ID).]
Block is unpopular
Collisions are well-distributed
One-wayness property
12
PerfectDedup
[Figure: the CLIENT asks the CSP "Is block B popular?" (YES / NO). If NO, the CLIENT asks the INDEX SERVICE "Popularity transition?" (YES / NO).]
13
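The upload decision on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' protocol: the class and function names and the threshold value are hypothetical, the CSP check uses a plain set in place of the PHF-based lookup, and uploads are counted without distinguishing users.

```python
import hashlib
from collections import Counter

THRESHOLD = 3  # hypothetical popularity threshold


class IndexService:
    """Trusted service that counts uploads of blocks that are still unpopular."""

    def __init__(self, threshold: int = THRESHOLD) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()

    def report_upload(self, block_id: str) -> bool:
        """Return True when this upload triggers a popularity transition."""
        self.counts[block_id] += 1
        return self.counts[block_id] == self.threshold


class CSP:
    """Cloud storage provider; the set stands in for the PHF-based lookup."""

    def __init__(self) -> None:
        self.popular_ids = set()

    def is_popular(self, block_id: str) -> bool:
        return block_id in self.popular_ids

    def mark_popular(self, block_id: str) -> None:
        self.popular_ids.add(block_id)


def upload(block: bytes, csp: CSP, index: IndexService) -> str:
    """Decide how a block is handled: 'popular', 'transition' or 'unpopular'."""
    block_id = hashlib.sha256(block).hexdigest()
    if csp.is_popular(block_id):
        return "popular"      # convergent encryption, block is deduplicated
    if index.report_upload(block_id):
        csp.mark_popular(block_id)
        return "transition"   # threshold reached, block becomes popular
    return "unpopular"        # semantically secure encryption, no deduplication


csp, index = CSP(), IndexService()
print([upload(b"Hello World", csp, index) for _ in range(4)])
```

With the threshold set to 3, the first two uploads of a block are treated as unpopular, the third triggers the transition, and later uploads take the popular path.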
Prototype Implementation
[Figure: prototype architecture with CSP, INDEX SERVICE and CLIENT; the label CMPH appears twice in the diagram.]
14
Performance Evaluation
[Figure: chart of upload time. Y axis: Time (in seconds), from 0 to 10. X axis: Scenario (UNPOPULAR FILE, POPULARITY TRANSITION, POPULAR FILE). Legend: Client File Split, Client Convergent Encryption, Client Popularity Check, Client Symmetric Encryption, Idx Service Update, Cloud Generate PHF, Cloud Store Hash Table, Cloud Popularity Check, Cloud Upload Processing.]
15
Conclusions
• Popularity-based Deduplication
• Secure Perfect Hashing
• Secure & Lightweight for the client
• Costly tasks outsourced to the Cloud
• Low overhead
16
Future Work
• Optimization of PHF generation
• Deployment in real production environments
17
THANK YOU
Questions ?
Don’t be shy !
pasquale@secludit.com

[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud Storage


Editor's Notes

  1. Hello everyone, my name’s Pasquale Puzio. I’m a PhD student at EURECOM & SecludIT under the supervision of Refik MOLVA and Sergio LOUREIRO. Today I’m going to talk about PerfectDedup, which is our latest work on secure data deduplication. Data Deduplication + Confidentiality.
  2. Let’s talk quickly about the agenda. Today I’ll first explain what data deduplication is and why it became interesting for researchers. Then I’ll explain how deduplication can be combined with encryption, in particular convergent encryption (CE). This will bring me to the vulnerabilities of CE. Finally, I’ll present our solution based on data popularity and perfect hashing.
  3. Basic idea: store duplicated data only once. Explain. Mention experiments.
  4. Key and encryption are deterministic.
  5. Researchers noticed that data may need different levels of protection depending on its popularity. This assumption works pretty well in all common scenarios, except for a few extreme cases. However, in our scheme the user can skip the protocol and just encrypt their file. Explain when a block becomes popular -> the popularity threshold is reached. Mention an example.
  6. The problem is shifted to secure popularity detection: if popular do this, if unpopular do that. PIR (private information retrieval) would not be efficient in the case of block-level deduplication. Explain that using different encryption requires the user to know whether the data is popular. Simple solution -> look for the convergent-encrypted block -> not secure.
  7. Let’s go into more detail. The index does not reveal anything about the block because of collisions. Lookup protocols use hash tables and databases use indices based on perfect hashing; we need a secure lookup protocol.
  8. We decided to design a new protocol based on perfect hashing. It is secure because we added the one-wayness property, which is fundamental for the security of the protocol.
  9. No confidentiality issue because the block is popular.
  10. On the other hand, collisions protect unpopular data: several pre-images correspond to the same image.
  11. Now let’s have a closer look at the architecture. We need a trusted index service in order to handle the popularity transition, that is, the phase in which a block that was unpopular becomes popular after reaching a popularity threshold. Explain protocol.
  12. Focus on CMPH. Mention that we modified CMPH in order to make it secure (one-way).
  13. Upload of a 10MB file in three different scenarios: the file was unpopular, triggered a popularity transition, or was popular. The take-away from this slide is that all client operations are really lightweight. Example: popularity check -> outperforms PIR by far. Costly operations are outsourced to the cloud. Fix colors.
  14. Outperforms previous solutions.