PerfectDedup
Secure Data Deduplication
Pasquale PUZIO
pasquale@secludit.com
SecludIT & EURECOM
Refik Molva (EURECOM)
Melek Önen (EURECOM)
Sergio Loureiro (SecludIT)
10th DPM International Workshop on Data Privacy Management
Vienna, Austria, September 21st 2015
Agenda
• Problem Statement
– Data Deduplication for Cloud Storage
– Convergent Encryption
• Our solution
– Data Popularity
– Perfect Hashing
– PerfectDedup: Secure Popularity Detection
– Security
– Performance Evaluation
Deduplication
• Storing duplicate data only once
• Cross-user + Client-side + Block-level
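As a rough illustration of cross-user, client-side, block-level deduplication, the following Python sketch keeps each distinct block only once and lets the client ask whether a block already exists before uploading it. The class `DedupStore`, the 4-byte `BLOCK_SIZE`, and the use of SHA-256 as the block ID are assumptions made for this example and are not taken from the slides.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical and tiny, so the example is easy to follow


class DedupStore:
    """Toy block store that keeps each distinct block only once."""

    def __init__(self):
        self.blocks = {}  # block ID (SHA-256 hex digest) -> block bytes
        self.files = {}   # (user, filename) -> ordered list of block IDs

    def has_block(self, block_id):
        return block_id in self.blocks

    def put_file(self, user, name, data):
        """Store a file block by block; return how many blocks were uploaded."""
        uploaded = 0
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            block_id = hashlib.sha256(block).hexdigest()
            if not self.has_block(block_id):  # client-side check before sending
                self.blocks[block_id] = block
                uploaded += 1
            recipe.append(block_id)
        self.files[(user, name)] = recipe
        return uploaded

    def get_file(self, user, name):
        return b"".join(self.blocks[b] for b in self.files[(user, name)])


store = DedupStore()
first = store.put_file("alice", "a.txt", b"HelloWorld!!")   # uploads every block
second = store.put_file("bob", "b.txt", b"HelloWorld!!")    # uploads nothing new
```

In this sketch the second user uploads no block at all, because every block of the file is already stored under the same ID.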
Deduplication vs Encryption
… but it does not work on encrypted data!
[Figure: the same plaintext D = "Hello World" encrypted with two different keys, K1 and K2, yields two unrelated ciphertexts.]
Convergent Encryption
• Data Encryption key derived from Data
K = hash(Data)
• Deterministic & Symmetric Encryption
[Figure: the same plaintext D = "Hello World" encrypted twice with the key H(D) yields two identical ciphertexts.]
Douceur, John R., et al. "Reclaiming space from duplicate files in a serverless distributed file system." Distributed Computing Systems, 2002.
Proceedings. 22nd International Conference on. IEEE, 2002.
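The idea K = hash(Data) can be sketched in a few lines of Python. This is a toy under stated assumptions: the cipher is a hand-rolled XOR stream built from SHA-256 in counter mode, chosen only because it needs nothing beyond the standard library. It is not a vetted cipher and not the one used by the authors. The function names are hypothetical.

```python
import hashlib


def keystream(key, length):
    """Expand key into `length` bytes using SHA-256 in counter mode (toy only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key, data):
    """Deterministic symmetric cipher: the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def convergent_encrypt(data):
    key = hashlib.sha256(data).digest()  # K = hash(Data)
    return key, xor_cipher(key, data)


# Two users encrypting the same data independently obtain the same ciphertext.
k1, c1 = convergent_encrypt(b"Hello World")
k2, c2 = convergent_encrypt(b"Hello World")

# With unrelated keys, the ciphertexts of identical data differ.
other1 = xor_cipher(b"K1", b"Hello World")
other2 = xor_cipher(b"K2", b"Hello World")
```

Because both the key derivation and the cipher are deterministic, identical plaintexts lead to identical ciphertexts, which is what lets the storage provider deduplicate them without seeing the plaintext.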
Convergent Encryption
MISSING
INFORMATION
How to achieve safe Convergent Encryption in the Cloud?
Drew Perttula, Brian Warner, and Zooko Wilcox-O'Hearn, 2008-03-20
https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html
Data Popularity
• Different protection based on data-segment popularity
• Popular data → Not confidential → To be deduplicated → Convergent Encryption
• Unpopular data → Confidential → To be protected → Semantically-Secure Encryption
Stanek, Jan, et al. "A secure data deduplication scheme for cloud storage." Financial Cryptography and Data Security. Springer Berlin Heidelberg,
2014. 99-118.
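A minimal Python sketch of the popularity-based choice follows. It assumes the client already knows whether the block is popular; how that is learned securely is a separate question. The toy stream cipher, the 16-byte random nonce, and the names `stream_xor` and `protect` are illustrative assumptions, not the scheme from the cited paper.

```python
import hashlib
import os


def stream_xor(key, nonce, data):
    """Toy stream cipher: XOR data with SHA-256(key || nonce || offset) pads."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out += bytes(a ^ b for a, b in zip(data[offset:offset + 32], pad))
    return bytes(out)


def protect(block, is_popular, user_key):
    """Return (mode, nonce, ciphertext) for one block."""
    if is_popular:
        # Popular: deterministic key derived from the block, so it deduplicates.
        key = hashlib.sha256(block).digest()
        return ("convergent", b"", stream_xor(key, b"", block))
    # Unpopular: per-user key plus a fresh random nonce, so equal blocks
    # produce different ciphertexts on every upload.
    nonce = os.urandom(16)
    return ("semantic", nonce, stream_xor(user_key, nonce, block))


user_key = b"hypothetical-user-secret"
block = b"x" * 40
popular_a = protect(block, True, user_key)
popular_b = protect(block, True, user_key)
private_a = protect(block, False, user_key)
private_b = protect(block, False, user_key)
```

The two popular results are identical and can be stored once, while the two unpopular results differ even though the plaintext and the user key are the same.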
How to securely detect popularity?
[Diagram: the CLIENT asks the CSP "Is block B popular?" and the CSP answers YES / NO.]
• Block B must not be disclosed if it is unpopular (sensitive)
PHF-based Lookup
[Figure: PHF-based lookup of a block ID]
Belazzougui, Djamal, Fabiano C. Botelho, and Martin Dietzfelbinger. "Hash, displace, and compress." Algorithms-ESA 2009. Springer Berlin
Heidelberg, 2009. 682-693.
PerfectDedup
• Based on «Secure» Perfect Hashing
– One-wayness
• Popular block IDs → Collision-free hash function (PHF)
• BENEFITS:
– Efficient (linear) generation of a new PHF (outsourced to the Cloud)
– Compact representation of PHF
– Very efficient (constant) evaluation on a block ID
Security
[Diagram: the CSP holds the sets of UNPOPULAR and POPULAR (P) block IDs. PHF(ID) = i, and index i maps to the same ID.]
Block is popular
1-to-1 mapping
No confidentiality issue
Security
[Diagram: the CSP holds the sets of UNPOPULAR and POPULAR (P) block IDs. PHF(ID) = i, and index i maps to a different ID’.]
Block is unpopular
Collisions are well-distributed
One-wayness property
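The two Security slides can be read as a lookup exchange, sketched below in Python under several assumptions. The perfect hash function is found by brute-forcing a seed for a seeded SHA-256, which is only practical for a handful of IDs and is not the CMPH-based construction of the prototype. The names `build_phf`, `CSP`, and `is_popular` are hypothetical, and the set of popular blocks is assumed to be non-empty and free of duplicates.

```python
import hashlib
from itertools import count


def h(seed, block_id, n):
    """Seeded hash of a block ID into the range [0, n)."""
    digest = hashlib.sha256(seed.to_bytes(8, "big") + block_id).digest()
    return int.from_bytes(digest, "big") % n


def build_phf(popular_ids):
    """Try seeds in order until the popular IDs map to distinct indices."""
    n = len(popular_ids)
    for seed in count():
        if len({h(seed, b, n) for b in popular_ids}) == n:
            table = [None] * n
            for b in popular_ids:
                table[h(seed, b, n)] = b
            return seed, table


def block_id(block):
    return hashlib.sha256(block).digest()


class CSP:
    """Holds the PHF (here: a seed) and the table of popular block IDs."""

    def __init__(self, popular_blocks):
        self.seed, self.table = build_phf([block_id(b) for b in popular_blocks])

    def phf_description(self):
        return self.seed, len(self.table)

    def lookup(self, i):
        return self.table[i]  # the CSP only ever sees the index i


def is_popular(csp, block):
    seed, n = csp.phf_description()
    bid = block_id(block)
    i = h(seed, bid, n)           # evaluated locally by the client
    return csp.lookup(i) == bid   # equal only if the block ID is in the table


csp = CSP([b"popular-1", b"popular-2", b"popular-3", b"popular-4"])
```

For a popular block the index identifies exactly one stored ID, which is harmless because popular data is not considered confidential. For an unpopular block the same index is shared by many possible IDs, so the CSP learns only an index that also corresponds to a popular block.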
PerfectDedup
[Diagram: the CLIENT asks the CSP "Is block B popular?" (YES / NO). If NO, the CLIENT asks the INDEX SERVICE "Popularity transition?" (YES / NO).]
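The two-step flow in the diagram can be sketched as follows. This is only a guess at the bookkeeping: the class `IndexService`, the threshold of 3, and the choice to count distinct users per block ID are assumptions for illustration, and a plain Python set stands in for the PHF-based popularity check at the CSP.

```python
class IndexService:
    """Counts distinct uploaders per block ID (hypothetical bookkeeping)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.owners = {}  # block ID -> set of users who uploaded it

    def report_upload(self, block_id, user):
        """Record an upload; return True only when the threshold is first reached."""
        users = self.owners.setdefault(block_id, set())
        before = len(users)
        users.add(user)
        return before < self.threshold <= len(users)


def upload(block_id, user, popular_ids, index_service):
    # Step 1: popularity check at the CSP (a set stands in for the PHF lookup).
    if block_id in popular_ids:
        return "popular"
    # Step 2: only if the answer was NO, ask the index service.
    if index_service.report_upload(block_id, user):
        popular_ids.add(block_id)  # the CSP would now generate a new PHF
        return "transition"
    return "unpopular"


service = IndexService(threshold=3)
popular = set()
results = [upload("block-B", u, popular, service)
           for u in ["alice", "alice", "bob", "carol", "dave"]]
```

In this run the repeated upload by the same user does not move the count, the third distinct user triggers the popularity transition, and every later upload takes the popular path.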
Prototype Implementation
[Diagram: prototype components CSP, INDEX SERVICE and CLIENT, with CMPH used in the implementation.]
Performance Evaluation
[Figure: chart of upload time. Y-axis: Time (in seconds), 0 to 10. X-axis: Scenario (UNPOPULAR FILE, POPULARITY TRANSITION, POPULAR FILE). Series: Client File Split; Client Convergent Encryption; Client Popularity Check; Client Symmetric Encryption; Idx Service Update; Cloud Generate PHF; Cloud Store Hash Table; Cloud Popularity Check; Cloud Upload Processing.]
Conclusions
• Popularity-based Deduplication
• Secure Perfect Hashing
• Secure & Lightweight for the client
• Costly tasks outsourced to the Cloud
• Low overhead
Future Work
• Optimization of PHF generation
• Deployment in real production environments
THANK YOU
Questions ?
Don’t be shy !
pasquale@secludit.com

[DPM 2015] PerfectDedup - Secure Data Deduplication for Cloud Storage


Editor's Notes

  1. Hello everyone, my name’s Pasquale Puzio. I’m a PhD student at EURECOM & SecludIT under the supervision of Refik MOLVA and Sergio LOUREIRO. Today I’m gonna talk about PerfectDedup, which is our latest work on secure data deduplication. Data Deduplication + Confidentiality.
  2. Let’s talk quickly about the agenda. Today I’ll first explain what data deduplication is and why it became interesting for researchers. Then I’ll explain how deduplication can be combined with encryption, in particular convergent encryption (CE). This will bring me to the vulnerabilities of CE. Finally I’ll present our solution based on data popularity and perfect hashing.
  3. Basic idea: store duplicated data only once. Explain. Mention experiments.
  4. Key and encryption are deterministic.
  5. Researchers noticed that data may need different levels of protection depending on its popularity. This assumption works pretty well in all common scenarios, except for a few extreme cases. However, in our scheme the user can skip the protocol and just encrypt his file. Explain when a block becomes popular -> popularity threshold is reached. Mention an example.
  6. The problem is shifted to secure popularity detection: if popular do this, if unpopular do that. PIR would not be efficient in the case of block-level deduplication. Explain that different encryption requires the user to know if data is popular. Simple solution -> look for convergent encrypted block -> not secure.
  7. Let’s go into more detail. The index does not reveal anything about the block because of collisions. Lookup protocols use hash tables, databases use indices based on perfect hashing, and we need a secure lookup protocol.
  8. We decided to design a new protocol based on perfect hashing. It is secure because we added the one-wayness property, which is fundamental for the security of the protocol.
  9. No confidentiality issue because the block is popular.
  10. On the other hand, collisions protect unpopular data. Several pre-images correspond to the same image.
  11. Now let’s have a closer look at the architecture. We need a trusted index service in order to handle the popularity transition, that is, the phase in which a block that was unpopular becomes popular after reaching a popularity threshold. Explain protocol.
  12. Focus on CMPH. Mention that we modified CMPH in order to make it secure (one-way).
  13. Upload of a 10MB file in three different scenarios: file was unpopular, triggered a popularity transition, was popular. The take-away from this slide is that all client operations are really lightweight. Example: popularity check -> outperforms PIR by far. Costly operations are outsourced to the cloud. Fix colors.
  14. Outperforms the previous existing solutions.