Do Le Quoc, Franz Gregor, Jatinder Singh and Christof Fetzer
SGX-PySpark: Secure Distributed Data Analytics
WWW19


SGX-PySpark: Secure Distributed Data Analytics addresses how public cloud users can protect sensitive data while preserving the same utility as conventional, unprotected data analytics.

Motivation
• Data analytics has become an important component of modern cloud-based, data-driven services.
• Large-scale datasets processed by these services may contain customers' sensitive information.
• Customers must trust both the service provider and the cloud provider.
• How can sensitive data be protected while preserving the same utility of data analytics?

Key idea
• Ensure confidentiality and integrity for both code and data using trusted hardware, i.e., Intel Software Guard Extensions (SGX).
• Execute only the sensitive parts of data analytics inside enclaves.
• Encrypt input data; decrypt and securely process it inside enclaves.

Implementation
• PySpark: widely used in industry for big data analytics.
• SCONE: enables unmodified applications to run inside Intel SGX enclaves.
• The Spark Driver and the Python processes of PySpark are executed inside enclaves using SCONE.

SGX-PySpark
• Objectives:
  • Support complex operations for big data analytics
  • Provide strong security guarantees
  • Minimize performance overhead
  • Support Python
• Architecture: [Figure: SGX-PySpark architecture]

Evaluation
• Dataset: TPC-H benchmark
• ~22% overhead compared to native execution
[Figure: Latency (seconds) of TPC-H queries Q1, Q3, Q4, Q5, Q6, Q7, Q10, Q12, Q13, Q14, Q16, Q18 and Q19, SGX-PySpark vs. native PySpark]

Demo
• GitHub repository: https://github.com/doflink/sgx-pyspark-demo
• Demo video: https://youtu.be/yI3iEFWUWbU
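The "encrypt input data; decrypt and securely process it inside enclaves" idea can be illustrated with a minimal Python sketch. It rests on several assumptions:

- It uses the third-party `cryptography` package with AES-GCM as the cipher. The poster does not say which cipher SGX-PySpark uses.
- `encrypt_records` and `process_inside_enclave` are hypothetical helper names.
- The enclave boundary is only simulated by an ordinary function.
- In a real deployment the key would have to reach the enclave through an attested channel, which this sketch does not model.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the recommended size for AES-GCM


def encrypt_records(key: bytes, records: list[str]) -> list[bytes]:
    """Client side (hypothetical helper): encrypt each record before upload.

    Each output blob is nonce || ciphertext-with-tag, so it can be decrypted
    independently of the others.
    """
    aesgcm = AESGCM(key)
    blobs = []
    for record in records:
        nonce = os.urandom(NONCE_SIZE)  # fresh nonce per record; never reuse with the same key
        blobs.append(nonce + aesgcm.encrypt(nonce, record.encode("utf-8"), None))
    return blobs


def process_inside_enclave(key: bytes, blobs: list[bytes]) -> float:
    """Stand-in (hypothetical) for code that would run inside an SGX enclave.

    Decrypts every record and sums the values; plaintext exists only here.
    AESGCM.decrypt raises cryptography.exceptions.InvalidTag if a blob was
    modified, which is what provides the integrity check.
    """
    aesgcm = AESGCM(key)
    total = 0.0
    for blob in blobs:
        nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
        total += float(aesgcm.decrypt(nonce, ciphertext, None).decode("utf-8"))
    return total
```

For example, `key = AESGCM.generate_key(bit_length=256)`, `blobs = encrypt_records(key, ["10.5", "4.5"])` and then `process_inside_enclave(key, blobs)` returns `15.0`. The untrusted storage and cluster only ever see the encrypted blobs.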
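The ~22% figure is a relative overhead, but the poster does not state how it was computed. The sketch below assumes the common definition (T_sgx - T_native) / T_native. It applies that ratio per query and then averages over the queries measured in both configurations. Both helper names are hypothetical.

```python
def relative_overhead(secure_latency: float, native_latency: float) -> float:
    """Relative slowdown of secure vs. native execution (assumed definition)."""
    if native_latency <= 0:
        raise ValueError("native latency must be positive")
    return (secure_latency - native_latency) / native_latency


def mean_overhead(secure: dict[str, float], native: dict[str, float]) -> float:
    """Average per-query overhead over queries present in both runs.

    Hypothetical helper: the poster does not say whether its ~22% is a
    per-query mean or a ratio of total latencies.
    """
    common = sorted(secure.keys() & native.keys())
    if not common:
        raise ValueError("no queries measured in both configurations")
    return sum(relative_overhead(secure[q], native[q]) for q in common) / len(common)
```

Consider a hypothetical query that takes 10 s natively and 12.2 s under SGX. `relative_overhead(12.2, 10.0)` gives about 0.22 for it, i.e., 22%. Averaging per-query ratios weights every query equally, whereas dividing total latencies lets long-running queries dominate. The choice therefore matters when comparing against the reported number.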
