Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

As enterprises move to cloud-based analytics, the risk of cloud security breaches poses a serious threat. Encrypting data at rest and in transit is a major first step. However, data must still be decrypted in memory for processing, exposing it to an attacker who has compromised the operating system or hypervisor. Trusted hardware such as Intel SGX has recently become available in latest-generation processors. Such hardware enables arbitrary computation on encrypted data while shielding it from a malicious OS or hypervisor. However, it still suffers from a significant side channel: access pattern leakage.

We present Opaque, a package for Apache Spark SQL that enables very strong security for SQL queries: data encryption, computation verification, and access pattern leakage protection (a.k.a. obliviousness). Opaque achieves these guarantees by introducing new oblivious distributed relational operators that provide a 2000x performance gain over state-of-the-art oblivious systems, as well as novel query planning techniques for these operators, implemented using Catalyst.

Published in: Data & Analytics

  1. 1. Opaque: A Secure Distributed Data Analytics Framework Wenting Zheng, Ankur Dave, Jethro Beekman, Raluca Ada Popa, Joseph Gonzalez, Ion Stoica
  4. 4. Complex analytics run on sensitive data
      (diagram labels: client; cloud provider; sensitive data; Spark SQL, MLlib, GraphX, Spark Streaming)
  8. 8. Threat model
      (diagram labels: client; cloud provider; sensitive data)
  9. 9. Challenge: protect data and preserve functionality
  11. 11. Opaque*: secure data analytics
      * Oblivious Platform for Analytic QUEries
      (diagram labels: Spark SQL; Opaque; SQL; Machine Learning; Graph Analytics)
  19. 19. Prior work
      • Computation on encrypted data
        – A cryptographic approach using homomorphic encryption
        – Either impractically slow (FHE), or limited functionality (CryptDB)
      • Hardware-based systems
        – Use trusted hardware
        – Only single machine computation (Haven), or weaker security guarantees (VC3)
      Opaque utilizes trusted hardware
  31. 31. Hardware enclaves
      • Hardware-protected containers in presence of malicious OS
      • Shielded execution
      • Encrypted enclave memory
      • Software attestation
      • Example: Intel SGX, AMD memory encryption
      (diagram labels: Enclave; Secret data; Code; Untrusted OS)
  46. 46. System initialization
      • Remote attestation to load Opaque code
      • Key exchange protocol
      • This is NOT on a per-query basis
      (diagram labels: Client; Server; Database; Opaque SQL operators; hash)
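
The two initialization steps above (check a hash of the loaded code, then exchange keys once) can be sketched as follows. This is a toy illustration under stated assumptions, not Opaque's or Intel SGX's actual protocol: the function names, the tiny Diffie-Hellman group, and the use of a bare SHA-256 digest as the "measurement" are all hypothetical, and the hardware-signed quote that real remote attestation relies on is omitted.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman parameters (assumption: for illustration only, far too
# small to be secure).
P = 2**127 - 1  # a Mersenne prime
G = 3


def measure(code: bytes) -> bytes:
    """Hash of the code loaded into the enclave (the 'hash' in the diagram)."""
    return hashlib.sha256(code).digest()


def client_accepts(reported: bytes, trusted_code: bytes) -> bool:
    """The client proceeds only if the reported hash matches the code it expects."""
    return hmac.compare_digest(reported, measure(trusted_code))


def dh_keypair():
    """Return (secret exponent, public value) for one side of the exchange."""
    secret = secrets.randbelow(P - 3) + 2
    return secret, pow(G, secret, P)


def shared_key(my_secret: int, their_public: int) -> bytes:
    """Derive a symmetric key from the Diffie-Hellman shared value."""
    shared = pow(their_public, my_secret, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()
```

Because the exchange is not per-query, the derived key in this sketch would be established once and then reused for every query the client submits.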
  59. 59. Query execution
      query = SELECT sum(*) FROM table
      (diagram labels: Client; Server; Spark Driver; Opaque; Catalyst; Scheduler; Database; Query)
      (values shown across the builds: 1, 2, 3; then 10, 13, 4; then the result 27)
  64. 64. Problem: cloud can alter distributed computation
      • Drop data
      • Modify data
      • Skip task
      • Replay old state
  70. 70. Example: drop data
      query = SELECT sum(*) FROM table
      (diagram labels: Client; Server; Spark Driver; Opaque; Catalyst; Scheduler; Database)
      (values shown across the builds: 1, 2, 3; then 10, 13, 4; then only 10, 13; then the result 23)
  84. 84. Self-verifying computation
      Invariant: if computation does not abort, the execution completed so far is correct
      If the computation is complete, then the entire query was executed correctly
      query = SELECT sum(*) FROM table
      (diagram labels: Task 13, Task 14, Task 15, Task 20; values 10, 13, 4)
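
One way to picture the invariant above is a final task that refuses to produce a result unless every expected upstream output arrives intact. The sketch below is an assumption-laden illustration, not the mechanism Opaque implements: the shared `KEY`, the `tag`, `run_task`, and `final_sum` helpers, and the idea of authenticating each output together with its task ID are all hypothetical.

```python
import hashlib
import hmac

KEY = b"key-shared-by-enclaves"  # hypothetical key, established at initialization


def tag(task_id: int, value: int) -> bytes:
    """MAC binding a partial result to the task that produced it."""
    return hmac.new(KEY, f"{task_id}:{value}".encode(), hashlib.sha256).digest()


def run_task(task_id: int, rows):
    """An upstream task: sum its rows and authenticate the partial result."""
    partial = sum(rows)
    return task_id, partial, tag(task_id, partial)


def final_sum(expected_ids, outputs):
    """The downstream task: abort unless exactly the expected outputs arrive intact."""
    seen = set()
    total = 0
    for task_id, value, mac in outputs:
        if not hmac.compare_digest(mac, tag(task_id, value)):
            raise RuntimeError("abort: output was modified")
        if task_id in seen or task_id not in expected_ids:
            raise RuntimeError("abort: unexpected or repeated task output")
        seen.add(task_id)
        total += value
    if seen != set(expected_ids):
        raise RuntimeError("abort: a task output is missing")
    return total
```

With the values from the slides, outputs 10, 13, and 4 from tasks 13, 14, and 15 sum to 27, while withholding the third output makes the final task abort instead of returning 23. Guarding against replay of old state would additionally require binding the tags to a specific query run, which this sketch leaves out.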
  90. 90. Problem: access pattern leakage
      medical table:
        ID     Name               Age  Disease
        12809  Amanda D. Edwards  40   Diabetes
        29489  Robert R. McGowan  56   Diabetes
        13744  Kimberly R. Seay   51   Cancer
        18740  Dennis G. Bates    32   Diabetes
        98329  Ronald S. Ogden    53   Cancer
        32591  Donna R. Bridges   26   Diabetes
      SELECT count(*) FROM medical GROUP BY disease
      Attack viable for both memory and network access patterns!
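
To make the leak concrete, here is a minimal sketch of a non-oblivious GROUP BY count that records which counter slot each row touches. The function name and the notion of a "slot trace" are illustrative assumptions; the point is only that an observer who sees slot addresses, and never the decrypted values, still learns which rows share a disease.

```python
def group_count_with_trace(diseases):
    """Non-oblivious GROUP BY count.

    Returns (counts, trace), where trace lists the counter slot touched for
    each input row. The trace stands in for what an attacker observing memory
    addresses could see even when the row contents are encrypted.
    """
    slot_of = {}  # disease -> index of its counter
    counts = []
    trace = []
    for disease in diseases:
        if disease not in slot_of:
            slot_of[disease] = len(counts)
            counts.append(0)
        slot = slot_of[disease]
        counts[slot] += 1
        trace.append(slot)
    return counts, trace
```

For the six rows of the medical table, in order, the trace is `[0, 0, 1, 0, 1, 0]`: rows 1, 2, 4, and 6 visibly share one value and rows 3 and 5 share another, without any value being decrypted by the observer.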
  100. 100. Oblivious mode
       (layered diagram, bottom to top)
       • Oblivious primitives: intra-machine o-sort, inter-machine o-sort, random permutation
       • Opaque operators (oblivious operators): project-filter, low-cardinality agg., sort-merge join, broadcast join, …
       • Oblivious query plan: Oblivious Filter, Oblivious Sort, Oblivious Aggregation, Oblivious Join
       • Query optimization
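
The slides do not say which algorithm backs the o-sort primitives. As a stand-in, the sketch below uses a bitonic sorting network, a well-known sort whose sequence of compared positions depends only on the input length, never on the data. The function name and the returned trace are assumptions made for illustration, and Python's `min`/`max` are not constant-time, so this shows the access pattern idea rather than a hardened implementation.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Returns (sorted_list, trace), where trace is the sequence of index pairs
    compared. The trace is a function of len(values) only.
    """
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    trace = []
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    trace.append((i, partner))
                    lo = min(a[i], a[partner])
                    hi = max(a[i], a[partner])
                    # Both positions are always written, whatever the order was.
                    a[i], a[partner] = (lo, hi) if ascending else (hi, lo)
            j //= 2
        k *= 2
    return a, trace
```

Sorting two different inputs of the same length yields identical traces, which is the property an observer of memory accesses cannot exploit.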
  120. 120. Oblivious aggregation
       SELECT count(*) FROM medical GROUP BY disease
       Input rows (ID … Disease): 12809 … Diabetes; 29489 … Diabetes; 13744 … Cancer; 18740 … Diabetes; 98329 … Cancer; 32591 … Diabetes
       Steps shown across the builds:
       • Oblivious sort (labels: Map, Sort)
       • Scan (labels: Statistics, Statistics)
       • Boundary processing (labels: 2; 1 and 2; 2, Result size, Offset)
       • Scan (intermediate output: Cancer:2, Diabetes:3, Diabetes:1, DUMMY)
       • Final result: Cancer:2, Diabetes:4
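
The sort-then-scan shape of the aggregation above can be sketched on a single partition. This is a simplified illustration under stated assumptions: `sorted` stands in for an oblivious sort, the boundary-processing step that stitches partitions together is omitted, and `DUMMY` and `oblivious_count` are hypothetical names. The scan emits exactly one output per input row, so the size of the scan output does not depend on how many groups exist.

```python
DUMMY = ("DUMMY", 0)  # placeholder row; real rows always have a count >= 1


def oblivious_count(keys):
    """Single-partition sketch of GROUP BY count via sort, scan, and sort.

    Returns a list with one entry per input row: the real (key, count) rows
    first, followed by DUMMY padding.
    """
    rows = sorted(keys)  # stand-in for an oblivious sort on the group key
    out = []
    count = 0
    for i, key in enumerate(rows):
        count += 1
        is_last_of_group = i + 1 == len(rows) or rows[i + 1] != key
        # Every input row yields one output row, real or dummy.
        out.append((key, count) if is_last_of_group else DUMMY)
        if is_last_of_group:
            count = 0
    # Second sort (again a stand-in) moves the real rows ahead of the dummies.
    out.sort(key=lambda row: row == DUMMY)
    return out
```

On the six rows of the medical table this produces `("Cancer", 2)` and `("Diabetes", 4)` followed by four dummies, matching the final result on the slide.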
  131. 131. Opaque modes
       • Encryption mode (snapshot attacker, e.g. external hacker)
         – Data encryption and authentication
         – Computation is integrity protected
       • Oblivious mode (persistent attacker, e.g. insider)
         – Additionally hide access patterns
       Trade off: performance and security
  140. 140. Query optimization - oblivious
       SELECT count(*) FROM medical WHERE age > 30 GROUP BY disease
       (plan diagram, earlier builds: Scan medical; Filter = Project-filter + Obliv. sort; Aggregate = Obliv. sort + Low-card. obliv. agg.)
       (plan diagram, final build: Scan medical; Filter = Project-filter; Aggregate = Obliv. sort + Low-card. obliv. agg.)
       Reduced # of oblivious sorts by 1
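
The plan change above can be mimicked with a toy rewrite rule over a linear plan. This is only a sketch of the idea that a sort belonging to the filter becomes redundant when the aggregation sorts immediately afterwards; it is not Opaque's Catalyst rule, and the `(operator, owner)` tuples and the `merge_adjacent_sorts` name are hypothetical.

```python
def merge_adjacent_sorts(plan):
    """Collapse back-to-back oblivious sorts, keeping the later one.

    `plan` is a list of (operator, owner) tuples in execution order.
    """
    optimized = []
    for op in plan:
        name, _owner = op
        if name == "obliv_sort" and optimized and optimized[-1][0] == "obliv_sort":
            # The earlier sort's ordering is overwritten by this one anyway.
            optimized[-1] = op
        else:
            optimized.append(op)
    return optimized


# Physical plan for the query on the slide, before optimization.
plan = [
    ("scan", "medical"),
    ("project_filter", "filter"),
    ("obliv_sort", "filter"),
    ("obliv_sort", "aggregate"),
    ("low_card_obliv_agg", "aggregate"),
]
optimized_plan = merge_adjacent_sorts(plan)
```

Counting `obliv_sort` entries before and after gives 2 and 1, the reduction by one that the slide reports.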
  153. 153. Query optimization - mixed sensitivity
       Schema (table: columns):
       • Patient (P_): P_ID, NAME, AGE, D_ID
       • Treatment Plan (TP_): T_ID, DOCTOR, COMMENT, PID, START_DATE, END_DATE
       • Treatment Record (TR_): TR_ID, DATE, COMMENT, T_ID, M_ID, START_TIME, END_TIME, DOSAGE
       • Disease (D_): D_ID, NAME, G_ID
       • Medication (M_): M_ID, NAME, D_ID, COST
       • Gene (G_): G_ID, NAME, COMMENT
       SELECT p_name, d_name, med_cost FROM patient, disease, (SELECT d_id, min(cost) AS med_cost FROM medication GROUP BY d_id) AS med WHERE disease.d_id = patient.d_id AND disease.d_id = med.d_id
       |P| < |D| < |M|
       (two join-tree diagrams over Patient, Disease, and Medication using ⨝ and γ, labeled "SQL join order" and "Opaque join order")
  156. 156. Evaluation setup
       • Single machine experiments
         – Intel Xeon E3-1280 v5, 4 cores, 64 GB RAM
         – Intel SGX: 128 MB of enclave page cache (EPC)
         – Hardware mode
       • Distributed experiments
         – EC2: five r3.2xlarge instances, 8 cores, 61 GB RAM
         – Simulation mode only
  164. 164. Evaluation
       • How does Opaque compare to Spark SQL?
         – Big Data Benchmark (BDB)
           • Queries 1, 2, 3: filter, aggregation, join
           • 1 million records
       • How does Opaque compare to state-of-the-art oblivious systems?
         – GraphSC (graph analytics)
           • PageRank
  173. 173. Big Data Benchmark (encryption mode)
       (two bar charts, "Single machine" and "Distributed": runtime in seconds on a log axis from 0.01 to 100, for Query 1, Query 2, Query 3; series: Spark SQL, Opaque)
       With very little cost, you will have data encryption, authentication and computation protection!
  180. 180. Big Data Benchmark (oblivious mode) • [Figure: two panels, single machine and distributed; y-axis: runtime (s), log scale from 0.01 to 100; x-axis: query number (Query 1, Query 2, Query 3); series: Spark SQL, Opaque]
  181. 181. PageRank: comparison with GraphSC
  182. 182. We have an NSDI 2017 paper!
  189. 189. Open source release • Available at github.com/ucbrise/opaque • Opaque is implemented as a Spark package • Features – Supports DataFrame select, filter, group by, join – Allows users to specify DataFrames in encryption/oblivious modes • Automatic sensitivity propagation in mixed sensitivity
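One plausible reading of "automatic sensitivity propagation" is that an operator's output inherits the strictest mode among its inputs. The sketch below is a toy model of that idea under stated assumptions, not Opaque's implementation or API: the `Mode` ordering, the `Table` class, and the rule "output mode = max of input modes" are all hypothetical.

```python
from enum import IntEnum


class Mode(IntEnum):
    """Hypothetical protection levels, ordered from weakest to strongest."""
    PLAIN = 0
    ENCRYPTED = 1
    OBLIVIOUS = 2


def propagate(*input_modes):
    """Assumed rule: the output is as sensitive as its most sensitive input."""
    return max(input_modes)


class Table:
    """Toy stand-in for a DataFrame that only tracks its name and mode."""

    def __init__(self, name, mode):
        self.name = name
        self.mode = mode

    def filter(self, description):
        # A unary operator keeps the mode of its single input.
        return Table(f"filter({self.name}, {description})", propagate(self.mode))

    def join(self, other):
        # A binary operator takes the stricter of the two input modes.
        return Table(f"join({self.name}, {other.name})",
                     propagate(self.mode, other.mode))


if __name__ == "__main__":
    public = Table("products", Mode.PLAIN)
    private = Table("patients", Mode.OBLIVIOUS)
    result = public.filter("price > 10").join(private)
    print(result.name, result.mode.name)
```

Under this assumed rule, a query that mixes a plain table with an oblivious one yields an oblivious result, so the user marks only the sensitive inputs and the rest follows.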
  200. 200. Open source release • Extension – More functionality requires rewriting operators in C++ – No UDF support yet – Possible solutions • Automatically generate C++ • Run JVM in the enclave • Deployment – Master must be trusted – SGX available now on Skylake processors • Cloud providers have no support yet
  201. 201. Demo
  202. 202. Conclusion • Opaque is a secure distributed analytics platform • [Diagram labels: Opaque; SQL; Machine Learning; Graph Analytics] • Try it out at github.com/ucbrise/opaque • Wenting Zheng - wzheng@eecs.berkeley.edu
