UCI Database Group: Privacy in Database-as-a-Service (DAS) Model



  1. 1. Privacy in Database-as-a-Service (DAS) Model, Maithili Narasimha
  2. 2. Outline <ul><ul><li>Introduction </li></ul></ul><ul><ul><li>Motivation & Challenges for DAS model </li></ul></ul><ul><ul><li>System Components </li></ul></ul><ul><ul><ul><li>NetDB2 </li></ul></ul></ul><ul><ul><li>Protecting data from intruders </li></ul></ul><ul><ul><li>Protecting data from service providers </li></ul></ul><ul><ul><li>Future work </li></ul></ul>
  3. 3. Software As a Service <ul><li>Advantages </li></ul><ul><ul><li>reduced cost to client </li></ul></ul><ul><ul><ul><li>pay for what you use and not for hardware, software infrastructure or personnel to deploy, maintain, upgrade… </li></ul></ul></ul><ul><ul><li>reduced overall cost </li></ul></ul><ul><ul><ul><li>cost amortization across users </li></ul></ul></ul><ul><ul><li>better service </li></ul></ul><ul><ul><ul><li>leveraging experts across organizations </li></ul></ul></ul><ul><li>Driving Forces </li></ul><ul><ul><li>Faster, cheaper, more accessible networks </li></ul></ul><ul><ul><li>Virtualization in server and storage technologies </li></ul></ul><ul><ul><li>Established e-business infrastructures </li></ul></ul><ul><li>Market Players </li></ul><ul><ul><li>ERP and CRM (many examples) </li></ul></ul><ul><ul><li>More horizontal storage services, disaster recovery services, e-mail services, rent-a-spreadsheet services etc. </li></ul></ul><ul><ul><li>Sun ONE, Oracle Online Services, Microsoft .NET My Services etc </li></ul></ul>Better Service for Cheaper
  5. 5. Motivation <ul><li>If data storage can be offered as a service, why not the next higher value-added layer in data management? </li></ul><ul><li>That is: </li></ul><ul><li>Can we outsource our databases? </li></ul>
  6. 6. Database As a Service <ul><li>Why? </li></ul><ul><ul><li>Most organizations need DBMSs </li></ul></ul><ul><ul><li>DBMSs are extremely complex to deploy, set up, and maintain </li></ul></ul><ul><ul><li>They require skilled, expensive DBAs </li></ul></ul><ul><li>Offerings </li></ul><ul><ul><li>Service provider offers mechanisms to create, store, and access databases </li></ul></ul><ul><ul><li>DB management transferred to service provider for backup, administration, restoration, space management, upgrades </li></ul></ul><ul><ul><li>Clients use the service provider's HW, SW, and personnel instead of acquiring their own </li></ul></ul>BUT…
  7. 7. DAS System Components [Figure: architecture diagram. Labels: users (web browsers), client, service provider, HTTP server, servlet engine, database (client data), backup/recovery, standby system (warm standby). Client role: create and load data, develop and install applications. User role: access data, catalogs, information.]
  8. 8. NetDB2 Service <ul><li>Developed by the UCI Database Group in collaboration with IBM </li></ul><ul><li>Deployed on the Internet over a year ago </li></ul><ul><ul><li>Used by 15 universities and more than 2,500 students in database classes </li></ul></ul>
  9. 9. Challenges <ul><li>Economic/business model? </li></ul><ul><ul><li>How to charge for service, what kind of service guarantees can be offered, costing of guarantees, liability of service provider. </li></ul></ul><ul><li>Powerful interfaces to support complete application development environment </li></ul><ul><ul><li>User Interface for SQL, support for embedded SQL programming, support for user defined interfaces, etc. </li></ul></ul><ul><li>Scalability in the web environment </li></ul><ul><ul><li>overheads due to network latency (data proxies?) </li></ul></ul><ul><li>Privacy and Security </li></ul><ul><ul><li>Protecting data at service providers from intruders and attacks. </li></ul></ul><ul><ul><li>Protecting clients from misuse of data by service providers </li></ul></ul>
  11. 11. Protecting Data From Intruders (ICDE 2002) <ul><li>Approach </li></ul><ul><ul><li>data stored at service provider in an encrypted form </li></ul></ul><ul><li>Issues and Challenges </li></ul><ul><ul><li>Key generation and management </li></ul></ul><ul><ul><ul><li>who generates and stores keys </li></ul></ul></ul><ul><ul><li>Granularity of encryption </li></ul></ul><ul><ul><ul><li>Attribute, row, page, table </li></ul></ul></ul><ul><ul><li>Implementation </li></ul></ul><ul><ul><ul><li>Encryption mechanisms </li></ul></ul></ul><ul><ul><li>Query Processing and Optimization </li></ul></ul><ul><ul><ul><li>optimal implementation of relational operators </li></ul></ul></ul>
  13. 13. Protecting Data from Service Provider <ul><li>Motivation </li></ul><ul><ul><li>total data privacy </li></ul></ul><ul><li>Naïve approach </li></ul><ul><ul><li>Store the encrypted database with the service provider </li></ul></ul><ul><ul><li>Transmit the requisite encrypted tables from the server to the client </li></ul></ul><ul><ul><li>Decrypt the tables and execute the query at the client </li></ul></ul>Result: almost all the advantages of the DAS model are lost, because the server only stores data and does no query processing.
  14. 14. The real challenge… <ul><li>How can the service provider execute a query without decrypting the data? </li></ul>
  15. 15. Protecting Data from Service Provider (SIGMOD 2002) <ul><li>Approach: </li></ul><ul><li>Server hosted by the service provider stores the encrypted database </li></ul><ul><li>The encrypted database is augmented with additional information (aka index) </li></ul><ul><ul><li>this allows a certain amount of query processing to occur at the server </li></ul></ul><ul><li>Client maintains metadata </li></ul><ul><li>Strategy: </li></ul><ul><li>Split the original query into </li></ul><ul><ul><ul><li>A corresponding query over encrypted relations to run on the server </li></ul></ul></ul><ul><ul><ul><li>A client query for post-processing the results of the server query </li></ul></ul></ul>
  16. 16. Protecting Data from Service Provider (SIGMOD 2002) <ul><li>Approach: </li></ul><ul><ul><li>Query split into a server-side query (Qs) and a client-side query (Qc) </li></ul></ul><ul><ul><li>Qs executes at the service provider on encrypted data </li></ul></ul><ul><ul><li>Qc executes on the client after decrypting the results of Qs </li></ul></ul>[Figure: query-processing architecture. Client side: user (web browser), query translator, metadata, query executor, temporary results, results. Service provider side: query executor, encrypted data. Flows: original query, query over encrypted data, encrypted results.]
  17. 17. Protecting Data from Service Provider (SIGMOD 2002) <ul><li>For a relation $R(A_1, A_2, \ldots, A_n)$, the relation $R^S(etuple, A_1^S, A_2^S, \ldots, A_n^S)$ is stored at the server </li></ul><ul><ul><li>$etuple$ is the encrypted string that corresponds to a tuple in relation $R$ </li></ul></ul><ul><ul><li>each $A_i^S$ corresponds to the index for the attribute $A_i$ </li></ul></ul>
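As a rough illustration of the storage layout on this slide, the sketch below builds one server-side row from a plaintext tuple. Everything named here is an assumption made for the example: `toy_encrypt` is a placeholder XOR routine (not secure, used only so the code runs with the standard library), the `_s` suffix for index columns is hypothetical, and the index is a simple equi-width bucket number.

```python
import json
from itertools import cycle


def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Placeholder XOR routine (NOT secure); stands in for a real cipher."""
    return bytes(b ^ k for b, k in zip(plaintext, cycle(key)))


def toy_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so decryption reuses toy_encrypt."""
    return toy_encrypt(ciphertext, key)


def bucket_id(value: int, width: int) -> int:
    """Hypothetical index: number of the equi-width bucket holding value."""
    return value // width


def to_server_row(row: dict, key: bytes, widths: dict) -> dict:
    """Build (etuple, A1_s, ..., An_s) for one plaintext tuple."""
    etuple = toy_encrypt(json.dumps(row, sort_keys=True).encode(), key)
    server_row = {"etuple": etuple}
    for attr, width in widths.items():
        server_row[attr + "_s"] = bucket_id(row[attr], width)
    return server_row


key = b"client-secret"
row = {"eid": 23, "salary": 70000}
server_row = to_server_row(row, key, {"eid": 10, "salary": 25000})

# Only the client, which holds the key, can recover the original tuple.
recovered = json.loads(toy_decrypt(server_row["etuple"], key))
```

The server sees only the opaque `etuple` and the coarse bucket numbers; the plaintext attribute values never leave the client.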
  18. 18. Partition Function & Identification Function <ul><li>Map the domain of values of an attribute into partitions such that </li></ul><ul><ul><li>these partitions taken together cover the whole domain, and </li></ul></ul><ul><ul><li>no two partitions overlap </li></ul></ul><ul><li>⇒ Split the domain into a set of buckets </li></ul><ul><li>$partition(R.A_i) = \{p_1, p_2, \ldots, p_k\}$ </li></ul><ul><li>Assign an identifier to each bucket </li></ul><ul><ul><ul><ul><ul><li>$ident_{R.A_i}(p_j)$ for each partition $p_j$ of attribute $A_i$ </li></ul></ul></ul></ul></ul>
  19. 19. Mapping Function <ul><li>Map a value $v$ in the domain of the attribute $A_i$ to the identifier of the partition to which $v$ belongs </li></ul><ul><li>$map_{R.A_i}(v) = ident_{R.A_i}(p_j)$, where $v$ is in $p_j$ </li></ul><ul><li>Types: </li></ul><ul><ul><li>Order preserving </li></ul></ul><ul><ul><ul><li>For any two values $v_i$ and $v_j$, if $v_i < v_j$ then </li></ul></ul></ul><ul><ul><ul><li>$map_{R.A_i}(v_i) \le map_{R.A_i}(v_j)$ (equality holds when both values fall in the same partition) </li></ul></ul></ul><ul><ul><li>Random </li></ul></ul><ul><ul><ul><li>Mapping is not order preserving </li></ul></ul></ul><ul><li>Mapping function type affects query translation! </li></ul><ul><ul><li>Order-preserving mapping lends itself to easier query translation </li></ul></ul><ul><ul><li>However, random mapping is more secure </li></ul></ul>
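The partition, identification, and mapping functions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the domain is an integer range, buckets are equal-width, and the helper names (`partition`, `make_ident`, `make_map`) and the seed are invented for the example.

```python
import bisect
import random


def partition(lo, hi, k):
    """Split [lo, hi) into k contiguous, non-overlapping buckets.

    Assumes integer bounds; if (hi - lo) is not divisible by k,
    the last bucket absorbs the remainder.
    """
    step = (hi - lo) // k
    bounds = [lo + i * step for i in range(k)] + [hi]
    return list(zip(bounds[:-1], bounds[1:]))


def make_ident(parts, order_preserving, seed=0):
    """Assign an identifier to each bucket.

    Order preserving: identifiers increase with the bucket's position.
    Random: the same identifiers, shuffled.
    """
    ids = list(range(len(parts)))
    if not order_preserving:
        random.Random(seed).shuffle(ids)
    return dict(zip(parts, ids))


def make_map(parts, ident):
    """Return map(v): the identifier of the bucket that contains v."""
    starts = [p[0] for p in parts]

    def map_value(v):
        if not parts[0][0] <= v < parts[-1][1]:
            raise ValueError("value outside the attribute domain")
        j = bisect.bisect_right(starts, v) - 1
        return ident[parts[j]]

    return map_value


parts = partition(0, 100, 5)            # [(0, 20), (20, 40), ..., (80, 100)]
ident_op = make_ident(parts, order_preserving=True)
map_op = make_map(parts, ident_op)
ident_rand = make_ident(parts, order_preserving=False)
map_rand = make_map(parts, ident_rand)
```

With `map_op`, a larger value never gets a smaller identifier, which is what makes range predicates easy to translate. With `map_rand`, identifiers carry no ordering information, so the client metadata must record which buckets precede a given value.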
  20. 20. Mapping Conditions for Query Translation <ul><li>Attribute = Value </li></ul><ul><ul><ul><li>$A_i = v \Rightarrow A_i^S = map_{A_i}(v)$ </li></ul></ul></ul><ul><li>Attribute < Value </li></ul><ul><ul><li>order preserving </li></ul></ul><ul><ul><ul><li>$A_i < v \Rightarrow A_i^S \le map_{A_i}(v)$ </li></ul></ul></ul><ul><ul><li>random </li></ul></ul><ul><ul><ul><li>Translation is more complex: check whether the index value $A_i^S$ identifies any of the partitions that may contain a value $v'$ where $v' < v$ </li></ul></ul></ul><ul><li>Attribute 1 = Attribute 2 (join queries) </li></ul><ul><ul><ul><li>$A_i = A_j \Rightarrow \bigvee \big(A_i^S = ident_{A_i}(p_k) \wedge A_j^S = ident_{A_j}(p_l)\big)$ </li></ul></ul></ul><ul><ul><ul><li>taken over all $p_k \in partition(A_i)$, $p_l \in partition(A_j)$ with $p_k \cap p_l \neq \emptyset$ </li></ul></ul></ul><ul><li>And so on … </li></ul>Post-processing (filtering the results) is necessary!
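The equality and less-than translations, together with the client-side filtering step, can be sketched as follows. This assumes an order-preserving mapping over equal-width buckets; the bucket width, the column name `a_s`, and the `encrypt`/`decrypt` stand-ins (which merely wrap the tuple and provide no secrecy) are all hypothetical.

```python
WIDTH = 10  # assumed bucket width for attribute a


def map_value(v):
    """Order-preserving mapping: bucket number of v."""
    return v // WIDTH


# Stand-ins for real encryption; they only wrap the tuple.
def encrypt(t):
    return ("enc", t)


def decrypt(e):
    return e[1]


def server_less_than(server_rows, v):
    """Server query for a < v: keep rows whose index is <= map(v)."""
    bound = map_value(v)
    return [r for r in server_rows if r["a_s"] <= bound]


def server_equals(server_rows, v):
    """Server query for a = v: keep rows whose index equals map(v)."""
    target = map_value(v)
    return [r for r in server_rows if r["a_s"] == target]


def client_filter(rows, predicate):
    """Client query: decrypt the candidates and drop false positives."""
    return [t for t in (decrypt(r["etuple"]) for r in rows) if predicate(t)]


plain = [{"a": 3}, {"a": 17}, {"a": 25}, {"a": 29}, {"a": 42}]
server_rows = [{"etuple": encrypt(t), "a_s": map_value(t["a"])} for t in plain]

candidates = server_less_than(server_rows, 27)             # superset of the answer
answer = client_filter(candidates, lambda t: t["a"] < 27)  # exact answer
eq_answer = client_filter(server_equals(server_rows, 25), lambda t: t["a"] == 25)
```

For `a < 27` the server returns four candidate rows because 25 and 29 share a bucket; the client removes 29 after decryption. That extra row is the filtering overhead the slide refers to.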
  21. 21. Some issues <ul><li>Buckets (equi-width vs. equi-depth): </li></ul><ul><ul><li># of buckets and </li></ul></ul><ul><ul><li># of elements in buckets </li></ul></ul><ul><li>Various overheads: </li></ul><ul><ul><li>Metadata at client (fewer buckets ⇒ less metadata) </li></ul></ul><ul><ul><li>Amount of filtering (fewer buckets ⇒ more filtering) </li></ul></ul><ul><ul><li>Bandwidth consumed and </li></ul></ul><ul><ul><li>Storage wasted </li></ul></ul>How are security and performance affected by these choices?
  22. 22. What next? <ul><li>The SIGMOD 2002 approach can execute SQL queries involving SELECT, JOIN, UNION, GROUP BY … </li></ul><ul><ul><li>with equality and logical comparison predicate clauses </li></ul></ul>What about “aggregation” queries?
  23. 23. Aggregation queries <ul><li>A large fraction of queries require data aggregation </li></ul><ul><ul><li>⇒ Arithmetic operations (sum, count, average etc.) on encrypted data! </li></ul></ul><ul><li>Traditional symmetric encryption schemes are not useful here: they do not allow arithmetic to be carried out on ciphertexts </li></ul>
  24. 24. One possible solution – Privacy Homomorphisms <ul><li>PH Overview: </li></ul><ul><ul><li>$A$ is the domain of unencrypted values, $\varepsilon_k$ is an encryption function using key $k$, and $D_k$ is the corresponding decryption function. </li></ul></ul><ul><ul><li>Let $\tilde{\alpha} = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ and $\tilde{\beta} = \{\beta_1, \beta_2, \ldots, \beta_n\}$ be two function families, the $\alpha_i$ defined on $A$ and the $\beta_i$ on the encrypted values. </li></ul></ul><ul><ul><li>$(\varepsilon_k, D_k, \tilde{\alpha}, \tilde{\beta})$ is defined as a PH if </li></ul></ul><ul><ul><li>$D_k(\beta_i(\varepsilon_k(a_1), \varepsilon_k(a_2), \ldots, \varepsilon_k(a_m))) = \alpha_i(a_1, a_2, \ldots, a_m)$ </li></ul></ul><ul><ul><li>for all $i$, $1 \le i \le n$ </li></ul></ul>
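To make the equi-width versus equi-depth trade-off concrete, the sketch below buckets a small skewed data set both ways and counts the elements per bucket. The data values and helper names are made up for illustration.

```python
import bisect


def equi_width_bounds(values, k):
    """Interior boundaries that cut [min, max] into k equal-width buckets."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]


def equi_depth_bounds(values, k):
    """Interior boundaries chosen so each bucket holds about len(values)/k items."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]


def bucket_counts(values, bounds):
    """Number of values falling into each bucket defined by the boundaries."""
    counts = [0] * (len(bounds) + 1)
    for v in values:
        counts[bisect.bisect_right(bounds, v)] += 1
    return counts


skewed = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100]   # hypothetical attribute values

width_counts = bucket_counts(skewed, equi_width_bounds(skewed, 2))
depth_counts = bucket_counts(skewed, equi_depth_bounds(skewed, 2))
```

On this data the equi-width split puts nine of the ten values in one bucket, so most queries touching that bucket pull back many rows for the client to filter. The equi-depth split balances the buckets at five values each, at the cost of boundaries that depend on the data distribution.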
  25. 25. PH by Rivest et al. <ul><ul><li>Setup </li></ul></ul><ul><ul><ul><li>$n = pq$, with $p$ and $q$ prime; the key is $k = (p, q)$ </li></ul></ul></ul><ul><ul><li>Encryption </li></ul></ul><ul><ul><ul><li>$\varepsilon_k(a) = (a \bmod p,\ a \bmod q)$ where $a \in Z_n$ </li></ul></ul></ul><ul><ul><li>Decryption </li></ul></ul><ul><ul><ul><li>$D_k(d_1, d_2) = d_1 q q^{-1} + d_2 p p^{-1} \pmod{n}$, where $q^{-1}$ is the inverse of $q$ modulo $p$ and $p^{-1}$ is the inverse of $p$ modulo $q$ </li></ul></ul></ul><ul><ul><ul><li>($d_1 = a \bmod p$ and $d_2 = a \bmod q$) </li></ul></ul></ul><ul><ul><li>Proof of correctness based on the Chinese Remainder Theorem (CRT) </li></ul></ul><ul><ul><li>PH works for modular addition, subtraction and multiplication </li></ul></ul>
  26. 26. A small example… <ul><li>$p = 5$, $q = 7$ ($n = 35$) </li></ul><ul><li>$a_1 = 5$ and $a_2 = 6$ </li></ul><ul><li>$\varepsilon(a_1) = (0, 5)$ and $\varepsilon(a_2) = (1, 6)$ are stored on the server </li></ul><ul><li>Compute $a_1 + a_2$ </li></ul><ul><li>Server computes $\varepsilon(a_1) + \varepsilon(a_2)$ componentwise </li></ul><ul><ul><li>$(0+1,\ 5+6) = (1, 11)$ </li></ul></ul><ul><li>Client decrypts $(1, 11)$ as $d_1 q q^{-1} + d_2 p p^{-1} \pmod{n}$, with $q^{-1} = 3 \pmod{5}$ and $p^{-1} = 3 \pmod{7}$ </li></ul><ul><ul><li>$(1 \cdot 7 \cdot 3 + 11 \cdot 5 \cdot 3) \bmod 35 = 186 \bmod 35 = 11$ </li></ul></ul>
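The scheme on this slide is short enough to write out in full. The sketch below follows the setup, encryption, and decryption formulas as given; the function names and the tiny primes are chosen for illustration only (real use would need large primes), and `pow(x, -1, m)` for modular inverses requires Python 3.8 or later.

```python
def encrypt(a, key):
    """epsilon_k(a) = (a mod p, a mod q) for a in Z_n, n = p*q."""
    p, q = key
    return (a % p, a % q)


def decrypt(c, key):
    """D_k(d1, d2) = d1*q*q^-1 + d2*p*p^-1 (mod n), by the CRT."""
    p, q = key
    d1, d2 = c
    q_inv = pow(q, -1, p)   # inverse of q modulo p
    p_inv = pow(p, -1, q)   # inverse of p modulo q
    return (d1 * q * q_inv + d2 * p * p_inv) % (p * q)


# Server-side operations: plain componentwise arithmetic.
# The server does not know p or q, so it cannot reduce the components.
def add(c1, c2):
    return (c1[0] + c2[0], c1[1] + c2[1])


def sub(c1, c2):
    return (c1[0] - c2[0], c1[1] - c2[1])


def mul(c1, c2):
    return (c1[0] * c2[0], c1[1] * c2[1])


key = (11, 13)                      # toy primes, n = 143
c1, c2 = encrypt(20, key), encrypt(30, key)

total = decrypt(add(c1, c2), key)       # (20 + 30) mod 143
difference = decrypt(sub(c1, c2), key)  # (20 - 30) mod 143
product = decrypt(mul(c1, c2), key)     # (20 * 30) mod 143
```

Results are correct only modulo n, so the client must choose n larger than any aggregate it expects to compute.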
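The arithmetic of this example can be laid out step by step. The last two lines extend it to multiplication with the same values; that extension is an added illustration, not taken from the slides.

```latex
\begin{align*}
q^{-1} \bmod p &= 7^{-1} \bmod 5 = 3, \qquad p^{-1} \bmod q = 5^{-1} \bmod 7 = 3 \\
\varepsilon(a_1) + \varepsilon(a_2) &= (0 + 1,\ 5 + 6) = (1,\ 11) \\
D(1, 11) &= (1 \cdot 7 \cdot 3 + 11 \cdot 5 \cdot 3) \bmod 35 = 186 \bmod 35 = 11 = a_1 + a_2 \\
\varepsilon(a_1) \cdot \varepsilon(a_2) &= (0 \cdot 1,\ 5 \cdot 6) = (0,\ 30) \\
D(0, 30) &= (0 \cdot 7 \cdot 3 + 30 \cdot 5 \cdot 3) \bmod 35 = 450 \bmod 35 = 30 = a_1 a_2
\end{align*}
```

Both results are below $n = 35$, so the decrypted values equal the true sum and product rather than their residues.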
  28. 28. Future work <ul><li>Other homomorphic encryption schemes? </li></ul><ul><ul><ul><li>Paillier’s cryptosystem based on composite degree residuosity </li></ul></ul></ul><ul><ul><ul><li>Benaloh’s cryptosystem based on prime residuosity </li></ul></ul></ul><ul><ul><ul><li>scheme based on the discrete logarithm problem (DLP) </li></ul></ul></ul><ul><ul><li>Performance vs. security offered </li></ul></ul><ul><li>These schemes need to be efficiently extended to other data types (e.g., floats) as well as arbitrary sequences of arithmetic operations </li></ul><ul><ul><ul><li>(e.g., SUM(A1+A2*100)) </li></ul></ul></ul><ul><li>What if an aggregation query has an associated selection clause? How to execute queries with complex selection conditions? </li></ul><ul><ul><li>E.g., SUM(salary) WHERE department_id > 35 AND department_id < 45 </li></ul></ul>
  29. 29. References <ul><li>Hakan Hacigumus, Bala Iyer, Chen Li, and Sharad Mehrotra &quot;Executing SQL over Encrypted Data in the Database-Service-Provider Model&quot;, 2002 ACM SIGMOD Conference on Management of Data, Jun, 2002. </li></ul><ul><li>Hakan Hacigumus, Bala Iyer, and Sharad Mehrotra &quot;Providing Database as a Service&quot;, 2002 IEEE International Conference on Data Engineering (ICDE), Feb., 2002. </li></ul><ul><li>Hakan Hacigumus, Bala Iyer, and Sharad Mehrotra &quot;Efficient Execution of Aggregation Queries over Encrypted Relational Databases&quot; </li></ul>
  30. 30. <ul><li>Thank You! </li></ul>