Privacy preserving queries on encrypted data


Database Encryption

Published in: Technology

  1. 1. Zhiqiang Yang - Sheng Zhong - Rebecca N. Wright - Rohit Ainapure
  2. 2. SQL tables referenced during the presentation
     • NURSERY [1]
       • NURSERY has eight input attributes: parents, has_nurs, form, children, housing, finance, social, health.
       • The authors have added ID so that the table has a primary key.
     • Values the attributes can take:
       • ID: 1, 2, 3, more
       • parents: usual, pretentious, great_pret
       • has_nurs: proper, less_proper, improper, critical, very_crit
       • form: complete, completed, incomplete, foster
       • children: 1, 2, 3, more
       • housing: convenient, less_conv, critical
       • finance: convenient, inconv
       • social: non-prob, slightly_prob, problematic
       • health: recommended, priority, not_recom
     • TECHNICAL_PAPER AS paper
       • ID, intro, key_aspects, tna_model, table_structure, encryption_method
  3. 3. SELECT key_aspects FROM paper
     • Data Confidentiality
     • Encryption
     • Performance of Database (Queries)
     • Trust and Attack Model
     • Basic Solution
     • Enhanced Solution
     • High Performance Solution
     • Benchmark (Minimal Information Revelation)
     • Performance of the system
  4. 4. CREATE reVIEW intro AS SELECT introduction FROM paper
     • Security breaches in 2005–06 showed that existing techniques could not make a database fully immune to intrusion and unauthorized access. [2]
     • Encryption protects sensitive data in databases. [3]
     • Simple and obvious solution: if the database server can be fully trusted, send the encryption key together with the query.
     • Impractical solution: retrieve all encrypted tables from the database, decrypt them, and then run the queries on the cleartext.
     • There were many other approaches [4][5][6][7][8], but the closest was the work on searches on encrypted data [9], which finds keywords in existing files.
  5. 5. From the key_aspects Query
     • Basic Solution
       • All data is stored and processed in the “SR/sX7uzuUk=”ed (that is, encrypted) format.
       • It returns to the user precisely the encryptions in the database that match the query: R(Q), the set of coordinates of the cells satisfying the condition of the query Q.
       • Trades off privacy for efficiency.
     • Enhanced Solution
       • For a broad class of tables, this solution reveals nothing beyond the minimum information revelation.
     • Performance-Efficient Solution
       • Adds metadata to further speed up queries.
  6. 6. INSERT INTO paper(tna_model) VALUES (‘Trust_and_attack_Model.gif’, empty_blob())
  7. 7. more …
     The model assumes that an intruder who has complete access to the database server for some time should learn very little about the data stored in the database. The model is based on the following:
     • The database server is vulnerable to intrusion.
     • A reasonable assumption about an intruder: once he has access to the database, he can not only observe the encrypted data but also control the whole database system for a time interval.
     • The communication channel between the user and the database is assumed to be secure (using the likes of SSL and IPsec).
     • The user’s front end can be trusted.
     • All data and metadata, including user logs and scheme metadata, are stored in an encrypted format.
  8. 8. SELECT table_structure, encryption_method FROM paper
     • T_{i,j}: the cell at the intersection of the ith row and the jth column.
     • The jth attribute A_j has domain D_j.
     • T′ is the encrypted table, where each T′_{i,j} is an encryption of T_{i,j}.
     • Each cell T_{i,j} of the plaintext table is a bit string of exactly k1 bits: ∀j ∈ [1,m], D_j ⊆ {0,1}^k1.
     • The encryption algorithm appends a random string of k2 bits to the plaintext. The resulting string of k0 = k1 + k2 bits is the input to a symmetric-key encryption algorithm, whose output is also k0 bits long.
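
The cell encoding above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the split k1 = k2 = 32 bits, the integer coding of cell values, and the names `encrypt_cell` and `decrypt_cell` are all hypothetical, and a toy 4-round Feistel network built from HMAC-SHA256 stands in for the real 64-bit block cipher so that the example runs with the standard library only.

```python
import hashlib
import hmac
import secrets

K1_BYTES = 4                  # plaintext width k1 = 32 bits (assumed split)
K2_BYTES = 4                  # random tail k2 = 32 bits (assumed split)
BLOCK = K1_BYTES + K2_BYTES   # k0 = k1 + k2 = 64 bits
HALF = BLOCK // 2


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    """Feistel round function built from HMAC-SHA256 (toy stand-in)."""
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, block: bytes) -> bytes:
    """Toy block cipher: a k0-bit block goes in, a k0-bit block comes out."""
    if len(block) != BLOCK:
        raise ValueError("block must be exactly k0 bits long")
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def decrypt(key: bytes, block: bytes) -> bytes:
    """Inverse of encrypt: undo the Feistel rounds in reverse order."""
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _round(key, rnd, left)), left
    return left + right


def encrypt_cell(key: bytes, value: int) -> bytes:
    """Append a fresh random k2-bit tail to the k1-bit value, then encrypt."""
    tail = secrets.token_bytes(K2_BYTES)
    return encrypt(key, value.to_bytes(K1_BYTES, "big") + tail)


def decrypt_cell(key: bytes, cell: bytes) -> int:
    """Decrypt and discard the k2-bit tail."""
    return int.from_bytes(decrypt(key, cell)[:K1_BYTES], "big")


key = secrets.token_bytes(BLOCK)
first, second = encrypt_cell(key, 3), encrypt_cell(key, 3)
print(len(first) * 8)                                        # 64
print(decrypt_cell(key, first), decrypt_cell(key, second))   # 3 3
```

Because each call draws a fresh tail, two cells holding the same value almost never share a ciphertext, yet both decrypt to the same value once the tail is dropped.
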
  9. 9. Privacy Preserving Queries
     • Consider an intruder who may observe up to t queries Q_1,...,Q_t, where t is a polynomially bounded function of k0.
     • Quantify the information leaked by the protocol using a random variable α. The protocol reveals only α beyond the minimum information revelation if, after these queries are processed, what the database intruder has observed can be simulated by a probabilistic polynomial-time algorithm S using only α, R(Q_1),...,R(Q_t), and the encrypted table T′.
     • Formally:
       |Pr[A_{k0}(Q_1,...,Q_t, q_1,...,q_t, T′) = 1] − Pr[A_{k0}(Q_1,...,Q_t, S(α, R(Q_1),...,R(Q_t), T′)) = 1]| < 1/p(k0)
  10. 10. SELECT * FROM paper WHERE key_aspects = “BASIC SOLUTION”
      Queries of the format SELECT * FROM T WHERE A_j = v; v ∈ D_j
      • Encode each cell in a special redundant format: for each cell T_{i,j}, the encrypted cell T′_{i,j} = (T′_{i,j}<1>, T′_{i,j}<2>) has two parts.
      • The first part, T′_{i,j}<1>, is a simple encryption of T_{i,j} using a block cipher E(); the second part, T′_{i,j}<2>, is a “checksum” that, together with the first part, enables the database to check whether this cell satisfies the condition of the query.
      • T′_{i,j}<1> and T′_{i,j}<2> satisfy a secret equation determined by the value of T_{i,j}.
      • What equation should be used as the secret equation?
  11. 11. ekFZ8JZMmdpMaAnLdNeyyM/pX/MKOvMa, aka the secret equation
      E_{f(T_{i,j})}(T′_{i,j}<1>) = T′_{i,j}<2>, where f is a function.
      When the user has a query with condition A_j = v, the user only needs to send f(v) to the database, so that the database can check, for each i, whether E_{f(v)}(T′_{i,j}<1>) = T′_{i,j}<2>.
      • v should not be easily derivable from f(v).
      • Hence, define f(v) as an encryption of v using a block cipher E(.).
      • Append a random string to T_{i,j} before applying E(.) to obtain T′_{i,j}<1>. (Why?)
      • To avoid having the same f(v) for different attributes, append j to v before applying E(.).
  12. 12. CREATE TABLE nursery_encrypted
      • The user first picks two secret keys s1, s2 from {0,1}^k0 independently and uniformly, and keeps them secret. For each cell T_{i,j}, the user picks r_{i,j} from {0,1}^k2 uniformly at random and stores
        T′_{i,j} ≜ (T′_{i,j}<1>, T′_{i,j}<2>) = (E_{s1}(T_{i,j}, r_{i,j}), E_{E_{s2}(T_{i,j}, j)}(E_{s1}(T_{i,j}, r_{i,j})))
        This equation is just a mathematical representation of everything seen so far.
      • Denote by A_j the jth attribute of T. Suppose there is a query select A_{j1},...,A_{jℓ} from T where A_{j0} = v. To carry out this query, the user computes q = E_{s2}(v, j0) and sends j0, q, and (j1,...,jℓ) to the database.
      • For i = 1,...,n, the database tests whether T′_{i,j0}<2> = E_q(T′_{i,j0}<1>) holds. For any i such that this equation holds, the database returns T′_{i,j1}<1>,...,T′_{i,jℓ}<1> to the user. The user decrypts each received cell using the secret key s1 and discards the k2-bit tail of the cleartext.
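
The whole basic protocol (table encryption, query token, server-side test, and user-side decryption) might look roughly like the sketch below. Everything not on the slides is an assumption made for illustration: the function names, the 0-based column indices, the integer-coded cell values, the 32/32-bit split of the 64-bit block, and above all the toy HMAC-based Feistel cipher `E`/`D`, which only stands in for a real block cipher such as Blowfish.

```python
import hashlib
import hmac
import secrets

HALF = 4  # half of a 64-bit block (k0 = 64, with k1 = k2 = 32 assumed)


def _f(key, rnd, half):
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def E(key, block):
    """Toy 4-round Feistel cipher standing in for the block cipher E."""
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def D(key, block):
    """Inverse of E."""
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def word(n):
    return n.to_bytes(HALF, "big")


def encrypt_table(s1, s2, table):
    """User side: store (E_s1(T, r), E_{E_s2(T, j)}(E_s1(T, r))) per cell."""
    encrypted = []
    for row in table:
        enc_row = []
        for j, value in enumerate(row):
            part1 = E(s1, word(value) + secrets.token_bytes(HALF))
            cell_key = E(s2, word(value) + word(j))   # f(T_ij) with j appended
            enc_row.append((part1, E(cell_key, part1)))
        encrypted.append(enc_row)
    return encrypted


def make_query(s2, j0, v):
    """User side: q = E_s2(v, j0) is all the database gets to see of v."""
    return E(s2, word(v) + word(j0))


def server_select(encrypted, j0, q, columns):
    """Database side: keep rows whose cell in column j0 passes the test."""
    return [[row[j][0] for j in columns]
            for row in encrypted
            if E(q, row[j0][0]) == row[j0][1]]


def decrypt_rows(s1, rows):
    """User side: decrypt with s1 and drop the random k2-bit tail."""
    return [[int.from_bytes(D(s1, cell)[:HALF], "big") for cell in row]
            for row in rows]


s1, s2 = secrets.token_bytes(8), secrets.token_bytes(8)
table = [[1, 0, 2], [2, 1, 0], [1, 1, 1]]   # hypothetical integer-coded rows
encrypted = encrypt_table(s1, s2, table)
q = make_query(s2, 0, 1)                    # WHERE A_0 = 1
hits = server_select(encrypted, 0, q, [1, 2])
print(decrypt_rows(s1, hits))               # [[0, 2], [1, 1]]
```

The database never sees s1, s2, or v: it only evaluates E under the key q it was handed, which is why the tested column j0 is the extra piece of information this protocol gives away.
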
  13. 13. nursery_encrypted has been created; COMMIT; operation took 25 seconds; but how secure is this table?
      • Claim: The only information revealed beyond the minimum information revelation is the attributes tested in the “where” condition.
      • Proof: Theorem 1: If the block cipher E is a pseudorandom permutation (with the encryption key as the random seed), the basic protocol reveals only j_{1,0},...,j_{t,0} beyond the minimum information revelation. (Left as an exercise for the enthusiastic minds.)
      • Conclusion: Even with access to the whole database, the intruder learns nothing about the encrypted data beyond this. However, by combining j_{1,0},...,j_{t,0} with the minimum information revelation, an intruder can derive some statistical information about the underlying data or the queries.
  14. 14. Performance Evaluations
      • The table
        • Table now has 9 attributes
        • 12,960 records
        • Over 100,000 data cells (12,960 × 9 = 116,640)
      • The environmental setup
        • NetBSD operating system
        • AMD Athlon 2 GHz
        • 512 MB memory
      • The encryption algorithm
        • Blowfish symmetric encryption algorithm
        • k0 = 64-bit block size
  15. 15. INSERT INTO paper VALUES (‘Performance_Evaluation.gif’, empty_blob())
  16. 16. SELECT * FROM nursery WHERE …
      • Query 1: SELECT ∗ FROM nursery_encrypted WHERE Parent=usual; (2 records returned)
      • Query 2: SELECT ∗ FROM nursery_encrypted WHERE Class=recommend;
      • Query 3: SELECT ∗ FROM nursery_encrypted WHERE Class=very recom; (4320 records returned)
      • Query 4: SELECT ∗ FROM nursery_encrypted WHERE Class=priority; (3266 records returned)
  17. 17. SELECT * FROM paper WHERE key_aspects = “ENHANCED SECURITY SOLUTION”
      • Drawback of the basic solution: it reveals which attributes are tested in the “where” clause.
      • A straightforward fix is to randomly permute the attributes in the encrypted table, making it difficult for a database intruder to determine which attributes are tested.
      • Such a random permutation provides an Enhanced Security Solution.
      • What permutation to choose?
        • Uniform distribution
        • Pseudorandom permutation
  18. 18. more …
      • If the permutation is chosen uniformly from all permutations of the attributes, the user needs to “memorize” where each attribute is after the permutation. When the number of attributes is large, this is a heavy burden for the user.
      • To eliminate this problem, we use a pseudorandom permutation, which is by definition indistinguishable from a uniformly random permutation. [10]
      • We do not need to permute all the attributes in the encrypted table.
      • For each (i,j), we can keep T′_{i,j}<1> as defined in the basic solution; we only need to permute the equations satisfied by T′_{i,j}<1> and T′_{i,j}<2>, because only these equations are tested when there is a query.
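
To see why a keyed permutation removes the “memorize” burden, here is a small illustration. It is a generic sketch and not the construction from the paper: the function name `keyed_permutation`, the use of HMAC-SHA256 as the pseudorandom function, and the sort-by-PRF-output trick are all assumptions chosen for brevity. The slides do not spell out how the permuted equations are laid out, so that part is deliberately left out.

```python
import hashlib
import hmac
from typing import List


def keyed_permutation(key: bytes, m: int) -> List[int]:
    """Derive a permutation of column indices 0..m-1 from a secret key.

    Each index is ranked by a keyed pseudorandom value; sorting by that
    rank yields an ordering the user can recompute at any time from the
    key alone, so nothing has to be remembered per attribute.
    """
    def rank(j: int) -> bytes:
        return hmac.new(key, j.to_bytes(4, "big"), hashlib.sha256).digest()

    return sorted(range(m), key=rank)


perm = keyed_permutation(b"hypothetical secret key", 9)
print(sorted(perm) == list(range(9)))   # True: it is a permutation
print(perm == keyed_permutation(b"hypothetical secret key", 9))  # True: reproducible
```

Only the holder of the key can tell which position corresponds to which attribute, while the storage cost for the user stays at one key regardless of the number of attributes.
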
  19. 19. SELECT * FROM paper WHERE key_aspects = “HIGH PERFORMANCE SOLUTION”
      • In the two solutions presented, performing a query on the encrypted table requires testing each row of the table, which is very inefficient in large databases.
      • We can significantly improve efficiency if we can replace the sequential search in the basic solution with a binary search. However, the basic solution finds the appropriate rows by testing an equation, and a binary search cannot be used to find the items that satisfy an equation.
      • To sidestep this difficulty, we add some metadata to eliminate the need for testing “ekFZ8JZMmdpMaAnLdNeyyM/pX/MKOvMa”, which was our secret equation.
  20. 20. Example of Metadata
      • Specifically, for each cell in the column, add a tag and a link.
      • The tag is decided by the value of the cell; the link points to the cell.
      • Sort the metadata according to the order of the tags.
      • When there is a query on the attribute, the user sends the appropriate tag to the database so that the database can perform a binary search on the tags.
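
The tag-and-link idea can be sketched as follows. The slides do not say how a tag is computed from a cell value, so deriving it with HMAC-SHA256 over the column index and value is purely an assumption here, as are the names `make_tag`, `build_index`, and `lookup`, and the use of the row number as the link.

```python
import bisect
import hashlib
import hmac


def make_tag(key: bytes, j: int, value: str) -> bytes:
    """Assumed tag: a keyed hash of (column index, cell value)."""
    return hmac.new(key, f"{j}:{value}".encode(), hashlib.sha256).digest()


def build_index(key: bytes, j: int, column):
    """User side: one (tag, link) pair per cell, sorted by tag.

    Returned as two parallel lists so the database can binary-search
    the tags and then follow the matching links (row numbers).
    """
    pairs = sorted((make_tag(key, j, v), i) for i, v in enumerate(column))
    return [t for t, _ in pairs], [i for _, i in pairs]


def lookup(tags, links, tag):
    """Database side: binary search for the run of entries with this tag."""
    lo = bisect.bisect_left(tags, tag)
    hi = bisect.bisect_right(tags, tag)
    return sorted(links[lo:hi])


key = b"hypothetical metadata key"
parents = ["usual", "pretentious", "usual", "great_pret"]
tags, links = build_index(key, 0, parents)
print(lookup(tags, links, make_tag(key, 0, "usual")))     # [0, 2]
print(lookup(tags, links, make_tag(key, 0, "missing")))   # []
```

The search costs O(log n) comparisons plus the size of the result, instead of one cipher evaluation per row. Note that a deterministic tag like this one shows which rows share a value, which is the kind of trade-off a real design would have to weigh.
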
  21. 21. Query time: solution with metadata vs. basic solution
      • Clearly, the trend is that the solution with metadata gains more in efficiency if there are fewer records in the query result.
      • However, even for a query with a large number of records in the result, the solution with metadata is much faster than the basic solution.
  22. 22. COMMIT;
      • Investigated privacy-preserving queries on encrypted data.
      • Presented privacy-preserving protocols for certain types of queries (limited support).
      • Provided rigorous, quantitative (and cryptographically strong) security.
      • Provided a high-performance, secure solution.
  23. 23. SELECT references FROM paper
      [1] C. Blake and C. Merz. UCI repository, 1998.
      [2] Eric Dash. Lost credit data improperly kept, company admits. New York Times, June 20, 2005.
      [3] G. I. Davida, D. L. Wells, and J. B. Kam. A database encryption system with subkeys. ACM TODS, 6(2):312–328, 1981.
      [4] L. Bouganim and P. Pucheral. Chip-secured data access: Confidential data on untrusted servers. In VLDB, 2002.
      [5] Hakan Hacigumus, Balakrishna R. Iyer, and Sharad Mehrotra. Efficient execution of aggregation queries over encrypted relational databases. In DASFAA, 2004.
      [6] J. He and J. Wang. Cryptography and relational database management systems. In Int. Database Engineering and Application Symposium, 2001.
      [7] J. Karlsson. Using encryption for secure data storage in mobile database systems. Friedrich-Schiller-Universitat Jena, 2002.
      [8] Gultekin Ozsoyoglu, David Singer, and Sun Chung. Anti-tamper databases: Querying encrypted databases. In Proc. of the 17th Annual IFIP WG 11.3 Working Conference on Database and Applications Security, 2003.
  24. 24. more …
      [9] Dawn Xiaodong Song, David Wagner, and Adrian Perrig. Practical techniques for searches on encrypted data. In IEEE Symposium on Security and Privacy, 2000.
      [10] O. Goldreich. Foundations of Cryptography, volume 1. Cambridge University Press, 2001.
  25. 25. QUIT;