Private Range Query
by Perturbation and Matrix Based Encryption
     Junpei Kawamoto and Masatoshi Yoshikawa
              Kyoto University, Japan
Sep. 27, 2011           Private Range Query by Perturbation and Matrix Based Encryption      2




Cloud database and its security
• Recent research topics about security of cloud computing
   • Mainly focusing on service providers
   • How to analyze data without privacy problems (PPDM)
   • How to share data and manage encryption keys
   • How to execute queries over encrypted data


[Figure: User -> Client -> (web) -> Service Provider. Recent research has focused on the service provider side.]

• Fewer studies address information leaked through queries
   • But queries (i.e., what a user searched for) carry important
     information about the user.
   • A security model for this problem was introduced only recently.




Purpose and basic notions
• Private (range) query
   • We focus on range queries, which include exact-match queries as a
     special case.
   • A private query obtains data without exposing any information about
     what the user requested to third parties, including service providers.
• We do not fully trust service providers
  • Service providers themselves are unlikely to become attackers, but…
  • Servers could be compromised by attackers or physically stolen.
  • Users cannot know what actually happens to their data stored on servers.
                         We should build a database service
                that does not require users to trust service providers.
• We assume the database schema is (Key, Value)
  • Users issue queries over the Key attribute only




Related work
• Encrypted databases
   • To avoid leaks, all data are encrypted by clients.
   • The main topic is how to handle queries over encrypted data.
   • In our method, clients transform queries, too.

  1-to-1 mapping (hash function, etc.)
      Values:  15:00 -> "4hwr2g",  15:12 -> "teg2b1"
      Query:   15:00 ~ 15:12 -> "4hwr2g" or "teg2b1"

  many-to-1 mapping (k-anonymizer, etc.)
      Values:  14:45, 15:00, 15:12 -> 15:00
      Query:   15:00 ~ 15:12 -> 15:00

   These achieve some degree of query privacy, but not enough!




Frequency Analysis Attack (FAA)
• Attackers who know the distribution of queries could
  guess plain queries from transformed ones.

[Figure: the distribution of plain queries q, and the distributions of transformed queries q* under a 1-to-1 mapping (e.g., hashing) and under a many-to-1 mapping (e.g., averaging).]
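The attack on a 1-to-1 mapping can be sketched in a few lines. This is only an illustration under stated assumptions: the truncated SHA-256 token stands in for any deterministic transformation, and the query workload and the attacker's known distribution are made-up numbers, not data from the talk.

```python
import hashlib
from collections import Counter

def one_to_one(query):
    # Deterministic transformation (assumed for illustration):
    # the same plain query always yields the same token.
    return hashlib.sha256(query.encode()).hexdigest()[:6]

def frequency_attack(observed_tokens, known_distribution):
    # Rank tokens by how often they were observed, rank plain queries by
    # their known frequency, and pair them up rank by rank.
    tokens_by_rank = [t for t, _ in Counter(observed_tokens).most_common()]
    plains_by_rank = sorted(known_distribution,
                            key=known_distribution.get, reverse=True)
    return dict(zip(tokens_by_rank, plains_by_rank))

# Hypothetical background knowledge and a workload that follows it.
known_distribution = {"15:00": 0.6, "15:12": 0.3, "14:45": 0.1}
workload = ["15:00"] * 60 + ["15:12"] * 30 + ["14:45"] * 10

observed = [one_to_one(q) for q in workload]
guess = frequency_attack(observed, known_distribution)

# The attacker never inverts the hash, yet every token is identified.
recovered = all(guess[one_to_one(q)] == q for q in known_distribution)
print(recovered)
```

Because the mapping preserves frequencies, matching the two rankings is enough; no key or inverse function is needed.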




Key idea for protecting against FAA
• Use a 1-to-many mapping to make the distribution of transformed
  queries differ from the original distribution

[Figure: the value 15:00 maps to Tk1(15:00) or Tk2(15:00); the query 15:00 ~ 15:12 maps to Tq1(15:00-15:12) or Tq2(15:00-15:12). The mapping turns the distribution of plain queries q into a different distribution of transformed queries q*.]

 To ensure this property, we add perturbations to queries and then
 encrypt them.




Inner Product Predicate (IPP) method
• Employs polynomials f(k) as queries in order to add perturbations
   • Query [a, b] is described as f(k) ≤ 0 with perturbation r.
[Figure: two plots of f(k) against k, with roots at -r (or -r'), a, and b. Between a and b, f(k) ≤ 0 (match); where f(k) > 0 the key does not match. A different r produces a different query.]
• Uses matrix-based encryption
  • Matrix-based encryption enables query processing without decryption.
  • The query f(k) ≤ 0 is expressed with vectors q and k as q・k ≤ 0.
  • The encryption key is a regular (invertible) matrix M.
  • q and k are encrypted as M^T q and M^{-1} k.
  • The inner product is computed as (M^T q)・(M^{-1} k) = q^T M M^{-1} k = q・k,
    because M and M^{-1} cancel.
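The cancellation can be checked numerically. The sketch below is a toy under assumptions: the 2×2 matrix, its hand-computed inverse, and the vectors are arbitrary illustrative values, not parameters from the talk.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def mat_vec(m, v):
    # Matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Illustrative key matrix with determinant 2*3 - 1*5 = 1,
# so its inverse has integer entries.
M = [[2, 1], [5, 3]]
M_inv = [[3, -1], [-5, 2]]

q = [4, -7]   # stand-in for a query vector
k = [3, 1]    # stand-in for an attribute-value vector

enc_q = mat_vec(transpose(M), q)   # M^T q
enc_k = mat_vec(M_inv, k)          # M^{-1} k

# The encrypted vectors differ from the plain ones,
# but their inner product is unchanged.
print(enc_q, enc_k, dot(enc_q, enc_k), dot(q, k))
```

The server only ever sees `enc_q` and `enc_k`, yet it obtains the same inner product as the plain vectors would give.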




Inner Product Predicate (IPP) method
• Perturbation-added polynomials f(k)
   • f_r(k) = (k – a)(k – b)(k + r), where r is the perturbation
• Vector form of attribute values and queries
   • Key vector k = (k^3, k^2, k, 1)^T
   • Query vector q = (1, r–a–b, ab–ar–br, abr)^T
   • The inner product is q・k = (k – a)(k – b)(k + r)
[Figure: plot of f(k) against k with roots at -r, a, and b. A different r produces a different query.]
• Encrypting both vectors
   • (M^T q)・(M^{-1} k) = q^T M M^{-1} k = q・k, where M is the key matrix,
     M^T q is the encrypted query, and M^{-1} k is the encrypted attribute value
   • The inner product can be computed without decryption.
• The IPP method also adds perturbation to attribute values
   • For details, please see our paper.
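The vector form can be verified directly, before any encryption is applied. This is a minimal sketch: the function names are hypothetical, and it assumes every key satisfies k > -r (for example, positive keys and a positive perturbation), so that the factor (k + r) does not flip the sign.

```python
def query_vector(a, b, r):
    # Coefficients of (k - a)(k - b)(k + r) in descending powers of k.
    return [1, r - a - b, a * b - a * r - b * r, a * b * r]

def key_vector(k):
    return [k ** 3, k ** 2, k, 1]

def matches(q, k):
    # The predicate q . k <= 0, i.e. f_r(k) <= 0.
    return sum(c * x for c, x in zip(q, key_vector(k))) <= 0

# The same range [500, 600] with two different perturbations
# (illustrative values).
q1 = query_vector(500, 600, 7)
q2 = query_vector(500, 600, 123)

print(q1 != q2)                                        # different vectors
print([k for k in (499, 500, 600, 601) if matches(q1, k)])
print([k for k in (499, 500, 600, 601) if matches(q2, k)])
```

Both vectors select exactly the keys in [500, 600], which is the 1-to-many property: one plain query, many possible query vectors.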




  Scheme of IPP method
  • Adding tuples
      1. User -> Client: new tuple (k, v)
      2. Client: transformed tuple (Tk_r(k), v),
         where Tk_r(k) = M^{-1} (k^3, k^2, k, 1)^T
      3. Client -> (web) -> Service Provider: store (Tk_r(k), v)

  • Searching tuples
      1. User -> Client: query a ≤ k ≤ b
      2. Client: transformed query Tq(a ≤ k ≤ b),
         where Tq(a ≤ k ≤ b) = M^T (1, r–a–b, ab–ar–br, abr)^T
      3. Client -> (web) -> Service Provider: compute inner products for all tuples

  • The server's computational cost is O(n) (n: the number of tuples)
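The two flows above can be put together in a small end-to-end sketch. It is a simplification under assumptions, not the authors' implementation: the `Client` and `Server` classes, the random 4×4 integer key matrix, and the range of r are hypothetical, exact rational arithmetic replaces fixed-width numbers, and the perturbation of attribute values mentioned earlier is omitted.

```python
import random
from fractions import Fraction

def invert(m):
    """Gauss-Jordan inverse with exact rationals; None if m is singular."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class Client:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        while True:  # retry until the random key matrix is invertible
            self.M = [[self.rng.randint(1, 99) for _ in range(4)] for _ in range(4)]
            self.M_inv = invert(self.M)
            if self.M_inv is not None:
                break
        self.M_t = [list(col) for col in zip(*self.M)]

    def encrypt_key(self, k):
        return mat_vec(self.M_inv, [k ** 3, k ** 2, k, 1])

    def encrypt_query(self, a, b):
        r = self.rng.randint(1, 1000)  # fresh perturbation per query
        return mat_vec(self.M_t, [1, r - a - b, a * b - a * r - b * r, a * b * r])

class Server:
    def __init__(self):
        self.tuples = []

    def store(self, enc_key, value):
        self.tuples.append((enc_key, value))

    def search(self, enc_query):
        # O(n): one inner product per stored tuple, no decryption.
        return [v for ek, v in self.tuples
                if sum(x * y for x, y in zip(enc_query, ek)) <= 0]

client, server = Client(), Server()
for k in range(1, 21):
    server.store(client.encrypt_key(k), "row-%d" % k)

hits = server.search(client.encrypt_query(5, 8))
print(hits)
```

The server holds only transformed keys and transformed queries; it cannot prune candidates, so each search scans all stored tuples.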




Comparison of necessary memory size
                            Plain     Transformed
  Key attribute values      lK        12lK + 4(lφ + 3lm + lrk)
  Queries                   2lK       8lK + 4(ld + lm + lrq)

   • lK: bit length of key attribute values
   • lφ: bit length of perturbations for key attribute values
   • ld: bit length of perturbations for queries
   • lm: bit length of encryption keys
   • lrk, lrq: bit lengths of random values used for encryption

• Summary
   • For attribute values, the lK term is 12 times the plain size.
   • For queries, the lK term is four times the plain size (8lK vs. 2lK).
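As a worked example of the two size formulas, the sketch below plugs in 32 bits for every length. The function names are hypothetical, and setting ld to 32 is an assumption made here for illustration.

```python
def key_bits(l_k, l_phi, l_m, l_rk):
    # Transformed key attribute value: 12 lK + 4(lφ + 3 lm + lrk)
    return 12 * l_k + 4 * (l_phi + 3 * l_m + l_rk)

def query_bits(l_k, l_d, l_m, l_rq):
    # Transformed query: 8 lK + 4(ld + lm + lrq)
    return 8 * l_k + 4 * (l_d + l_m + l_rq)

L = 32  # assumed bit length for every parameter

plain_key, plain_query = L, 2 * L
enc_key = key_bits(L, L, L, L)      # 384 + 640
enc_query = query_bits(L, L, L, L)  # 256 + 384

print(plain_key, enc_key)      # 32 1024
print(plain_query, enc_query)  # 64 640
```

The factors 12 and 4 are the ratios of the lK terms alone; the additive terms for perturbations, keys, and random values come on top of them.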




Experimental evaluations
• We conducted experiments to verify that
  • the correlation between the distributions of plain and transformed
    queries is low enough, and
  • query processing time is O(n) in the number of tuples n.



• Common conditions
  • All programs were implemented in Python (2.6.4).
  • Experiments were performed on a virtual machine with one 2.66 GHz
    processor and 512 MB of memory, running on VirtualBox.
  • We chose the parameters of the IPP method as lK = lφ = lm = lrk = lrq = 32.
       • the default size in many programming languages




Exp. 1: Correlations of queries
• Query set
  • 1,000 queries requesting [a, a + 100] (a = 1, 2, ..., 1000).

[Figure: scatter plot of transformed queries (vertical axis) against the left endpoints of the plain range queries (horizontal axis). The graph shows only the first element of each query vector; for example, the range query [500, 600] is mapped to 3.0×10^13.]

• Query vectors were distributed over a wide range, independently of the
  plain values.
• Correlation coefficient: 0.014679
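For reference, a correlation coefficient of this kind can be computed as below. This sketch assumes the Pearson coefficient is the one meant, and the input lists are toy values chosen to show the two extremes; they are not the experimental data.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient of two equally long sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

plain = [1, 2, 3, 4]          # toy "left endpoints"
linear = [2, 4, 6, 8]         # fully determined by the plain values
unrelated = [1, -1, -1, 1]    # no linear relation to the plain values

print(pearson(plain, linear))     # 1.0
print(pearson(plain, unrelated))  # 0.0
```

A value near 1 or -1 would mean the transformed values track the plain ones; a value near 0, as reported above, means no linear relation is visible.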




Exp. 2: Query processing time
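• Conditions
  • Five databases with different numbers of tuples
  • One million random queries issued to each database

[Figure: query processing time against the number of tuples; doubling the number of tuples doubles the processing time.]

• The query processing time grows as O(n) in the number of tuples n.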




Open problems
• Reducing the computational cost of servers.
   • O(n) is the minimum cost: if servers could prune candidate tuples,
     they would, in some way, know what users request.
   • There is a trade-off between security and computational cost.



• Attackers may guess the plain queries and attribute
  values by gathering and analyzing query results.
   • However, in general, each query result consists of many tuples.
   • Gathering the results requires much more storage space.
   • The effectiveness of attacks on query results also needs to be
     examined.




Conclusion
• We introduced a new private query method.
  • Transformation algorithms are probabilistic.
  • It provides a 1-to-many mapping for attribute values and queries.
  • The computational cost is O(n).
  • Correlation between the transformed and plain distributions is low.
  • The IPP method resists the frequency analysis attack.


• Future work
   • Reducing the computational cost of servers.
   • Considering other attacks that use query results.




                                                 Thank you for your attention!
