Privacy Preserving Schemes for SQL Operations

                           Abhinav Parate, Bhupendra Singh

                          Indian Institute of Technology, Kanpur



      Abstract. With growing concern for the privacy of the huge amounts of
      information collected in databases, research on protecting databases from
      exposure and other attacks has increased. Encryption techniques, both
      conventional and modern, have proved to be a good technology for protecting
      sensitive data. However, once encrypted, data can no longer be queried with
      operations like Greater Than, Less Than, substring matching and others. The
      only operation that can be performed on encrypted data is exact matching.
      Hence, querying encrypted data carries the overhead of decrypting the entire
      encrypted data and then performing the operations. Here we present two schemes
      which perform substring matching directly over encrypted string data. We also
      present schemes for performing aggregate operations like SUM and AVG over
      encrypted numerical data. The advantage of these schemes is that they avoid the
      overhead of decrypting the entire data. The encryption scheme is robust and
      offers good protection.


1 INTRODUCTION
Present-day database systems protect the database from attack by means of access
control, which restricts access to sensitive data. The access control mechanism protects
the privacy of sensitive data from intrusion through the database system interfaces. The
basic assumption is that the database is accessed only through these interfaces. Such
protection is important, but it can prove insufficient: the raw database files remain
part of the operating system, so an attack on the computer system may lead to a privacy
breach if someone gains access to the raw database files. Access to these files cannot
be prevented by the access control mechanisms.
    Encryption techniques have proved effective in protecting sensitive information
from being revealed easily. However, present encryption techniques were not designed
with the goal of protecting databases, which can be very large. As a result, applying
present encryption schemes directly involves a huge overhead of decrypting the entire
encrypted data before performing any kind of operation on it.
    We present here two encryption techniques for data of type STRING. These
techniques have been designed to support string operations directly over the encrypted
data. SQL queries involve the following operations:
 – String Matching: The result of the operation includes all those strings which are
   equal to the string given as parameter
 – LIKE operation %abc%: This operation results in all those strings which have
   'abc' as a substring
 – LIKE operation a_b: This operation results in strings of the form 'axb' where x
   is any single character
 – Pattern Matching: A mix of the above two operations, resulting in strings matching
   some pattern
    Our first encryption scheme is 'VDES', the Varying Distribution Encryption Scheme.
The idea behind this scheme is to completely change the distribution pattern of the
characters present in the data, because some characters occur much more often than
others. If someone knows the original distribution of the characters, he may guess the
encryption scheme by examining the distribution of encrypted characters. This scheme
has been designed specifically for the domain where the program used to answer queries
is assumed to be protected. The encrypted database itself is useless if the program that
answers queries is not known or available.
    The advantages of VDES over other encryption schemes are:
 – Operations: It supports all the operations mentioned above while providing strong
   encryption
 – No false positives: A query over encrypted data returns only true matches
   and no false positives
 – It makes the database useless if the VDES program is not available
 – It handles updates very easily
 – We can change the distribution of characters very easily with updates
    However, the application of VDES is limited to incorporating it directly into a
database which must know the unencrypted pattern, i.e. it works in a semi-encrypted
domain. It is not possible with this scheme to give the pattern itself in encrypted form.
To overcome this limitation, we have another scheme which works entirely in the
encrypted domain. We call this scheme PPES, the Product of Primes Encryption Scheme.
The advantage of PPES over VDES is that it can work in a distributed domain where the
decryption key is held by the client and need not be present at the databases holding the
encrypted data. The client may send the pattern itself in encrypted form, and the
databases will perform the pattern matching operations and return the result as
encrypted data. This encrypted data can then be decrypted by the client using the key.
This scheme can also be used in place of VDES with databases.
    The schemes mentioned above are for the STRING data type. However, databases also
contain numeric data types, on which operations like SUM or AVG are more common. For
example, a firm may want to calculate the average salary of its employees, or an
accounting firm may want to calculate the sum total of the data it holds. Due to privacy
considerations, such operations should be performed over encrypted data and answered
in encrypted form. For such scenarios, we present a scheme for performing aggregate
operations like SUM, AVG or COUNT over encrypted numerical values. This scheme, like
PPES, can be used to answer queries in an encrypted and distributed domain. This
technique has applications in mobile computing and cryptography, where it is necessary
to perform operations on secure encrypted data without decryption. It is also important
to keep the interaction minimal due to security concerns.

1.1 Encryption of Databases
A database can be encrypted in two ways:
i. Encrypt all the files of the database.
ii. Encrypt the data values and store them in tables.
As should be clear from the context, we will be encrypting data values and not
the files.

1.2 Report Layout
The rest of our report is organized as follows. We first discuss the related work in
section 2 and briefly describe the various approaches so far. We give the details of
VDES in section 3, followed by a step-by-step development of PPES in sections 4, 5
and 6. In section 7, we describe the scheme for finding all occurrences of a pattern
specified as a regular expression in a given database string. We then discuss our
scheme for performing aggregate operations on numerical data types. Finally, we give
the formal aspects of privacy-preserving computation, followed by the conclusion and
directions for future work.


2 RELATED WORK
The problem we are discussing is relatively new and hence, there is no prior work on
performing pattern matching over encrypted data. However, a similar problem has been
addressed for numerical data as opposed to string data. The ideas applied in encrypting
numerical data have been helpful in understanding the various problems that can arise
when encrypting string data.
    The technique described in [1] uses strictly increasing polynomial functions to
encrypt integer values so that the order of the input is preserved in the encrypted
values. Hence, operations like Greater Than, Less Than and Equal To can be performed
very easily on the encrypted data, and the resulting encrypted values can then be
decrypted. This reduces the overhead of decrypting the entire database. However, this
scheme has the drawback that it reveals the distribution of the input data, which can be
exploited using probabilistic techniques to estimate the values in some interval with
some confidence level.
    In [2], there are some schemes to support keyword searches over encrypted text in
emails. But these were not meant for, or suited to, relational queries and databases.
    In [3], executing SQL over encrypted data was discussed for the first time. This
model was for numerical data and required many interactions between client and server
to get the results. Moreover, it produced false positives whose post-processing
involved considerable overhead.
    In [4], a scheme is proposed for performing operations related to the order of
input numerical data. The scheme has the advantage that the distribution of the encrypted
data is totally independent of the distribution of the input data. This scheme is robust
against attacks on the computer system and is well compatible with databases. It can be
extended to lexicographical ordering of strings, but other pattern matching operations
cannot be supported. It can be used to answer queries like MAX, MIN and GROUP BY, but
it cannot answer queries like SUM or AVG. The model we use for PPES is similar to the
model described in that paper.
    In [13], techniques have been proposed for performing arithmetic operations like
addition and multiplication over encrypted data. These techniques are not applicable to
databases, as they cannot be used for the addition of more than two numbers and they
require message passing for each operation performed.
    In [15], there is a data partitioning technique to build privacy-preserving indices
on sensitive attributes of a relational table.
    In other papers [5][6][7][8] and [9], some encryption transformations have been
discussed which allow direct computation on encrypted data. Such encryption
transformations are called Privacy Homomorphisms. These papers presented privacy
homomorphisms for performing addition, multiplication and multiplicative inverse
computation on encrypted data. However, these schemes handled a very small subset of
input data and had no application in databases. In [14], there is a technique for secure
computation in mobile cryptography.


3 VDES: VARYING DISTRIBUTION ENCRYPTION SCHEME

As the name suggests, the idea of this scheme is to alter the distribution of characters
so that an attack based on cipher-text analysis is not possible. To understand the
complete idea, let us first look at a simpler idea and the following distribution of
characters in a database:

character Frequency
a         200
b         100
e         300
Let us now look at the following encoding scheme.

character Encoding
a         1234,3578
b         2598
e         1079,2234,7634
With the above encoding, we can encode the character a as either 1234 or 3578 with
equal probability. As we saw in the frequency table above, the frequency of a is 200.
Hence, a will be encrypted as 1234 about 100 times and as 3578 about another 100 times.
Similarly, b will be present as 2598 for 100 times, and e will be encrypted as 1079,
2234 and 7634 about 100 times each.
    The string abe may get encrypted as 357825981079 without giving any idea of which
characters are present in the string. Encrypting the string abe a second time may give
a completely different result.
However, this simple idea has the drawback of assuming a static distribution of
characters, which is not true in many cases. Hence, we need to add further encrypting
values to the character-encoding table. Also, the decryption key needs to be updated in
the variable distribution case.
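
The simple idea above can be sketched in a few lines of Python. This is only an
illustration: the code table is the one from the example, while the helper names
`encode` and `decode`, the fixed random seed and the fixed four-digit code width are
assumptions made for the sketch.

```python
import random
from collections import Counter

# Code table from the example: each character owns one or more 4-digit codes.
CODES = {"a": ["1234", "3578"], "b": ["2598"], "e": ["1079", "2234", "7634"]}
# Reverse table for decoding; every code belongs to exactly one character.
DECODE = {code: ch for ch, codes in CODES.items() for code in codes}

def encode(text, rng):
    """Replace each character by one of its codes, chosen uniformly at random."""
    return "".join(rng.choice(CODES[ch]) for ch in text)

def decode(cipher):
    """Split the cipher text into 4-digit blocks and look each block up."""
    return "".join(DECODE[cipher[i:i + 4]] for i in range(0, len(cipher), 4))

rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
plain = "a" * 200 + "b" * 100 + "e" * 300   # frequencies from the table above
cipher = encode(plain, rng)

# Frequency of each 4-digit code in the cipher text.
counts = Counter(cipher[i:i + 4] for i in range(0, len(cipher), 4))
print(sorted(counts.items()))
```

Because the occurrences of each character are spread over its codes, the skewed
200/100/300 input distribution appears in the cipher text as code frequencies that are
all close to 100.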

3.1 Improving the Scheme
character  Characteristic Prime  Prime Multipliers
a          a_p                   a_1, a_2, a_3, ...
b          b_p                   b_1, b_2, b_3, ...
e          e_p                   e_1, e_2, e_3, e_4, ...

In this scheme, we have a characteristic prime associated with each character. For
character a, we have a_p as the characteristic prime. We also have a set of prime
multipliers associated with a. When a is to be encrypted, some prime is chosen randomly
from its multiplier set and is multiplied by a_p. Hence, a may get encrypted as
a_p × a_1 or a_p × a_2 or a_p × a_3 and so on. The decryption key for this scheme is the
set of characteristic primes C_p. Using C_p, we can check whether some encrypted
character is equal to a by checking its divisibility by a_p. Moreover, we can alter the
distribution at any instant by changing the set of prime multipliers P_m. Any further
updates will be handled by the new set.
    The advantage is that we can change the distribution of the characters as we wish.
Another advantage is that we are not required to change the decryption key even though
the prime multipliers set P_m may have changed. Hence, we can change the set P_m, expand
it or replace its elements completely; this does not affect the already encrypted
values, which do not need to be updated.
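
A minimal sketch of this improved scheme is given below. The tiny primes, the function
names `encrypt_char` and `decrypt_char` and the fixed seed are assumptions chosen for
readability; the scheme itself calls for very large primes.

```python
import random

# Toy key material (assumed values; a real deployment would use very large primes).
CHAR_PRIME = {"a": 2, "b": 3, "e": 5}                               # C_p, the decryption key
MULTIPLIERS = {"a": [101, 103], "b": [107], "e": [109, 113, 127]}   # subsets of P_m

def encrypt_char(ch, rng):
    """Multiply the characteristic prime of ch by a randomly chosen prime multiplier."""
    return CHAR_PRIME[ch] * rng.choice(MULTIPLIERS[ch])

def decrypt_char(value):
    """Return the character whose characteristic prime divides the value."""
    for ch, prime in CHAR_PRIME.items():
        if value % prime == 0:
            return ch
    raise ValueError("value was not produced with this key")

rng = random.Random(1)  # fixed seed only so that the sketch is repeatable
old_cipher = [encrypt_char(ch, rng) for ch in "abe"]

# Change the distribution: give 'a' a completely new multiplier set.
MULTIPLIERS["a"] = [131, 137, 139]
new_cipher = [encrypt_char(ch, rng) for ch in "abe"]

# The same key C_p decrypts values produced before and after the change.
old_plain = "".join(decrypt_char(v) for v in old_cipher)
new_plain = "".join(decrypt_char(v) for v in new_cipher)
print(old_plain, new_plain)
```

The decryption routine never consults the multiplier sets, which is why changing them
leaves both the key and the previously stored values untouched.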

3.2 Performing the operations
String Matching The string matching algorithm over an encrypted string is similar to
normal string matching. In this algorithm, the characters of the given string are
matched with the characters at the respective positions of the second string. In our
scheme, if we are checking the equality of character a with an encrypted character d,
we take the characteristic prime a_p for character a and check whether the encrypted
character d is divisible by a_p.

Pattern Matching and Substring Matching This operation, which checks for the existence
of a given pattern in the encrypted string, is as easy as matching a string with an
encrypted string. The pattern string may contain the % symbol, which represents the
presence of zero or more arbitrary characters at the position of %. The pattern string
may also contain the _ symbol to indicate the presence of exactly one character at the
position of _ in the pattern string. If the pattern contains neither % nor _, then the
problem reduces to finding a string with the given pattern as its substring. There are
efficient algorithms for the substring existence problem, and they can be applied
easily in our case.
A search for patterns like ab%cd in string s can be handled easily by searching for ab
and then searching for the presence of cd in the part of s following ab.
    Similarly, a search for patterns like ab_cd in string s can be handled by first
searching for the presence of ab in s and then skipping the character following b in s.
If the character at the next position in s matches c and the character following it
matches d, string s is reported as a string having the pattern ab_cd.
     Normal String Matching Algorithm

characterAt(int pos, String s) {
    // return the character at position pos in s, or null if pos is past the end of s
}
Stringmatch(String s, String t) {
    int position
    for (position = 1; ; position++) {
        if (characterAt(position, s) == null and characterAt(position, t) == null)
            return "Equal Strings"      // both strings ended at the same position
        else if (characterAt(position, s) == null or characterAt(position, t) == null)
            return "Unequal Strings"    // one string is longer than the other
        else if (characterAt(position, s) != characterAt(position, t))
            return "Unequal Strings"    // mismatch at this position
        // otherwise continue with the next position
    }
}
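
The matching rules of this section can be sketched over a VDES-encrypted string as
follows. The sketch is hedged in several ways: the primes are toy values, all
characters share one multiplier set, the function name `like_match` is hypothetical,
% is treated as zero or more characters (the standard SQL LIKE meaning), and the
pattern is matched against the whole string as SQL LIKE does. The pattern itself stays
in plain text, as VDES requires.

```python
import random
from functools import lru_cache

CHAR_PRIME = {"a": 2, "b": 3, "c": 5, "d": 7, "x": 11}   # C_p (toy values)
MULTIPLIERS = [101, 103, 107]                             # P_m, shared by all characters here

def encrypt(text, rng):
    """Encrypt each character as characteristic prime times a random multiplier."""
    return [CHAR_PRIME[ch] * rng.choice(MULTIPLIERS) for ch in text]

def like_match(pattern, enc):
    """Match a plain-text LIKE pattern against an encrypted string."""
    @lru_cache(maxsize=None)
    def go(i, j):
        # i indexes the pattern, j indexes the encrypted string.
        if i == len(pattern):
            return j == len(enc)
        sym = pattern[i]
        if sym == "%":
            # Either % matches nothing, or it absorbs one more encrypted character.
            return go(i + 1, j) or (j < len(enc) and go(i, j + 1))
        if j == len(enc):
            return False
        if sym == "_":
            return go(i + 1, j + 1)          # any single character
        # Literal character: divisibility by its characteristic prime.
        return enc[j] % CHAR_PRIME[sym] == 0 and go(i + 1, j + 1)
    return go(0, 0)

enc = encrypt("abxcd", random.Random(2))
print(like_match("ab_cd", enc), like_match("ab%cd", enc), like_match("ab_d", enc))
```

A substring query for a pattern p corresponds to calling `like_match("%" + p + "%", enc)`.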


3.3 Correctness of VDES

VDES requires the following sets for its operations:
Σ: The set of all characters in the language.
C_p: The set of characteristic primes, one for each character belonging to Σ.
P_m: A set of prime numbers (the prime multipliers). This set must be disjoint from
C_p, i.e. C_p ∩ P_m = ∅.
P_i: The multiplier set for the character c_i ∈ Σ, with P_i ⊆ P_m.

We have some functions as described below:
f: Σ → C_p
The function f is one-to-one and maps each character c_i ∈ Σ to p_i ∈ C_p. Hence,
f(c_i) = f(c_j) ⇒ c_i = c_j

g: A function that takes a set of elements as input and returns one element chosen
randomly from the input set.
h: Σ → P_m
The function h takes a character c_i as input and returns some element m from its
multiplier set P_i.
Hence, h(c_i) = g(P_i)

   Using the above functions, we can encrypt any character c_i ∈ Σ as follows:
E(c_i) = f(c_i) × h(c_i)



Theorem 1 Encryption of a character x ∈ Σ as E(x) using VDES results in a unique
mapping, and decryption of E(x) does not result in any false positive, i.e. the result
of decryption is correct.

Proof. Suppose we encrypt character c_i ∈ Σ as E(c_i); then
E(c_i) = f(c_i) × h(c_i)

     Assume that VDES results in a false positive, i.e. some character c_j ≠ c_i is
identified as c_i. Then, by the VDES matching rule, the characteristic prime
p_j = f(c_j) must divide E(c_i). Since E(c_i) is the product of only two prime numbers,
if f(c_j) divides E(c_i) then either f(c_j) = f(c_i) or f(c_j) = h(c_i).
     If f(c_j) = f(c_i), then c_i = c_j as f is a one-to-one mapping. This contradicts
the assumption that we had a false positive.
     If f(c_j) = h(c_i), note that f(c_j) ∈ C_p and h(c_i) ∈ P_m, and that C_p and P_m
are disjoint. Hence f(c_j) ≠ h(c_i), so this case cannot occur.
     Hence, we cannot find any c_j ∈ Σ which can act as a false positive for c_i. Thus,
decryption of any encrypted character results in the correct character and never in a
false character.

Algorithms that find a given substring t in a string s proceed by matching character by
character. As Theorem 1 says, if some character is matched then it must be the correct
one. So, if each character found is correct and we have matched the pattern, then the
pattern matching must also be correct, as it was done character by character. Hence,
VDES query execution returns only correct results and no false positives. This can be
stated in the form of a lemma as follows:

Lemma 1. Pattern matching using VDES does not result in any false positives.


3.4 Analysis of Cryptographic Attack on VDES

As the purpose of an encryption scheme is to protect sensitive data from being exposed,
we must analyse the kinds of attack that are possible and how successfully they can be
handled.
    Let us suppose |P_m| = m and |Σ| = n. Hence, we have n characters in the language,
and the number of possible multipliers for each character is m. When we encrypt a
database using VDES, we can use each multiplier with each characteristic prime,
resulting in some random distribution. As a result of using m multipliers for n
characters, the encrypted language consists of mn numbers, and each character is
encrypted as one of m possible numbers.
    If someone knows n, then he can calculate m from mn. Now, the attacker knows that
each character is mapped to m different numbers. So, if he tries to apply brute force
analysis to identify each character and its encryption set, he can do so in the
following way:
Of the mn numbers known to him, he can select m numbers randomly and assign them to
some character c_1. From the remaining (n-1)m numbers, he can select a further m numbers
and assign them to character c_2. He can continue to do so for c_3, c_4, ..., c_n. After
obtaining a set for each character, he can check whether these sets work correctly.
    In this brute force way, he would be required to test around
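$$
\binom{nm}{m} \times \binom{(n-1)m}{m} \times \binom{(n-2)m}{m} \times \binom{(n-3)m}{m} \times \cdots \times \binom{2m}{m} \times \binom{m}{m}
$$
$$
= \frac{(nm)!}{(m!)^n}
= \frac{(nm)(nm-1)(nm-2)\cdots 1}{(1 \cdot 2 \cdot 3 \cdots m)^n}
> (n-1)^m (n-2)^m \cdots 1^m
= ((n-1)!)^m
$$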
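
The count above can be checked numerically. The sketch below multiplies the binomial
coefficients directly and compares the result with the closed form and with the lower
bound; the function names and the example values n = m = 26 are assumptions made for
illustration.

```python
from math import comb, factorial

def brute_force_assignments(n, m):
    """Ways to split n*m code values into n labelled groups of m values each."""
    total = 1
    for remaining in range(n, 0, -1):
        # Choose the m codes of the next character from the codes still unassigned.
        total *= comb(remaining * m, m)
    return total

def lower_bound(n, m):
    """The bound ((n-1)!)^m used in the text."""
    return factorial(n - 1) ** m

n, m = 26, 26   # assumed example: 26 characters with 26 multipliers each
count = brute_force_assignments(n, m)
print(count.bit_length(), lower_bound(n, m).bit_length())
```

The printed bit lengths give a sense of scale: the number of candidate assignments is
far larger than the bound, which is itself already out of reach for exhaustive testing.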


    Hence, we see that with m > 1 we can increase the cost of brute force analysis
significantly. We must also have m ≥ n to take care of the distribution scenario.
    Cipher-Text Analysis
Another kind of attack is based on distribution analysis of the cipher-text. By
distribution analysis, we mean that in general texts, some characters occur much more
frequently than others. Using this information, an attacker can look for the encrypted
character with the maximum frequency and guess the character. In the English language,
the character e is used most and hence can be recognised easily. After recognizing e,
other characters can be guessed using the distribution of the other characters. For
example, if z occurs 0.1 times as often as e, then the attacker can look for the
encrypted character whose frequency is roughly 0.1 times that of the most frequent
character. Knowledge of the domain can help very much in such cases.
    As a result, it is very important to protect the distribution of characters. Hence,
we came up with the idea of varying the distribution. VDES results in a distribution
which gives no idea of the original distribution of characters. Hence, it is secure
against the cipher-text analysis attack.
    There is another kind of cipher-text analysis where the attacker looks for the most
frequent digram (two characters in a sequence) in the cipher-text. This is also not
possible in VDES, as the same digram can be mapped in m × m ways. Hence, it is difficult
to tell which digram is most frequent.
    Another way to attack is to identify the set C_p. This set can be obtained only by
factorising the product terms available to the attacker. This attack can be made very
costly by using very large prime numbers in the set, which makes factorization difficult.
3.5 Memory Requirement

VDES protects sensitive data effectively by using very large prime numbers in C_p and
P_m. However, this comes at a cost in terms of memory requirements.
    Any prime number p requires ⌊lg(p)⌋ + 1 bits of memory. Since we need to store the
product terms, the memory for a product term is at most ⌊2 lg(p)⌋ + 1 bits, where p is
the largest prime number in C_p ∪ P_m.
    Hence, a character which required 8 bits now requires around ⌊2 lg(p)⌋ + 1 bits.
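
As a worked example of this cost, the sketch below uses Python's arbitrary-precision
integers to measure the storage of one encrypted character. The choice of 2^61 − 1 (a
known Mersenne prime) as the largest prime is an assumption made purely for
illustration.

```python
# 2**61 - 1 is a known Mersenne prime; it stands in for the largest prime p in C_p ∪ P_m.
p = 2**61 - 1

bits_per_prime = p.bit_length()            # floor(lg p) + 1
bits_per_product = (p * p).bit_length()    # worst case: both factors as large as p
expansion = bits_per_product / 8           # relative to an 8-bit plain-text character

print(bits_per_prime, bits_per_product, expansion)
```

With primes of this size, each 8-bit character grows to 122 bits in the worst case,
roughly a fifteen-fold expansion.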


3.6 Strength of VDES

The strength of VDES lies in the fact that it performs pattern matching over encrypted
data in an elegant manner and, at the same time, the encryption scheme is strong enough
against brute force analysis and cipher-text analysis attacks. Although the encryption
key may change, the length and the value of the decryption key remain fixed.
     This scheme can be readily integrated with the database software available in the
market and can be used for the goals we intended to achieve:
i. To reduce the overhead of decrypting the entire database in order to perform a query.
ii. To design an encryption scheme which achieves the goal stated above along with the
protection of the database from cipher-text analysis attacks.


4 PPES: PRODUCT OF PRIMES ENCRYPTION SCHEME

4.1 Why introduce PPES when VDES works?

As we have seen in the previous section, VDES achieves the goals we intended to
achieve. The use scenario of the next scheme, the Product of Primes Encryption Scheme,
is a little different from the scenario of VDES. In VDES, the database program is
required to know the decryption keys in order to answer queries. On the other hand,
a PPES database has no knowledge of the encryption or decryption keys and can still
execute the queries and answer directly in encrypted form. PPES returns the correct
results or at least a superset of the correct results. The results are in encrypted
form, as the database is not aware of the key. Also, the query carries the pattern
string in encrypted form. Hence, PPES works entirely in an encrypted domain. This is an
advantage when we deal with databases and results distributed over a network.
    VDES can be used efficiently where protecting the pattern itself is not important
and the program answering queries is assumed to be protected. Based on its knowledge of
the keys, a VDES database performs the query execution and returns unencrypted answers
if desired. PPES can also be used in this scenario. For this, we can make the database
aware of the keys. When a query arrives, we can modify it to carry the encrypted pattern
and then execute the modified query. With the help of the decryption key, we can decrypt
the result to give the final output.
    As is clear from the two use scenarios, there is a need for PPES.
4.2 Model of PPES

The model of PPES is shown in the figure below.

[Figure: Model of PPES]


4.3 Intuition behind PPES

Let us consider the following encryption scheme. Assume that there is a function
e: Σ* → P, where Σ is the set of alphabet characters and P is a set of prime numbers,
such that e maps each string over Σ to a unique prime number.
    Using the above mapping, we can encrypt any string s as E(s) as follows:
E(s) = e(t_1) × e(t_2) × e(t_3) × .... × e(t_n)
where t_1, t_2, t_3, ...., t_n are the substrings present in s.
    For example, E(abc) = e(a) × e(b) × e(c) × e(ab) × e(bc) × e(abc).
As a result of this way of encrypting, E(s) has a term for every substring present in
it. So, if we look for a substring t in s, we can evaluate E(t) first and then check
the divisibility of E(s) by E(t). Divisibility implies the presence of substring t in
s. This can be stated as

   A string s has string t as its substring if E(t) | E(s)
Converse of the above: if some number N divides E(s), then there exists a substring t
in s such that E(t) = N. This statement may not be true. As we saw in the case of
E(abc), it is divisible by e(ab) × e(bc), but there exists no substring t such that
E(t) = e(ab) × e(bc). Hence, the converse statement is false in this case.
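
The idealised scheme, with a distinct prime for every substring, can be sketched as
follows. The class name `UniquePrimeMap`, the on-demand assignment of primes and the
small primes themselves are assumptions made for the sketch.

```python
from itertools import count

def prime_stream():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for a small sketch)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

def substrings(s):
    """All non-empty substrings of s, one entry per occurrence."""
    return [s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)]

class UniquePrimeMap:
    """Assigns a fresh prime to every new substring, so e is one-to-one."""
    def __init__(self):
        self._primes = prime_stream()
        self._table = {}

    def e(self, t):
        if t not in self._table:
            self._table[t] = next(self._primes)
        return self._table[t]

    def E(self, s):
        product = 1
        for t in substrings(s):
            product *= self.e(t)
        return product

scheme = UniquePrimeMap()
E_abc = scheme.E("abc")
contains_ab = E_abc % scheme.E("ab") == 0    # "ab" is a substring of "abc"
contains_ac = E_abc % scheme.E("ac") == 0    # "ac" is not
n_value = scheme.e("ab") * scheme.e("bc")    # divides E(abc) but is no E(t)
print(contains_ab, contains_ac, E_abc % n_value == 0)
```

The last value illustrates the false converse: e(ab) × e(bc) divides E(abc) although it
is not the encryption of any substring.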
    Although this idea can check the presence of a substring in just one step, the
number of primes required to encrypt any string increases exponentially with the length
of the string. If the number of characters is n and the maximum length of any string is
l, then the number of primes required is around n^(l+1). If n is around 256, which is
2^8, then the number of primes required becomes unmanageable with l = 5 already. One way
of reducing the number of primes required is to allot a prime number only to those
substrings which are actually present in a dictionary, and to give all remaining
possible substrings a single unique prime number. This idea is again not good, as the
size of the dictionary may also be very large.

4.4 Our approach to PPES
As we saw above, the number of primes required increases exponentially with the size of
the strings handled by PPES. So we restructure the scheme in the following manner to
make it feasible to implement:
    Let Σ be the set of alphabet characters and P be the set of available prime
numbers. Now consider a random (or private) mapping e: Σ* → P such that e maps each
string over Σ to the p-th prime number of the set P, for some index 1 ≤ p ≤ |P|.
    Since the mapping is random (or private), we can assume that the probability that
two strings map to the same prime number is 1/|P| (or it can be calculated based on the
private mapping).
    Now we can encrypt a string s as E(s) using the above mapping as follows:

E(s) = e(t_1) × e(t_2) × e(t_3) × .... × e(t_n)

where t_1, t_2, t_3, ...., t_n are all the substrings present in s.
The same is applied to the pattern t to get E(t). Hence, if t is a substring of s then
E(t) must divide E(s).
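
One possible way to realise such a private mapping is a keyed hash that selects an
index into a fixed list of primes. This construction is an assumption of the sketch
and is not specified in the text; the key value, |P| = 1000 and the helper names are
likewise hypothetical.

```python
import hashlib
import hmac

def first_primes(k):
    """Return the first k primes using trial division by smaller primes."""
    found, n = [], 2
    while len(found) < k:
        if all(n % p for p in found if p * p <= n):
            found.append(n)
        n += 1
    return found

PRIMES = first_primes(1000)      # the set P, with |P| = 1000 (assumed size)
KEY = b"client-secret"           # hypothetical private key held by the client

def e(t):
    """Private mapping: a keyed hash of t selects one prime from P."""
    digest = hmac.new(KEY, t.encode("utf-8"), hashlib.sha256).digest()
    return PRIMES[int.from_bytes(digest, "big") % len(PRIMES)]

def E(s):
    """Product of e(t) over every substring occurrence t of s."""
    product = 1
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            product *= e(s[i:j])
    return product

def may_contain(E_s, E_t):
    """Server-side test: needs only the two encrypted values, not the key."""
    return E_s % E_t == 0

stored = E("database")           # value kept by the server
print(may_contain(stored, E("tab")))
```

Only `e` and `E` need the key; `may_contain` works on the encrypted values alone, which
matches the distributed setting in which the server never sees the key.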

4.5 Performing the operations
String matching String matching in PPES is totally different from traditional string
matching techniques. If we are searching for the string abc (call it t), then we first
have to encrypt t using the above encryption function, i.e.

E(t) = e(t_1) × e(t_2) × e(t_3) × .... × e(t_n)

where t_1, t_2, t_3, ...., t_n are all the substrings present in t.

Now if E(t) divides E(s), where s is the string in which t is being searched, then t is
reported as present in s, i.e. as one of the substrings of s.
In PPES, the string matching cost depends only on the length of the string being
matched, unlike traditional techniques where the cost depends on the length of the
string being matched as well as the length of the target string. So in PPES, the string
matching cost is just the cost of encrypting t plus the cost of one division operation.
If the pattern string is small enough, string matching can be done very fast.


4.6 Correctness of PPES: Existence of false positives

Let us consider a given string s of length W, so that s has W(W+1)/2 substrings, and
suppose we are searching for a string t of length w. Let E(s) and E(t) be the
encryptions of s and t respectively.


Theorem 2 If t is a substring of s then E(t) must divide E(s).

Proof. Let t have k substrings t_1, t_2, ...., t_k, so that E(t) is

E(t) = e(t_1) × e(t_2) × e(t_3) × .... × e(t_k)

   As t is a substring of s, all the substrings of t must be present in the set of all
substrings of s. So if s has n+k substrings, then the encryption of s is

E(s) = e(t_1) × e(t_2) × e(t_3) × .... × e(t_k) × e(s_1) × e(s_2) × .... × e(s_n)

    As we can see, all the factors of E(t) are present in E(s), so E(t) must divide
E(s). The converse, however, is not true.


Lemma 2. String matching in PPES results in a superset of the actual result set, i.e.
false positives may exist in the resulting set but there will be no false negatives.

Proof. In the above theorem we have shown that false negatives cannot occur. Now
suppose t is not a substring of s. Every proper substring of t may still be present in
s, so the corresponding factors of E(t) divide E(s). If, in addition, e(t) is mapped to
the same prime as some substring of s, then E(s) is divisible by E(t), which leads to a
false positive.
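
The situation in this proof can be reproduced with a small hand-made mapping. The
table below is hypothetical and deliberately contains one collision: "ac" shares its
prime with "ab". The client-side filter shown at the end is one assumed way of removing
false positives after decryption.

```python
# Hypothetical private mapping with one collision: "ac" shares a prime with "ab".
E_MAP = {"a": 2, "b": 3, "c": 5, "ab": 7, "bc": 11, "abc": 13, "ac": 7}

def E(s):
    """Product of the mapped primes over every substring occurrence of s."""
    product = 1
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            product *= E_MAP[s[i:j]]
    return product

def server_match(t, s):
    """Divisibility test performed on the encrypted values."""
    return E(s) % E(t) == 0

def client_filter(t, candidates):
    """After decryption the client can discard false positives exactly."""
    return [s for s in candidates if t in s]

candidates = [s for s in ["abc"] if server_match("ac", s)]
confirmed = client_filter("ac", candidates)
print(candidates, confirmed)
```

Here E(ac) = 2 × 7 × 5 = 70 divides E(abc) = 30030, so the server reports "abc" for the
pattern "ac" even though "ac" is not a substring; the filter then removes it.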


4.7 Probabilty analysis of false positives

In this section we derive an upper bound on the probability of a false positive. Let t
be the pattern string of length w and s be the given string of length W (W ≥ w), and
suppose E(t) divides E(s). There are w(w + 1)/2 substrings in t, and each of them is
mapped to the same prime as some substring of s. In particular, the string t is mapped
to e(t), and e(t) is present in E(s). We have to find an upper bound on the probability
that E(t) divides E(s) although t is not a substring of s.
Since E(t) divides E(s), the string t maps to e(t), which is also the mapping of some
substring of s. There are W(W+1)/2 substrings of s, and each of them coincides with
e(t) with probability 1/|P|, so the probability of this event is bounded by

   (W(W + 1)/2) × (1/|P|).
So the probability decreases with larger |P|.
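
The bound is easy to evaluate for concrete parameters. The helper names and the example
values (W = 20, one million primes, a target of 1 in 1000) are assumptions chosen for
illustration.

```python
from fractions import Fraction
from math import ceil

def false_positive_bound(W, num_primes):
    """Upper bound (W(W+1)/2) / |P| on the false positive probability, capped at 1."""
    substr_count = W * (W + 1) // 2
    return min(Fraction(1), Fraction(substr_count, num_primes))

def primes_needed(W, target):
    """Smallest |P| for which the bound does not exceed the target probability."""
    substr_count = W * (W + 1) // 2
    return ceil(Fraction(substr_count) / target)

print(float(false_positive_bound(20, 10**6)))
print(primes_needed(20, Fraction(1, 1000)))
```

For strings of length 20, a set of one million primes keeps the bound at 0.00021, and
210000 primes are enough to push it down to 1 in 1000.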

4.8 Memory requirements
Let s be a string whose unique substrings are t_1, t_2, ...., t_k, where substring t_i
occurs e_i times, and let t_i be mapped to the prime p_i, i.e. e(t_i) = p_i. Then the
encryption corresponding to s is

$$E(s) = e(t_1)^{e_1} \times e(t_2)^{e_2} \times \cdots \times e(t_k)^{e_k}$$

The size of a number p is lg(p) bits, so the string s consumes lg(E(s)) bits:

$$\lg(E(s)) = e_1 \lg(e(t_1)) + e_2 \lg(e(t_2)) + \cdots + e_k \lg(e(t_k))$$

If p_max is the largest prime in the set of primes P, then

$$\lg(E(s)) \le \lg(p_{max}) \times (e_1 + e_2 + \cdots + e_k)$$

And (e_1 + e_2 + .... + e_k) is the same as W(W+1)/2, where W is the length of s. So,

$$\lg(E(s)) \le \lg(p_{max}) \times W(W+1)/2$$

This implies that the memory requirement increases quadratically with the size of the
string. So this scheme is not good for large documents but can handle strings of medium
size, e.g. addresses.
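
The quadratic growth can be observed directly. In the sketch below, CRC32 is used only
as a convenient stand-in for the private mapping (it is not secure), and the sample
address, |P| = 1000 and the helper names are assumptions.

```python
import zlib

def first_primes(k):
    """Return the first k primes using trial division by smaller primes."""
    found, n = [], 2
    while len(found) < k:
        if all(n % p for p in found if p * p <= n):
            found.append(n)
        n += 1
    return found

PRIMES = first_primes(1000)
P_MAX = PRIMES[-1]

def e(t):
    # CRC32 is only a stand-in for the private mapping; it offers no secrecy.
    return PRIMES[zlib.crc32(t.encode("utf-8")) % len(PRIMES)]

def E(s):
    product = 1
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            product *= e(s[i:j])
    return product

def bit_bound(W):
    """Bits needed when every one of the W(W+1)/2 factors equals p_max."""
    return (P_MAX ** (W * (W + 1) // 2)).bit_length()

address = "12 Mall Road, Kanpur"         # hypothetical sample value
actual_bits = E(address).bit_length()
plain_bits = 8 * len(address)
print(plain_bits, actual_bits, bit_bound(len(address)), bit_bound(2 * len(address)))
```

Doubling the string length roughly quadruples the bound, in line with the W(W+1)/2
term.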

4.9 Cryptographic analysis of PPES
To query a PPES-encrypted database, the pattern $t$ must be supplied in encrypted form, $E(t)$.
The individual factors could also be sent over the network, but $E(t)$ is preferred because individual
factors may reveal the nature of the query. For example, if roll numbers are stored in the
database and an attacker knows the roll-number format (say Y****), the query can easily be
broken. The database itself stores only the encrypted form of each string $s$, and factorizing
$E(s)$ takes a significant amount of time. Let $|\Sigma| = n$ and $|P| = m$. An attacker who knows
$n$ and $m$ could try to break the encryption as follows:
 1. Factorize a reasonable number of values from the database, which is a very
    hard task, as a string of length $l$ has $l(l + 1)/2$ prime factors.
 2. Then analyze the distribution of all the prime numbers, or resort to brute-force analysis,
    i.e. map each prime to all possible strings.
Now let $n = 40$ and $l = 20$, where $l$ is a domain parameter stating that every string in the
database has length at most $l$. The number of possible strings in that domain,
$n \times \frac{n^l - 1}{n - 1}$, is then around $40^{20}$ ($> 10^{25}$). A brute-force approach therefore has
that many candidate mappings for a single prime.
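
As a quick arithmetic check of this count, the sketch below evaluates the closed form for the number of non-empty strings of length at most $l$ over an alphabet of size $n$, using the values $n = 40$ and $l = 20$ from the text.

```python
n, l = 40, 20

# Number of non-empty strings of length at most l over n letters:
# n + n^2 + ... + n^l = n * (n^l - 1) / (n - 1)
closed_form = n * (n**l - 1) // (n - 1)
direct_sum = sum(n**k for k in range(1, l + 1))

print(closed_form == direct_sum)
print(closed_form > 10**25)
```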


    Ciphertext analysis of the data is very hard because every stored value is a product of
many prime numbers. Factorizing each value is very expensive, which makes attacks based on
the distribution of values impractical.

4.10 Strength of PPES
The main strength of PPES is that we can query the database over the network with
high safety. The pattern matching operation (a very common operation on strings
in databases) is very fast in PPES and is independent of the size of the target string.


5 Extension of substring method
Let $s$ be a string and let $n = |s|$. Suppose that we store a set $\{s_1, s_2, \ldots, s_m\}$ of distinct
substrings of $s$. Encode each of these substrings, that is, obtain $E(s_1), E(s_2), \ldots, E(s_m)$,
and then multiply these $m$ primes to obtain the encoding of the string. The number of
substrings of $s$ is $n \cdot (n + 1)/2$. Therefore, this scheme saves space compared
to the PPES scheme, provided $m < n \cdot (n + 1)/2$.
      The method for checking whether a given string $t$ is a substring of $s$ is as follows.
$t$ is a substring of $s$ if and only if $t$ can be extended (perhaps on both sides) so that the
extended string becomes equal to $s$. If $t$ cannot be extended in this way, then $t$ is not a
substring of $s$. Equivalently, if $t$ can be extended in either direction so that it becomes
equal to any one of the known substrings $s_1, s_2, \ldots, s_m$, then $t$ can also be inferred to be
a substring of $s$. The problem is therefore the following.
      Design a choice of the substrings $s_1, \ldots, s_m$ of $s$ such that, for every string
$t$, it can be efficiently inferred whether $t$ is a substring of $s$. For a given value $w$, a string
$t'$ is said to be a $w$-extension of $t$ if $t$ is a substring of $t'$ and $|t'| - |t| = w$. The set
of all strings that are $w'$-extensions of a given string $t$, where $w' \le w$, defines the
$w$-extension ball centered at $t$.
      Suppose $t'$ lies in the $w$-extension ball centered at $t$, and an encoding $E(t')$ is available.
Then, if $t$ is extended by a total of at most $w$ characters on either side and then
encoded, this encoding will equal the encoding of $t'$. The complexity of this operation
is the number of extension strings that are checked, that is, the size of a $w$-extension
ball. Let $\Sigma$ denote the alphabet over which the strings are formed. Then, the size of any
$w$-extension ball is given by

$$\sum_{0 \le x+y \le w} |\Sigma|^{x+y}\,(x + y + 1) = O(w \cdot |\Sigma|^{w+2})$$
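
To make the definition concrete, the sketch below enumerates a $w$-extension ball by brute force. It is an illustration under assumptions, not part of the original scheme: the function names are hypothetical, and `ball_size_bound` counts (left, right) extension pairs, which is an upper bound on the number of distinct strings because different pairs can produce the same string.

```python
from itertools import product

def extension_ball(t, w, alphabet):
    """Strings obtained by adding x letters on the left and y on the right, x + y <= w."""
    ball = set()
    for total in range(w + 1):
        for x in range(total + 1):
            y = total - x
            for left in product(alphabet, repeat=x):
                for right in product(alphabet, repeat=y):
                    ball.add("".join(left) + t + "".join(right))
    return ball

def ball_size_bound(w, sigma):
    """Sum over k = x + y of (k + 1) * sigma**k: one term per way of splitting k."""
    return sum((k + 1) * sigma**k for k in range(w + 1))

ball = extension_ball("ab", 2, "ab")
print(len(ball), "distinct strings, bound", ball_size_bound(2, 2))
```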

The problem is to design substrings $s_1, \ldots, s_m$ such that, given any substring $t$ of $s$,
there exists an index $r$, $1 \le r \le m$, such that $s_r$ lies within the $w$-extension ball centered at
$t$. Consider the following scheme. Keep substrings $s_{i,j}$, where $1 \le i \le \frac{n}{w+1}$ and
$1 \le j \le 1 + \frac{n}{w+1}$. The substring $s_{i,j}$ refers to the substring of $s$ of length $(w + 1) \cdot i$
that begins at position $(w + 1) \cdot (j - 1) + 1$. For a fixed value of $i$, the substrings
$s_{i,j}$ have a fixed length and begin at offsets $1, 1 + (w + 1), 1 + 2 \cdot (w + 1), \ldots$
The substring lengths range over $w + 1, 2 \cdot (w + 1), 3 \cdot (w + 1), \ldots$
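
The choice of stored substrings can be sketched as follows. This is a minimal illustration; the function name is hypothetical, and it assumes that substrings which would run past the end of $s$ are simply skipped, a boundary case the text does not spell out.

```python
def stored_substrings(s, w):
    """Map (i, j) to the substring of length (w+1)*i starting at 1-based position (w+1)*(j-1)+1."""
    step = w + 1
    n = len(s)
    stored = {}
    for i in range(1, n // step + 1):
        for j in range(1, n // step + 2):
            start = step * (j - 1)      # 0-based start index
            end = start + step * i      # exclusive end index
            if end <= n:                # assumption: drop substrings that overrun s
                stored[(i, j)] = s[start:end]
    return stored

blocks = stored_substrings("abcdefgh", 1)
for key in sorted(blocks):
    print(key, blocks[key])
```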

5.1 Correctness of the scheme
In this section, we argue the correctness of the above scheme.
    We say that a substring $t$ of $s$ is $w$-covered by a substring $s_{i,j}$ provided $s_{i,j}$ is in the
$w$-extension ball centered at $t$. We first show that the sets of strings that are $w$-covered
by the $s_{i,j}$'s are all distinct. That is, if $(i, j) \ne (i', j')$, then the set of strings covered
by $s_{i,j}$ and the set of strings covered by $s_{i',j'}$ have no overlap.
Lemma 3. Let $t$ be a substring of $s$ that is $w$-covered by both $s_{i,j}$ and $s_{i',j'}$. Then
$i = i'$ and $j = j'$.

Proof. Suppose that $i \ne i'$, that is, the lengths are different. Thus, $\big||s_{i,j}| - |s_{i',j'}|\big| > w$.
However, if $t$ lies in the coverage of both substrings, then it follows that

$$w < \big||s_{i,j}| - |s_{i',j'}|\big| = \big|(|s_{i,j}| - |t|) - (|s_{i',j'}| - |t|)\big| = |w_1 - w_2| \le w$$

where $w_1 = |s_{i,j}| - |t|$ and $w_2 = |s_{i',j'}| - |t|$. Since both $s_{i,j}$ and $s_{i',j'}$ lie in the
$w$-extension ball centered at $t$, it follows that $0 \le w_1 \le w$ and $0 \le w_2 \le w$, implying
a contradiction. Now suppose that $i = i'$, that is, the lengths of the substrings are the
same, and $j \ne j'$. A similar argument can be made for this case as well.
The above argument shows that if $t$ is any substring of $s$, then there is at most one $s_{i,j}$
such that $s_{i,j}$ is in the $w$-extension ball centered at $t$. We now show that for every
substring $t$ of $s$, there is always one $s_{i,j}$ lying within the $2w$-extension ball centered at
$t$.

Lemma 4. Let $t$ be a substring of $s$. Then there exists at least one $s_{i,j}$ that $2 \cdot w$-covers
$t$.

Proof. Suppose that $t$ starts at position $p$. Let $r$ be the remainder and $j'$ be the quotient
when $p - 1$ is divided by $w + 1$, that is,

$$p - 1 = (w + 1) \cdot j' + r, \quad \text{where } 0 \le r < w + 1. \qquad (1)$$

Let $i'$ and $r'$ be the quotient and remainder respectively, when $|t| + r$ is divided by
$w + 1$, that is,

$$|t| + r = (w + 1) \cdot i' + r', \quad \text{where } 0 \le r' < w + 1. \qquad (2)$$

Consider the substring $s_{i,j}$, where $j = j' + 1$, and $i = i'$ if $r' = 0$ and $i = i' + 1$
otherwise.
   Since $p - 1 = (w + 1) \cdot j' + r$, it follows that the starting position of the string $s_{i,j}$,
namely $(w + 1) \cdot (j - 1) + 1 = (w + 1) \cdot j' + 1$, is at most $(w + 1) \cdot j' + r + 1 = p$. Therefore,
the starting position of $s_{i,j}$ lies on or before the starting position of the given substring
$t$. The length of the string $s_{i,j}$ is $(w + 1) \cdot i$. Measured from the starting position
of $s_{i,j}$, the string $t$ ends after $|t| + r$ characters. If $r' = 0$, then the ending location of $s_{i,j}$
matches the ending location of $t$. Otherwise, $r' > 0$ and $i = i' + 1$. Therefore, the ending
location of $s_{i,j}$ equals

$$\begin{aligned}
(w + 1) \cdot (j - 1) + (w + 1) \cdot i
 &= (w + 1) \cdot j' + (w + 1) \cdot (i' + 1) \\
 &= (w + 1) \cdot j' + (w + 1) \cdot i' + (w + 1) \\
 &= (w + 1) \cdot j' + |t| + r - r' + (w + 1) \\
 &= (p - 1) + |t| - r' + (w + 1) \\
 &= (p + |t| - 1) + (w + 1 - r') \\
 &> p + |t| - 1
\end{aligned}$$

Thus, $s_{i,j}$ ends on or after $t$ ends. It follows that $s_{i,j}$ contains $t$. By the above analysis,
it follows that

$$|s_{i,j}| - |t| = (w + 1) \cdot i - |t| .$$

There are two cases, namely, $i = i'$ if $r' = 0$, and $i = i' + 1$ if $r' > 0$. Suppose that
$r' = 0$. Then,

$$|s_{i,j}| - |t| = (w + 1) \cdot i' - |t| = |t| + r - |t| = r \le w,$$

using equation (2) for the second equality and equation (1) for the final bound.
Otherwise, assume that $0 < r' \le w$ and $i = i' + 1$. Then, we have

$$\begin{aligned}
|s_{i,j}| - |t| &= (w + 1) \cdot i - |t| = (w + 1) \cdot (i' + 1) - |t| \\
 &= (w + 1) \cdot i' + (w + 1) - |t| \\
 &= |t| + r - r' + (w + 1) - |t|, \text{ by equation (2)} \\
 &= w + 1 + r - r' \\
 &\le 2 \cdot w, \text{ since } r' \ge 1 \text{ and } r \le w.
\end{aligned}$$
This worst case is attained if $t$ is the substring $s[w + 1, w + 2]$ of size 2. Then, it is
covered by the substring $s_{2,1}$ of size $2 \cdot (w + 1)$. Therefore, the minimum extension
necessary is $2 \cdot (w + 1) - 2 = 2 \cdot w$, matching the bound in Lemma 4. The test for
whether a given string $t$ is a substring of the string $s$ is as follows. Enumerate the
$w$-extension ball centered at $t$ and check if any of its members is exactly equal to one of the $s_{i,j}$'s
(or, equivalently, encode each string of the $w$-extension ball centered at $t$ and check if
the encoding divides the product $\prod_{i,j} E(s_{i,j})$). In view of Lemma 4, this test works
if, instead of $w$, we use the parameter $\frac{w}{2}$ in the definition of the $s_{i,j}$'s. Therefore, the
number of strings $s_{i,j}$ is given by

$$\left(\frac{n}{\frac{w}{2} + 1}\right)^2 = O\!\left(\frac{n^2}{w^2}\right)$$
5.2 Lower bound on number of substrings required
We now consider a lower bound on the space used by any encoding algorithm that uses
the above principle of matching strings from a w-extension ball centered at t.
Lemma 5. Let $s'$ be any substring of $s$ of size at least $w + 1$. Then, the number of
substrings $t$ such that $s'$ lies in the $w$-extension ball centered at $t$ is at most
$\frac{1}{2} \cdot (w + 1) \cdot (w + 2)$.

Proof. Let $a$ denote the number of characters extended at the left end of $t$ and $b$ denote
the number of characters extended at the right end of $t$, to obtain $s'$. Each distinct choice
of $a$ and $b$ such that $a + b \le w$ gives a distinct $t$ such that $s'$ lies in the $w$-extension ball
centered at $t$. Therefore, the number of possible such substrings is given by

$$\sum_{a=0}^{w} \sum_{b=0}^{w-a} 1 = \sum_{a=0}^{w} (w - a + 1) = \sum_{a'=1}^{w+1} a' = \frac{1}{2} \cdot (w + 1) \cdot (w + 2) .$$
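
The counting argument is easy to confirm by enumeration. The sketch below counts the pairs $(a, b)$ with $a + b \le w$ directly and compares the result with the closed form; the function name is hypothetical.

```python
def covered_count(w):
    """Count pairs (a, b) of left/right trim lengths with a + b <= w."""
    return sum(1 for a in range(w + 1) for b in range(w - a + 1))

for w in range(6):
    print(w, covered_count(w), (w + 1) * (w + 2) // 2)
```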

Lemma 6. The number of substrings of a given string $s$ of size $n$ that must be stored
by any scheme that works based on testing the equality of one of the substrings with a
member of the $w$-extension ball of a given string $t$ is at least $\frac{n \cdot (n+1)}{(w+1) \cdot (w+2)}$.

Proof. The number of substrings of a given string $s$ is $n \cdot (n + 1)/2$. By Lemma 5, it
follows that each substring of size at least $w + 1$ $w$-covers at most $\frac{1}{2} \cdot (w + 1) \cdot (w + 2)$ substrings.
Substrings of size less than $w + 1$ $w$-cover even fewer strings. Therefore, in order to
$w$-cover all substrings, the number of strings that must be used is at least
$\frac{n \cdot (n+1)}{(w+1) \cdot (w+2)}$.

It follows that the number of substrings used by our proposed scheme is $O\!\left(\frac{n^2}{w^2}\right)$, which
is within a small constant factor of the lower bound, namely $\frac{n \cdot (n+1)}{(w+1) \cdot (w+2)}$, by Lemma 6.


6 A Hierarchical Scheme

In this section, we propose a two-step scheme that is based on the previous scheme.
    Let w1 and w2 be two integer parameters, where, w1 > w2 > 0. Given a database
string s, we first divide it into adjacent blocks of size w1 , that is, s = s1 s2 · · · sk , where,
each of the si ’s has size w1 and the last block is padded by the requisite number of
null characters to ensure that its length is exactly w1 . The encoding of s, namely, E(s),
is a hierarchical data structure. Its first field is the sequence of the encodings of the
blocks, that is, E(s1 ) ◦ E(s2 ) ◦ · · · ◦ E(sk ), where, ◦ denotes the sequencing operator.
We assume that w1 is large enough to negate statistical attacks to decode the encrypted
blocks. Let t be a substring of s. Then, the following two exclusive possibilities hold.

 1. In a match of $t$ with $s$, $t$ spans more than one block of $s$. In other words, $t$ can be
    written as $t = t_0 t_1 t_2 \ldots t_l$, where $l \ge 1$, and $t_1, t_2, \ldots, t_{l-1}$ are blocks of length
    $w_1$ each and $|t_0| \le w_1$ and $|t_l| \le w_1$. Further, there exists an index $r$ with the
    property that $s_r = t_1, s_{r+1} = t_2, \ldots, s_{r+l-2} = t_{l-1}$, and $t_0$ is a suffix of $s_{r-1}$ and
    $t_l$ is a prefix of the string $s_{r+l-1}$.
 2. In a match of $t$ with $s$, $t$ is contained within a block of $s$. That is, $|t| \le w_1$ and $t$ is
    a substring of $s_r$, for some index $r$, $1 \le r \le k$.
The second possibility can be effectively solved by storing, for each block $s_i$, $1 \le i \le k$,
a set of substrings of $s_i$ in their encoded form using an extension parameter $w_2$, as
detailed in Section 5. Thus, if $|t| \le w_1$, and there is a match of $t$ with a substring of
$s$ that is completely contained within a block, then such a match can be effectively
decided. The total number of encodings in this scheme is as follows.

$$O\!\left(k \cdot \frac{w_1^2}{w_2^2}\right) = O\!\left(\frac{n}{w_1} \cdot \frac{w_1^2}{w_2^2}\right) = O\!\left(\frac{n \cdot w_1}{w_2^2}\right) \qquad (3)$$

In order to identify a match where $t$ is a substring of a block of $s$, the $w_2$-extension
ball of $t$ has to be enumerated and compared against all the encodings of the substrings
of the blocks of $s$. Assuming that all the encodings are stored in a hash table,
the dominant component of the cost of this operation is the cost of enumerating the
$w_2$-extension ball centered at $t$. As discussed in Section 5, this cost is $O(w_2 \cdot |\Sigma|^{w_2})$.

Single Text Offset, Multiple Pattern Offsets. We now consider the scenario depicted by
the first possibility, namely, that $t$ is a substring of $s$ and a match of $t$ spans multiple
blocks of $s$. The match of $t$ with the matching portion of $s$ may start at any position
$p$, $1 \le p \le w_1$, within the first block of $s$ where the match begins. To handle
this, we assume that the pattern $t$ is encoded in
$w_1$ distinct ways, obtained by shifting the starting position of the first block of $t$ by a
parameter $u = 0, -1, -2, \ldots, -(w_1 - 1)$ respectively. Thus, if $u = 0$, the first block
contains the characters $t_1 t_2 \cdots t_{w_1}$ and the remaining blocks are constructed sequentially
thereafter. If $u = -1$, then the first block contains the characters $t_1 t_2 \cdots t_{w_1 - 1}$,
and the remaining blocks are constructed sequentially thereafter. For a general value of
$u$, the first block contains the characters $t_1 t_2 \cdots t_{w_1 + u}$, and the remaining blocks are
encoded in sequence. Thus, $t$ is encoded $w_1$ times, with offsets ranging from $0$ to $-(w_1 - 1)$.
This increases the encoded size of the pattern $t$ by a factor of $w_1$. Note that although
we have assumed that the offset for the text string $s$ is 0, it is in general not necessary
to assume this. The offset for the text string can be set to any random value between 0
and $-w_1 + 1$; this has the added advantage of reducing the chance of statistical attacks.
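
The shifted block decompositions of the pattern can be sketched as follows. The function names are hypothetical, and the blocks are shown in plaintext; in the scheme each block would be encoded before being sent.

```python
def pattern_blocks(t, w1, u):
    """Split t so that the first block has w1 + u characters, for u in 0, -1, ..., -(w1 - 1)."""
    first_len = w1 + u
    blocks = [t[:first_len]]
    blocks += [t[k:k + w1] for k in range(first_len, len(t), w1)]
    return blocks

def all_offset_blocks(t, w1):
    """One block decomposition of t per offset u."""
    return {u: pattern_blocks(t, w1, u) for u in range(0, -w1, -1)}

decompositions = all_offset_blocks("abcdefghij", 4)
for u, blocks in decompositions.items():
    print(u, blocks)
```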

Matching Algorithm. For a given offset position $u$, $-w_1 + 1 \le u \le 0$, denote the $j$th
block of $t$ defined for this offset position as $t_j^{(u)}$. Let $l_u$ denote the number of blocks of
$t$ encoded with an offset of $u$. Assume that $t$ is a substring of $s$ and there is a match of $t$
spanning multiple blocks of $s$. Then, there exists an offset position $u$, $-w_1 + 1 \le u \le 0$,
and an index $r$, $1 \le r \le k$, such that $t_1^{(u)}$ is a suffix of $s_r$, $t_j^{(u)} = s_{r+j-1}$ for
$2 \le j \le l_u - 1$, and $t_{l_u}^{(u)}$ is a prefix of $s_{r+l_u-1}$. Since $t_j^{(u)} = s_{r+j-1}$, it follows that
$E(t_j^{(u)}) = E(s_{r+j-1})$.
Further, since there are at most $w_1$ possible non-empty prefixes and $w_1$ possible non-empty
suffixes of any block, the encodings of the prefixes and suffixes of each block
of the text string are also stored along with the chosen set of substrings for the block.
The number of blocks is $k = \frac{n}{w_1}$. The number of prefixes and suffixes of a block is
$2 \cdot w_1$. Therefore, the total number of prefix and suffix encodings is

$$\frac{n}{w_1} \cdot 2 \cdot w_1 \le 2 \cdot n + 2 \cdot w_1 = O(n) . \qquad (4)$$
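
The multi-block matching step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a SHA-256 digest stands in for the encoding $E$, the last text block is left unpadded, and all function names are hypothetical.

```python
import hashlib

def enc(text):
    """Stand-in for E: any deterministic encoding would do for this sketch."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_index(s, w1):
    """Store encodings of every block plus its non-empty prefixes and suffixes."""
    blocks = [s[k:k + w1] for k in range(0, len(s), w1)]
    return {
        "blocks": [enc(b) for b in blocks],
        "suffixes": [{enc(b[m:]) for m in range(len(b))} for b in blocks],
        "prefixes": [{enc(b[:m]) for m in range(1, len(b) + 1)} for b in blocks],
    }

def split_pattern(t, w1, u):
    """Blocks of t when the first block has w1 + u characters."""
    first_len = w1 + u
    return [t[:first_len]] + [t[k:k + w1] for k in range(first_len, len(t), w1)]

def spans_blocks(t, index, w1):
    """True if t matches s across two or more consecutive blocks."""
    k = len(index["blocks"])
    for u in range(0, -w1, -1):
        codes = [enc(part) for part in split_pattern(t, w1, u)]
        if len(codes) < 2:
            continue
        for r in range(k - len(codes) + 1):
            if codes[0] not in index["suffixes"][r]:
                continue
            middle_ok = all(codes[j] == index["blocks"][r + j]
                            for j in range(1, len(codes) - 1))
            if middle_ok and codes[-1] in index["prefixes"][r + len(codes) - 1]:
                return True
    return False

index = build_index("the quick brown fox", 4)
print(spans_blocks("ick brow", index, 4))
print(spans_blocks("ick brxw", index, 4))
```

Matches that lie inside a single block are not found by this path; they are handled by the extension-ball test on the per-block substrings.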
The number of encodings required for the pattern string is $w_1 \cdot \frac{|t|}{w_1} = O(|t|)$, and is
therefore linear in the size of the pattern string. The total number of encodings (space)
required for the text string $s$ is given by the sum of equations (3) and (4), which is
$O\!\left(\frac{n \cdot w_1}{w_2^2} + n\right)$. Suppose that the database string size $n$ is large, and $w_1$ is chosen to be,
say, 64. If $w_2$ is chosen to be 4, then this reflects a substantial improvement over the
$O\!\left(\frac{n^2}{w^2}\right)$ encodings scheme presented in Section 5. Specifically, if $w_2 \ge \sqrt{w_1}$, then, the
number of text encodings is linear in n. The time complexity of the substring matching
                             √
operation is O(w1 · n · |Σ| w1 ).
                   2


Cryptographic Strength. Suppose that t is a substring of s and there is a match of t that
overlaps multiple blocks of s. Then, in the encrypted string, the approximate position of
t within s can be inferred. However, if s and t are both unknown to the third party (say,
the data mining outfit), then this revelation is of no consequence. If t is a substring of
s that is completely contained in a single block of s, then, no positional information is
revealed. Another property of the hierarchical scheme is that all matches of t with s can
be found. That is, if t occurs many times in s, then, all occurrences of t can be found
using the data structure. Once again, if this inference is being done by a third party,
which is not privy to either s or t, then, no information is revealed.

7 Matching Patterns specified as Regular Expressions
In this section, we present a scheme for finding all occurrences of a pattern specified as a
regular expression against a given encrypted database string s. We first discuss a simple
principle of privacy preserving computations, and, then, state the problem that arises in
the context of privacy preserving finite automaton computations and then present one
possible solution to it.

7.1 A principle of privacy preserving computations
Privacy cannot be preserved by computations that encrypt or decrypt using symmetric
keys. Further, privacy is not preserved if text is either encrypted or decrypted within
the program. The principle states the following. Suppose that there is a privacy preserv-
ing computation being carried out within a third party, and the computation applies a
symmetric key encryption (or decryption) function to some text. We can assume that
the key is available to the program and therefore to the third party (using program anal-
ysis). Further, we assume that the encryption algorithm is also known. Therefore, the
third party can both encrypt and decrypt the text or ciphertext, as the case may be. The
second statement of the principle is more general; clearly, if a string is encrypted or
decrypted, then the string is revealed. A consequence of the above discussion is that
all privacy preserving computations must work on ciphertext.

7.2 Privacy preservation of state transition computations
Let D be a deterministic finite automaton that accepts a given regular expression. We
note that there is a basic problem in preserving the privacy of a string accepted by a
finite automaton.
Consider the state transition function of the given DFA D. We can assume that the
set of states of the automaton is known to the data mining party (or can be inferred
from the code available). Let t be the string seen so far by the DFA D, and let q be
the current state of D. The next transition reveals the letter that extends t and the next
state of D. By keeping track of the matching letters in this manner, the matching string
can be known in entirety. Therefore, the most that can be assumed is that the letters
of the alphabet Σ are encrypted. This is equivalent to assuming that the alphabet Σ
has been permuted, using a permutation that is not known to the data mining party.
However, using statistical analysis, the data mining party can gain some information,
and therefore can make a guess for the pattern string whose probability of being correct
is greater than that of a random guess. Note that the problem is independent of the
mechanism used for encrypting the database string s.


7.3 A weak solution

The problem outlined in Section 7.2 is inherent to finite automaton computations.
     A DFA extends the prefix of a matching string by a single letter to obtain a longer
prefix. Thus, the only possible unit of encryption is a letter of the alphabet $\Sigma$, which makes
the scheme susceptible to statistical attacks. A simple extension is to transform the given DFA $D$
into another machine $D'$ ($D'$ is a finite state automaton with a little extra power) such
that $D'$ makes its transitions on the strings of $\Sigma^w$, where $w$ is a parameter. The
final suffix of any string of the regular language whose size is not a multiple of $w$ uses
transitions from the enlarged alphabet $\bigcup_{w'=1}^{w-1} \Sigma^{w'}$. The value of $w$ is not known to the
data mining party. We choose a permutation $\pi$ that maps $\Sigma^w$ to $\Sigma^w$, and ensure that
$D'$ uses the transformed alphabet $\pi(\Sigma^w)$. The values of $w$ and $\pi$ are withheld from the
data mining party.
     The scheme is better than the previous scheme, since statistical attacks would re-
quire knowledge of statistics of strings of size w, for an unknown (though small) value
of w. This reduces the effectiveness of statistical attacks. This approach can be strength-
ened slightly, as demonstrated below by means of an example.
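
The block-alphabet idea can be illustrated on the database string side. The sketch below is a toy under assumptions: the secret permutation is drawn with a seeded pseudo-random generator, and the shorter final chunk is permuted within its own length class, a detail the text leaves open. All names are hypothetical.

```python
import random
from itertools import product

def secret_permutation(alphabet, length, rng):
    """A random bijection on the set of strings of the given length."""
    words = ["".join(p) for p in product(alphabet, repeat=length)]
    images = words[:]
    rng.shuffle(images)
    return dict(zip(words, images))

def encode(text, alphabet, w, seed):
    """Replace each chunk of w letters (and the shorter final chunk) by its image."""
    rng = random.Random(seed)   # seed plays the role of the withheld secret
    tables = {m: secret_permutation(alphabet, m, rng) for m in range(1, w + 1)}
    chunks = [text[k:k + w] for k in range(0, len(text), w)]
    return [tables[len(c)][c] for c in chunks]

encoded = encode("abbab", "ab", 2, seed=7)
print(encoded)
```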


7.4 A slightly better solution

Consider the alphabet $\Sigma = \{a_1, a_2, \ldots, a_k\}$. Suppose we choose powers of 2, namely,
$2, 2^2, \ldots, 2^v$, where $v$ is a parameter, $v \le k$. Partition the alphabet into $v$ subsets $\Sigma_i$,
$1 \le i \le v$, such that the subsets are pairwise disjoint. Define the following sets.

$$\Lambda_i = \Sigma_i^{2^i}, \quad \text{for } 1 \le i \le v.$$

That is, the set $\Lambda_i$ is the set of strings of size $2^i$ that are constructed from letters of $\Sigma_i$.
For $1 \le i \le v - 1$, define the following set.

$$\Delta_i = \Big\{\sigma \;\Big|\; |\sigma| = 2^i \text{ and } \sigma \in \Big(\bigcup_{j=i}^{v} \Sigma_j\Big)^* \text{ and } \exists\, a, b \in \sigma \text{ such that } a \in \Sigma_i \text{ and } b \in \Big(\bigcup_{j=i+1}^{v} \Sigma_j\Big)^*\Big\}$$

The set $\Delta_i$ is the cross set between $\Sigma_i$ and the partitions with indices higher than $i$.
That is, it is the set of all strings of size $2^i$ over the letters of partitions with indices $i$ or
above (i.e., $\Delta_i \subset (\bigcup_{j=i}^{v} \Sigma_j)^{2^i}$). Further, all strings in $\Delta_i$ are constrained to contain
at least one occurrence from each of the sets $\Sigma_i$ and $\bigcup_{j=i+1}^{v} \Sigma_j$ (i.e., it is a cross string).
     Let $w = 2^v$. As a consequence of the construction, the sets $\bigcup_{i=1}^{v} \Lambda_i$ and the sets
$\bigcup_{i=1}^{v-1} \Delta_i$ can generate $\Sigma^w$ uniquely. That is, each string of $\Sigma^w$ can be uniquely
represented as the concatenation of strings in the $\Lambda$ and $\Delta$ sets (this claim is stated without proof).
     The transformed and equivalent automaton $D'$ can be constructed so that it uses the
following alphabet.

$$\Sigma' = \text{alphabet of } D' = E\Big(\bigcup_{i=1}^{v} \Lambda_i \;\cup\; \bigcup_{i=1}^{v-1} \Delta_i\Big)$$

That is, members of $\Sigma'$ are encrypted members of the $\Lambda_i$'s and $\Delta_i$'s. The
database string $s$ is encrypted accordingly.

Cryptographic strength. The privacy of the scheme draws from the fact that the parti-
tion of the original alphabet into the subsets is not known to the third party. Further, the
value of v is also withheld from the third party. These two reasons reduce the effective-
ness of statistical attacks. However, some information about the length of the string can
be deduced.


8 Performing Aggregate Operations on Encrypted Numeric
  Values

In this section, we propose a scheme for performing aggregate operations like SUM,
AVG or COUNT on encrypted numeric data. A lot of work has been done in the past
in the field of security and mobile computing on performing arithmetic operations over
encrypted numbers. These are based on encryption transformations called Privacy
Homomorphisms (refer [5],[6],[7],[8],[9]). However, these techniques are not applicable to
databases. Many of them are limited to operations on two numbers and some of them
require message passing for each operation (refer [13]). Recent works on performing
queries over encrypted data allow comparison operations like GREATER THAN,
LESS THAN, MAX or MIN (refer [4]).
    The scheme proposed by us is directly applicable to databases and is robust
against brute force and ciphertext analysis attacks.


8.1 Scheme 1

Our scheme is based on the fact that we can use a function, say $f$, to encrypt a numeric
value $d$ such that $f(x, y, \ldots) = d$. The solution <x, y, ...> can be used as an encrypted
value for $d$. The properties which the function $f$ should satisfy are as follows:

 1. The equation $f(x, y, \ldots) = d$ should have many solutions, where $d$ can
    be any numeric value in the range of the data being encrypted, i.e. $f$ is a many-to-one
    function.
 2. The function $f$ should have at least two arguments.
 3. It should be possible to evaluate $\sum_i f(x_i, y_i, \ldots)$ given the values of $\sum_i x_i, \sum_i y_i, \ldots$
Let us consider the function $f(x, y) = kx + y$. This function satisfies all the properties
stated above. Hence, we can encrypt any data $d$ as <x, y> such that $kx + y = d$.
Note that it is possible to have many solutions for the same data $d$. Let us assume
that we have to add $n$ numeric values $d_1, d_2, \ldots, d_n$ and we have the encrypted values
<x_1, y_1>, <x_2, y_2>, ..., <x_n, y_n>. Given the encrypted values, we can calculate
$\sum_{i=1}^{n} x_i$ and $\sum_{i=1}^{n} y_i$. From these values, we can evaluate $\sum_{i=1}^{n} d_i$, as it is given by

$$\sum_{i=1}^{n} d_i = k \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i .$$

So, we see that we can perform the addition directly
over the encrypted data. Based on the knowledge of $x_i$ and $y_i$, it is not possible to get
$d_i$ without the knowledge of $k$. It is also possible to use a function with more than
two arguments.
    This simple idea has a drawback that it discloses the data distribution. For above
example, encrypted values corresponding to same data d will lie on the same line with
slope k. Identification of any one line will give the value of k. Drawing lines through
other points and parallel to line corresponding to d will result in distribution knowledge
based on the distance between lines. This is the major limitation which needs to be
eliminated.
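
Scheme 1 can be sketched in a few lines. The multiplier `K`, the range from which `x` is drawn, and the function names are assumptions made for illustration; the server-side step uses only the column sums of the encrypted pairs.

```python
import random

K = 37  # hypothetical secret multiplier k, known only to the data owner

def encrypt(d, rng):
    """Pick x at random and solve k*x + y = d for y."""
    x = rng.randint(-1000, 1000)
    return x, d - K * x

def decrypt_sum(sum_x, sum_y):
    """Recover the sum of the plaintexts from the two column sums."""
    return K * sum_x + sum_y

rng = random.Random(0)
data = [12, 7, 30, 5]
rows = [encrypt(d, rng) for d in data]

# Computed by the untrusted party on ciphertext only.
sum_x = sum(x for x, _ in rows)
sum_y = sum(y for _, y in rows)

total = decrypt_sum(sum_x, sum_y)
average = total / len(rows)
print(total, average)
```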

8.2 Improving the Scheme
Based on the above idea, we find a function $f$ which satisfies the properties stated
earlier. Let us take $f(x, y) = kx + y$. Now, we select two prime numbers $p$ and $q$. Then,
we find a solution <x, y> for $f(x, y) = d$, where $d$ is the numeric data being encrypted.
Now, instead of encrypting $d$ as <x, y>, we encrypt $d$ as <a, b> such that
$a \bmod p = x$ and $b \bmod q = y$. Note that the encrypted set consisting of all <a, b> for which
$f(a \bmod p, b \bmod q) = d$ is spread over the entire space and not just on a line. It is
not possible to guess the value of $d$ without knowing the prime numbers $p$ and $q$.
    However, the above scheme is prone to attack. As $p$ is a large prime number and $x_1$ and
$x_2$ are mapped to $mp + x_1$ and $np + x_2$ respectively, one can get an approximate value
of $m/n$ by calculating $(mp + x_1)/(np + x_2)$, and subsequently $m$ and $n$, as $m$ and $n$ are
integers. One can then guess $p$ very closely. Using the same approach, $q$ can also be guessed.
Knowing $p$ and $q$, one can retrieve all the <x, y> tuples.
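
The ratio leak described above can be seen numerically. In the sketch below the prime, the multipliers and the hidden values are all made-up illustration values; the attacker sees only `a1` and `a2`.

```python
from fractions import Fraction

p = 1_000_003            # hypothetical secret prime
x1, x2 = 17, 5           # hidden values, small compared to p
m, n = 6, 4              # hidden multipliers

a1 = m * p + x1          # stored ciphertext components
a2 = n * p + x2

# Attacker's view: the ratio of two ciphertexts is very close to m / n.
ratio = Fraction(a1, a2).limit_denominator(100)
print(ratio, a1 / a2)

# With a guess for m, integer division already lands next to p.
p_guess = a1 // m
print(p_guess - p)
```

Note that the attacker recovers the ratio in lowest terms, so several candidate values of $m$ may have to be tried.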

8.3 Scheme 2
Let us suppose that the data value $d$ to be protected can be represented as an $N$-bit binary
number. Identify a set of linear transformations $L$. Now, proceed as follows:
 1. Divide the $N$-bit binary number into $m$ consecutive unequal parts.
 2. Apply linear transformations from the set $L$ to each of the $m$ parts so as to map
    each part to a binary number of fixed length $l$.
 3. Re-order these linearly transformed parts and save the re-ordered parts in the database.
    The order used for shuffling these parts is fixed.
The data encrypted in the above format can be used for performing addition directly without
decryption. To get the sum of any $n$ numbers, we add the entries
corresponding to these $n$ numbers for each of the $m$ columns. This gives the result in
encrypted form. The result can be decrypted as follows:
 1. As the re-ordering sequence is known to us, we re-order the $m$ parts of the result into the
    original sequence.
 2. As each part was linearly transformed, the sum of the linearly transformed values
    corresponds to the linear transformation of the sum of the original values, i.e.
    $h(\sum_{i=1}^{n} x_i) = \sum_{i=1}^{n} h(x_i)$. Since the linear transformation $h$ is known to us and we know
    $h(\sum_{i=1}^{n} x_i)$, we can use the inverse function $h^{-1}$ to get $\sum_{i=1}^{n} x_i$.
 3. Get the binary representation of $\sum_{i=1}^{n} x_i$ for each of the $m$ columns. The resulting
    $m$ parts can be used to get the final result.

The strength of the above scheme is based on the fact that the way the $N$-bit number is
broken into $m$ unequal parts is not known to the third party performing computation on the
encrypted data. The total number of ways in which an $N$-bit number can be broken into $m$
consecutive unequal parts is $\binom{N-1}{m-1} - 1$. For $N = 128$ and $m = 10$, this is
about $1.8 \times 10^{13}$. Also, the number of ways in which $m$ parts can be reordered is $m!$, which is
about $3.6 \times 10^{6}$ for $m = 10$. So, to break the encryption, the intruder must be able to guess the
linear transformations corresponding to each part and then check on the order of $10^{20}$ cases in the worst
case.
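
A toy version of Scheme 2 is sketched below for 16-bit values. The part widths, the multipliers used as linear maps $h_i(x) = c_i x$, and the column order are all made-up secrets, and the sketch does not pad the transformed parts to a fixed length $l$. The last lines recompute the counts quoted above.

```python
from math import comb, factorial

WIDTHS = [3, 5, 2, 6]      # hypothetical secret split of a 16-bit value, low bits first
FACTORS = [7, 3, 11, 5]    # hypothetical linear maps h_i(x) = c_i * x
ORDER = [2, 0, 3, 1]       # hypothetical secret column order

def encrypt(d):
    """Split d into parts, apply h_i to each part, then shuffle the columns."""
    parts, shift = [], 0
    for width, c in zip(WIDTHS, FACTORS):
        parts.append(c * ((d >> shift) & ((1 << width) - 1)))
        shift += width
    return [parts[i] for i in ORDER]

def decrypt_sum(column_sums):
    """Undo the shuffle, invert each h_i, and recombine the per-part sums."""
    parts = [0] * len(ORDER)
    for pos, i in enumerate(ORDER):
        parts[i] = column_sums[pos]
    total, shift = 0, 0
    for width, c, value in zip(WIDTHS, FACTORS, parts):
        total += (value // c) << shift   # carries are absorbed by the shifted addition
        shift += width
    return total

data = [40000, 1234, 65535, 7]
rows = [encrypt(d) for d in data]
column_sums = [sum(col) for col in zip(*rows)]   # done on ciphertext only
total = decrypt_sum(column_sums)
print(total == sum(data))

splits = comb(128 - 1, 10 - 1) - 1
orders = factorial(10)
print(f"{splits:.2e} splits, {orders:.2e} orders, {splits * orders:.2e} combined")
```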


8.4 Final Scheme

As we saw, Scheme 1 is not strong enough to protect the data itself but it hides the
distribution very well. On the other hand, Scheme 2 is highly protective but it does not
hide the data distribution. Hence, we propose a final scheme which is a fusion of
Scheme 1 and Scheme 2.
    Compute <x, y> and calculate the <a, b> tuple using Scheme 1, and protect a and b
using Scheme 2. In this manner, we protect not only the data but also the distribution.


9 Formal aspects of privacy preserving computations

In this section, we formally define privacy preserving computations and partial privacy
preserving computations.

Definition 7. A language $L \subseteq \Sigma^* \times \Sigma^*$ is said to be computable in a privacy-preserving
fashion if there exist computable functions $f$ and $g$ and a Turing machine $M$ such that
$(x, y) \in L$ if and only if $(f(x), g(y)) \in L(M)$ and
$\mathrm{ENTROPY}((x, y) \mid (f(x), g(y))) = \mathrm{ENTROPY}((x, y))$.

The intended meaning of the string $(x, y)$ of $L$ is that $x$ is a database string and $y$ is an
encoding of the property being checked. We present several examples later on to clarify
the above definition. In general, the calculation of the conditional entropy function may
prove to be difficult. In order to extend the scope of the definition, we also introduce
the notion of partially privacy preserving computations and the index of privacy
preservation. In this section, the letter $M$, together with subscripts and superscripts, typically
denotes Turing Machines (TMs).
Definition 8. A language $L \subseteq \Sigma^* \times \Sigma^*$ is said to be computable in a partially
privacy-preserving fashion if there exist computable functions $f$ and $g$ and a Turing machine
$M$ such that $(x, y) \in L$ if and only if $(f(x), g(y)) \in L(M)$ and
$\mathrm{ENTROPY}((x, y) \mid (f(x), g(y))) \le \mathrm{ENTROPY}((x, y))$. The index of privacy preservation is defined as

$$\frac{\mathrm{ENTROPY}((x, y) \mid (f(x), g(y)))}{\mathrm{ENTROPY}((x, y))} .$$
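
For a small, fully known distribution the index can be computed directly. The sketch below is a toy: it treats each item as one $(x, y)$ pair, assumes the probability distribution over items is given, and uses a single hypothetical `encode` function in place of the pair $(f, g)$.

```python
from collections import defaultdict
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} mapping."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def privacy_index(dist, encode):
    """Conditional entropy of the item given its encoding, divided by the item's entropy."""
    by_code = defaultdict(dict)
    for item, p in dist.items():
        by_code[encode(item)][item] = p
    conditional = 0.0
    for group in by_code.values():
        mass = sum(group.values())
        conditional += mass * entropy({k: p / mass for k, p in group.items()})
    return conditional / entropy(dist)

uniform = {"aa": 0.25, "ab": 0.25, "ba": 0.25, "bb": 0.25}
print(privacy_index(uniform, lambda item: item[0]))   # encoding leaks the first letter
print(privacy_index(uniform, lambda item: "*"))       # encoding reveals nothing
```

An index of 1 corresponds to Definition 7, while an injective encoding that an observer can invert gives an index of 0.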

The formal definitions are intended for comparing competitive privacy preserving schemes.
We now present examples of languages that can be computed while partially preserving
privacy.

Example 9. Let L = {(x, y) | y is a substring of x}. Then, f (x) can be the PPES
encoding of x, and g(y) could be the function mapping strings to primes, using the
same mapping function used by f . Here, the TM M checks whether g(y)|f (x).

Example 10. Let $L$ be any language that satisfies the following property: $|\{y \mid (x, y) \in L\}|$
is finite. Let $L_1$ denote the language $\{x \mid \exists y \text{ such that } (x, y) \in L\}$. For every
$x \in \Sigma^*$, $f(x)$ is the PPES encoding of $\{y \mid (x, y) \in L\}$. Since this set is finite for all
$x \in \Sigma^*$, the PPES encoding is well-defined. The function $g(y)$ is the mapping of
$y$ to a prime using the same mapping function used by $f$. The mapping described is,
in general, partially privacy preserving, since, for every $x$ such that $x \in \Sigma^* - L_1$, the
PPES encoding $f(x)$ corresponds to the empty string. This reveals some information
about the database strings.

The complexity of a partially privacy preserving mapping is parameterized by several
measures. The database size measure is given by |f(x)| as a function of |x|. This is a
measure of the expansion in the size of the database strings. The input size measure is
given by |g(y)| as a function of the input size |y|. The time and space complexity of the
input transformations f and g form yet another set of relevant complexity measures.
The time and space complexity of deciding the transformed language {(f(x), g(y)) |
(x, y) ∈ L} yields the final measure of complexity.

Example 11. Consider the universal language U = {(x, ⟨M⟩) | x ∈ L(M)}, where ⟨M⟩
represents the encoding of the TM M. Consider the alphabet scrambling scheme discussed
in Section 7.4. Let g(⟨M⟩) = ⟨M′⟩, where M′ is the TM that is isomorphic to M,
except that the alphabet used is the transformed alphabet Σ′, corresponding to the
specific bits used by the alphabet scrambling scheme. Let f(x) be the encoding of x in
the alphabet Σ′. Let the deciding machine be the universal TM over the alphabet Σ′.
    The reason for calling this the universal language for privacy preservation is
that if U can be computed in a privacy preserving fashion, then all recursive sets
can be computed in a privacy preserving fashion. This can be argued as follows. Let L
be a language that is computable in a privacy preserving fashion. Then, there exist
computable functions f1, g1 and a TM R such that (x, y) ∈ L iff (f1(x), g1(y)) ∈ L(R).
For a given y ∈ Σ*, let ⟨N(y)⟩ be the encoding of a Turing machine N(y) that
works as follows. N(y) takes an input u and runs R on the pair (f1(u), g1(y)),
accepting iff R accepts. Consider the pair (x, ⟨N(y)⟩). Clearly, x ∈ L(N(y)) iff
(f1(x), g1(y)) ∈ L(R), that is, (x, ⟨N(y)⟩) ∈ U iff (x, y) ∈ L.
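
The construction of N(y) can be sketched with ordinary Python callables standing in for
Turing machine encodings. Everything concrete here is an assumption made for
illustration: the toy language (substring containment), the placeholder transforms f1
and g1 (string reversal, which hides nothing), and the names make_N and
universal_accepts. The sketch only shows how the pair (x, N(y)) reduces membership in L
to membership in the universal language.

```python
def make_N(y, f1, g1, R):
    """Build N(y): on input u, run R on (f1(u), g1(y)) and accept iff R accepts."""
    gy = g1(y)  # fixed once y is known

    def N(u):
        return R(f1(u), gy)

    return N

def universal_accepts(x, machine):
    """Membership in U = {(x, <M>) | x in L(M)}, with a callable playing <M>."""
    return machine(x)

# Hypothetical toy instance: L = {(x, y) | y is a substring of x}.
f1 = lambda x: x[::-1]        # placeholder transform of the database string
g1 = lambda y: y[::-1]        # matching transform of the query
R = lambda fx, gy: gy in fx   # decides the transformed language

def in_L(x, y):
    """Direct membership test for the toy language, used only for comparison."""
    return y in x

pairs = [("banana", "nan"), ("banana", "nab"), ("abc", "")]
agree = all(
    universal_accepts(x, make_N(y, f1, g1, R)) == in_L(x, y) for x, y in pairs
)
```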
10 Conclusions

With increasing concerns for the privacy and safety of data, it has become necessary to
develop techniques for protecting the data being collected in databases. Since an
intruder may gain access to raw database files, there is a threat of information
leakage or exposure. Encryption addresses this problem, but it renders the encrypted
data useless for answering SQL queries. In this paper, we have presented schemes that
can answer SQL queries directly over the encrypted data. The VDES scheme suggested in
this paper can be used for protecting variable-length string data and answering pattern
matching queries. This scheme is suited to settings where the program that answers the
query is assumed to be safe. Another scheme for string pattern queries, the Extension
of PPES given in Section 5, can be used for complete security. It can also be used for
third-party computation without any revelation. We also provide a scheme in Section 8
that preserves the result of SUM and AVERAGE operations on numerical data types. If the
above schemes are combined with the OPES scheme presented in [4], the encrypted
database will be able to answer the following queries: exact string matching, finding
substrings in a given set of strings, pattern matching in strings (LIKE operations),
and queries on numerical data types such as SUM, AVG, COUNT, MAX, MIN, GROUP BY
and ORDER BY.


11 Future Work

In the future, the work done in this paper can be extended to find a better technique
for regular expression matching. Also, the techniques presented here for answering SQL
queries come at the cost of higher secondary storage requirements. Although secondary
storage is getting cheaper, reducing its usage would improve the performance of an
implementation. Finally, the scheme described in Section 8 for preserving aggregate
operations on numerical data types can be extended to answer MAX or MIN queries.
Acknowledgments. We would like to thank Dr. Sumit Ganguly for his constant guidance
and motivation, and for being a source of inspiration for us.


References
 1. Ozsoyoglu, Singer. Anti-tamper databases: querying encrypted databases. In Proc. of
    17th Annual IFIP WG 11.3 Working Conference on Database and Application Security,
    Colorado, August 2003.
 2. Song, Wagner and A. Perrig. Practical techniques for searches on encrypted data. In
    IEEE Symp. on Security and Privacy, Oakland, California, 2000.
 3. H. Hacigumus, B. R. Iyer, C. Li and S. Mehrotra. Executing SQL over encrypted
    database-service-provider model. In Proc. of ACM SIGMOD Conf. on Management of Data,
    Madison, Wisconsin, June 2002.
 4. R. Agrawal, J. Kiernan, R. Srikant and Y. Xu. Order Preserving Encryption for Numeric
    Data. In SIGMOD 2004, Paris, France, June 2004.
 5. N. Ahituv, Y. Lapid and Neumann. Processing encrypted data. Communications of the
    ACM, 30(9):777-780, 1987.
 6. J. Domingo-Ferrer and J. Herrera-Joancomartí. A privacy homomorphism allowing field
    operations on encrypted data. Jornades de Matemàtica Discreta i Algorísmica,
    Universitat Politècnica de Catalunya, March 1998.
 7. J. Domingo-Ferrer. A new privacy homomorphism and applications. Information Processing
    Letters, 60(5):277-282, 1996.
 8. Feigenbaum, Liberman and R. N. Wright. Cryptographic protection of databases and
    software. In Proc. of DIMACS Workshop on Distributed Computing and Cryptography, 1990.
 9. R. L. Rivest, L. Adleman and M. Dertouzos. On data banks and privacy homomorphisms. In
    Foundations of Secure Computation, 169-178, 1978.
10. R. Agrawal, D. Asonov and R. Srikant. Enabling sovereign information sharing using web
    service. In SIGMOD 2004, Paris, France, June 2004.
11. R. Agrawal, Bayardo, Kiernan, Faloutsos, Rantzau and R. Srikant. Auditing Compliance
    With a Hippocratic Database. In Proceedings of VLDB Conference, Toronto, Canada, 2004.
12. R. Agrawal and R. Srikant. In Proc. of ACM SIGMOD Conference on Management of Data,
    2000.
13. I-Ling Yen, Wei Li, Qingkai Ma and Farokh Bastani. Secure Computation with Low
    Overhead. In University of Texas at Dallas, Richardson.
14. Tomas Sander and Christian F. Tschudin. Towards Mobile Cryptography. In Proceedings of
    the IEEE Symposium on Security and Privacy.
15. Bijit Hore, Sharad Mehrotra and Gene Tsudik. A Privacy-Preserving Index for Range
    Queries. In Proceedings of 30th VLDB Conference, Toronto, Canada, 2004.
Appendix

Privacy of the SUM operator
In this section, we present a solution to the following problem. Given a column of values
d1, d2, ..., dn, compute the sum d1 + d2 + ... + dn in a privacy preserving fashion.
Assume that M is an upper bound on this sum. This section should be read together with
Section 8 of this paper.
    Fix an element d = di occurring in the column. Let d be partitioned as the sum
d = x + y of two elements x and y, where x < U and y < U for some upper bound
U. Choose two distinct prime numbers a and b, each larger than M · U.¹ Without loss
of generality, assume that a > b.
    Consider the group G = {0, 1, 2, ..., a − 1} under the operation of addition mod a.
Consider the group H = (b), that is, the group generated by b, namely the set H =
{b, 2b, 3b, ...} with multiples taken mod a. H is a subgroup of G. By Lagrange's
theorem, |H| divides |G|. Let |H| = h. Since H is a subgroup of G, it contains the
zero element. Let kb be the first multiple of b that equals the zero element, where
1 ≤ k ≤ h ≤ a. Thus, a | kb. Since (a, b) = 1, it follows that a | k. Since
1 ≤ k ≤ a, this holds iff k = a. This implies that h = |G| and therefore H = G. Thus,
there exists a unique index k1, 1 ≤ k1 < a, such that k1 b = 1 mod a. Let a′ = a mod b.
In a similar way, it can be argued that there exists a unique integer k2 such that
k2 a′ = 1 mod b, and hence k2 a = 1 mod b.
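
The existence of k1 and k2 can be checked numerically. The sketch below assumes the
hypothetical primes a = 10009 and b = 10007 (not taken from the paper) and uses the
three-argument pow built into Python 3.8 and later to obtain modular inverses; the
helper name inverse_mod is illustrative.

```python
def inverse_mod(value, modulus):
    """Return k in [1, modulus) with (k * value) % modulus == 1 (Python 3.8+)."""
    return pow(value, -1, modulus)

a, b = 10009, 10007  # hypothetical primes with a > b

# H = {b, 2b, 3b, ...} taken mod a; the subgroup argument says H is all of G.
H = {(i * b) % a for i in range(1, a + 1)}
h_equals_g = (H == set(range(a)))

k1 = inverse_mod(b, a)       # k1 * b = 1 (mod a)
k2 = inverse_mod(a % b, b)   # k2 * (a mod b) = 1 (mod b)
```

Since a mod b and a are congruent modulo b, the same k2 also satisfies k2 · a = 1 mod b.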
    We can now design the encoding scheme as follows. Corresponding to a given integer
d, construct the following integer.
    ENCODING(d) = k1 · b · x + k2 · a · y, where x + y = d, x < U, and y < U.
Given values d1, d2, ..., dn, each value is first split as di = xi + yi using private
bits. Subsequently, we form ENCODING(di) for i = 1, 2, ..., n. The encoding of the sum
is obtained by taking the sum of the encodings. That is,
    ENCODING(Σ_{i=1}^{n} di) = Σ_{i=1}^{n} ENCODING(di)
                             = k1 · b · Σ_{i=1}^{n} xi + k2 · a · Σ_{i=1}^{n} yi.

The decoding operation is as follows.
    x = ENCODING(d) mod a    and    y = ENCODING(d) mod b
The correctness of the decoding operation can be shown provided each of a and b is
at least M. Since k1 · b = 1 mod a, the term k2 · a · y vanishes mod a, and
x < M ≤ a, we have
    ENCODING(d) mod a = k1 · b · x mod a = x mod a = x.
Similarly, it can be shown that ENCODING(d) mod b = y. Now x and y can be added
to obtain d.
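
The encoding, summation, and decoding steps can be put together in a short sketch. The
concrete choices are assumptions for illustration only: the primes A = 10009 and
B = 10007, the bounds M = 100 and U = 50 (so that M · U < B), the sample column, and a
seeded random split standing in for the private bits. The names encode and decode are
hypothetical.

```python
import random

A, B = 10009, 10007        # hypothetical primes, A > B, both larger than M * U
K1 = pow(B, -1, A)         # K1 * B = 1 (mod A); needs Python 3.8+
K2 = pow(A, -1, B)         # K2 * A = 1 (mod B)

def encode(d, rng):
    """ENCODING(d) = K1*B*x + K2*A*y for a random split d = x + y."""
    x = rng.randint(0, d)  # stand-in for the split chosen with private bits
    y = d - x
    return K1 * B * x + K2 * A * y

def decode(code):
    """Recover x + y from the residues modulo A and modulo B."""
    return code % A + code % B

rng = random.Random(0)
column = [12, 7, 30, 5]                    # each value < U = 50, sum <= M = 100
encoded = [encode(d, rng) for d in column]
total = decode(sum(encoded))               # the sum is taken over encodings only
```

The value of total does not depend on how each di was split, because the residues of
the summed encodings modulo A and B are the sums of the xi and of the yi respectively.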




¹ The notation (a, b) stands for the GCD of a and b.

(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607dollysharma2066
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation SlidesKeppelCorporation
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxMarkAnthonyAurellano
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...lizamodels9
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchirictsugar
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaoncallgirls2057
 
Future Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionFuture Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionMintel Group
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Pereraictsugar
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCRashishs7044
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Timedelhimodelshub1
 
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedLean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedKaiNexus
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menzaictsugar
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyotictsugar
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 

Recently uploaded (20)

Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Old Faridabad ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchir
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
 
Corporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information TechnologyCorporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information Technology
 
Future Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionFuture Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted Version
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Perera
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Time
 
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedLean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyot
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 

The Champion Supervisor

Privacy Preserving Schemes for SQL Operations

Abhinav Parate, Bhupendra Singh
Indian Institute of Technology, Kanpur

Abstract. With growing global concern for the privacy of the huge amounts of information being collected in databases, research on protecting databases from exposure and other attacks has increased. Encryption techniques, both conventional and modern, have proved to be a good technology for protecting sensitive data. However, once encrypted, data can no longer be queried with operations like Greater Than, Less Than, substring matching and others. The only operation that can be done on encrypted data is exact match. Hence, querying encrypted data carries the overhead of decrypting the entire encrypted data and then performing the operations. Here we present two schemes which can perform substring matching directly over encrypted string data. We also present schemes for performing aggregate operations like SUM and AVG over encrypted numerical data. The advantage of these schemes is that they avoid the overhead of decrypting the entire data. The encryption scheme is robust and well protective.

1 INTRODUCTION

Present-day database systems protect the database from attack by means of access control, which restricts access to sensitive data. The access control mechanism protects the privacy of sensitive data from intrusion through the database system interfaces. The basic assumption is that the database is accessed only through those interfaces. Such protection is important, but it can prove insufficient: the raw database files remain part of the operating system, and hence an attack on the computer system may lead to a privacy breach if someone gets access to the raw database files. Access to these files cannot be prevented by the access control mechanisms.
Encryption techniques have proved efficient in protecting sensitive information from being revealed easily. However, present encryption techniques were not designed with the goal of protecting databases, which can be very large. As a result, incorporating the present encryption schemes directly involves the huge overhead of decrypting the entire encrypted data before performing any kind of operation on it. We present here two encryption techniques for data of type STRING. These techniques have been designed with the requirement that string operations be performed directly over the encrypted data. SQL queries involve the following operations:

- String Matching: the result includes all strings which are equal to the string given as parameter.
- LIKE operation %abc%: the result includes all strings which have 'abc' as a substring.
- LIKE operation a_b: the result includes strings of the form 'axb', where x is any single character.
- Pattern Matching: a mix of the above two operations, resulting in strings matching some pattern.

Our first encryption scheme is VDES, the Varying Distribution Encryption Scheme. The idea behind this scheme is to completely change the distribution pattern of the characters present in the data. The reason is that some characters occur much more often than others. As a result, someone who knows the original distribution of the characters may guess the encryption scheme by looking at the distribution of encrypted characters. This scheme has been designed specifically for the domain where the program used to answer the queries is assumed to be protected. The encrypted database itself is useless if the program to answer queries is not known or available. The advantages of VDES over other encryption schemes are:

- Operations: it supports all the operations mentioned above and provides very good encryption.
- No false positives: a query over encrypted data returns only true matches.
- It makes the database useless if the VDES program is not available.
- It handles updates very easily.
- The distribution of characters can be changed very easily with updates.

However, the application of VDES is limited to incorporating it directly into a database which must know the unencrypted pattern, i.e. it works in a semi-encrypted domain. With this scheme it is not possible to give the pattern itself in encrypted form. To overcome this limitation, we have another scheme which works entirely in the encrypted domain. We call this scheme PPES, the Product of Primes Encryption Scheme.
The advantage of PPES over VDES is that it can work in a distributed domain where the decryption key is with the client and need not be present at the databases holding the encrypted data. The client may send the pattern itself in encrypted form; the databases perform the pattern matching operations and return the result as encrypted data, which the client then decrypts using its key. PPES can, however, also be used in place of VDES inside a database.

The schemes mentioned above are for the STRING data type. Databases also hold numeric data types, on which operations like SUM or AVG are more common. For example, a firm may want to calculate the average salary of its employees, or an accounting firm may want to calculate the sum total of the data it holds. Due to privacy considerations, such operations should be performed over encrypted data and answered in encrypted form. For this scenario, we present a scheme for performing aggregate operations like SUM, AVG or COUNT over encrypted numerical values. This scheme, like PPES, can be used to answer queries in an encrypted and distributed domain. The technique has applications in mobile computing and cryptography, where it is necessary to perform operations on secure encrypted data without decryption. It is also important to keep the interaction minimal due to security concerns.

1.1 Encryption of Databases

A database can be encrypted in two ways:

i. Encrypt all the files of the database.
ii. Encrypt the data values and store them in tables.

As should be clear from the context, we encrypt data values and not the files.

1.2 Report Layout

The rest of our report is organized as follows. We first discuss related work in section 2 and briefly describe the various approaches so far. We give the details of VDES in section 3, followed by a step-by-step approach to PPES in sections 4, 5 and 6. In section 7, we describe the scheme for finding all occurrences of a pattern specified as a regular expression against a given database string. We then discuss our scheme for performing aggregate operations on numerical data types. Finally, we give the formal aspects of privacy-preserving computation, followed by the conclusion and directions for future work.

2 RELATED WORK

The problem we are discussing is relatively new, and to our knowledge there is no prior work on performing pattern matching over encrypted data. However, a similar problem has been dealt with for numerical data as opposed to string data. The ideas applied in encrypting numerical data have been helpful in understanding the various problems that can arise from encryption of string data.

The technique described in [1] uses strictly increasing polynomial functions to encrypt integer values so that the order of the input is preserved in the encrypted values. Hence, operations like Greater Than, Less Than and Equal To can be performed very easily on the encrypted data, and the resulting encrypted values can then be decrypted easily.
This reduced the overhead of decrypting the entire database. However, the scheme had the drawback that it revealed the distribution of the input data, which could be exploited using probabilistic techniques to estimate the values in some interval with some confidence level.

In [2], there are some schemes to support keyword searches over encrypted text in emails, but these were neither meant for nor suited to relational queries and databases.

In [3], executing SQL over encrypted data was discussed for the first time. This model was for numerical data and required many interactions between client and server to get the results. Moreover, it produced false positives, whose post-processing involved considerable overhead.

In [4], a scheme is proposed for performing operations related to the order of input numerical data. The scheme has the advantage that the distribution of the encrypted data is totally independent of the distribution of the input data. It is robust against computer system attacks and is compatible with databases. It can be extended to lexicographical ordering of strings, but other pattern matching operations cannot be supported. It can be used to answer queries like MAX, MIN and GROUP BY, but it cannot answer queries like SUM or AVG. The model we use for PPES is similar to the model described in that paper.

In [13], techniques have been proposed for performing arithmetic operations like addition and multiplication over encrypted data. These techniques are not applicable to databases, as they cannot be used for addition of more than two numbers and they require message passing for each operation performed.

In [15], there is a data partitioning technique to build privacy-preserving indices on sensitive attributes of a relational table.

In other papers [5][6][7][8] and [9], some encryption transformations have been discussed which allow direct computation on encrypted data. Such encryption transformations are called Privacy Homomorphisms. These papers presented privacy homomorphisms for performing addition, multiplication and multiplicative inverse computation on encrypted data. However, these schemes handled a very small subset of input data and had no application in databases.

In [14], there is a technique for secure computation in mobile cryptography.

3 VDES: VARYING DISTRIBUTION ENCRYPTION SCHEME

As the name suggests, the idea of this scheme is to affect the distribution of characters so that an attack based on cipher-text analysis is not possible. To understand the complete idea, let us first look at a simpler idea and the following distribution of characters in a database:

Character   Frequency
a           200
b           100
e           300

Let us now look at the following encoding scheme.
Character   Encoding
a           1234, 3578
b           2598
e           1079, 2234, 7634

With the above encoding, we can encode the character a as either 1234 or 3578 with equal probability. As we saw in the frequency table above, the frequency of a is 200. Hence, a will be encrypted as 1234 about 100 times and as 3578 about another 100 times. Similarly, b will be present as 2598 100 times, and e will be encrypted as 1079, 2234 and 7634, each about 100 times. The string abe may get encrypted as 357825981079 without giving any idea of which characters are present in the string. Encrypting the string abe a second time may give a completely different result.
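The simple encoding above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table is the one from the example, while the function names (`encrypt`, `decrypt`, `code_frequencies`) and the fixed four-digit code length are choices made for this sketch, not part of the original scheme description.

```python
import random
from collections import Counter

# Encoding table from the example above: each character owns several codes.
ENCODING = {
    "a": ["1234", "3578"],
    "b": ["2598"],
    "e": ["1079", "2234", "7634"],
}
# Reverse lookup for decryption; every code belongs to exactly one character.
DECODING = {code: ch for ch, codes in ENCODING.items() for code in codes}
CODE_LEN = 4  # assumption: all codes have four digits, as in the example


def encrypt(text, rng=random):
    """Replace each character by one of its codes, chosen uniformly at random."""
    return "".join(rng.choice(ENCODING[ch]) for ch in text)


def decrypt(cipher):
    """Split the cipher-text into four-digit codes and map each back."""
    return "".join(
        DECODING[cipher[i:i + CODE_LEN]] for i in range(0, len(cipher), CODE_LEN)
    )


def code_frequencies(cipher):
    """Count how often each four-digit code occurs in the cipher-text."""
    return Counter(cipher[i:i + CODE_LEN] for i in range(0, len(cipher), CODE_LEN))
```

Calling `code_frequencies(encrypt("a" * 200 + "b" * 100 + "e" * 300))` should show six codes with roughly 100 occurrences each, which is the flattening effect the example describes.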
However, this simple idea has the drawback of assuming a static distribution of characters, which is not true in many cases. Hence, we need to be able to add further encrypting values to the character-encoding table. Also, in this simple form the decryption key needs to be updated whenever the distribution varies.

3.1 Improving the Scheme

Character   Characteristic Prime   Prime Multipliers
a           a_p                    a_1, a_2, a_3, ...
b           b_p                    b_1, b_2, b_3, ...
e           e_p                    e_1, e_2, e_3, e_4, ...

In this scheme, we have a characteristic prime associated with each character. For the character a, we have a_p as the characteristic prime. We also have a set of prime multipliers associated with a. When a is to be encrypted, a prime is chosen randomly from its multiplier set and multiplied with a_p. Hence, a may get encrypted as a_p·a_1 or a_p·a_2 or a_p·a_3 and so on. The decryption key for this scheme is the set of characteristic primes C_p. Using C_p, we can check whether some encrypted character is equal to a by checking its divisibility by a_p. Moreover, we can affect the distribution at any instant by changing the set of prime multipliers P_m. Any further updates will be handled by the new set.

The advantage is that we can change the distribution of the characters as we want. Another advantage is that we are not required to change the decryption key even though the prime multiplier set P_m may have changed. Hence, we can change the set P_m, expand it or put completely new elements in it, but this does not affect the already encrypted values in any case, and they do not need to be updated.

3.2 Performing the operations

String Matching

The string matching algorithm over an encrypted string is similar to normal string matching.
In this algorithm, the characters of the given string are matched with the characters at the respective positions of the second string. In our scheme, if we are checking the equality of character a with encrypted character d, we take the characteristic prime a_p for character a and check whether the encrypted character d is divisible by a_p.

Pattern Matching and Substring Matching

This operation, which checks for the existence of a given pattern in the encrypted string, is as easy as matching a string with an encrypted string. The pattern string may contain the % symbol, which in SQL stands for any sequence of zero or more characters at the position of %. The pattern string may also contain the _ symbol, which indicates exactly one character at the position of _ in the pattern string. If the pattern has neither % nor _, the problem reduces to finding a string with the given pattern as its substring. There are efficient algorithms for the substring existence problem, and they can be applied directly in our case.
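The divisibility test described above can be sketched as follows. The primes are deliberately tiny and purely hypothetical, and a single shared multiplier list is used for every character for brevity (the scheme itself allows a separate multiplier set per character). The names `char_matches`, `find_substring` and `like` are introduced only for this illustration; `like` follows the SQL convention that % matches zero or more characters.

```python
import random

# Hypothetical toy parameters; a real deployment would need large primes.
CHAR_PRIME = {"a": 2, "b": 3, "c": 5, "d": 7, "e": 11}  # characteristic primes C_p
MULTIPLIERS = [13, 17, 19, 23, 29]                      # multipliers P_m, disjoint from C_p


def encrypt(text, rng=random):
    """Encrypt each character as characteristic prime times a random multiplier."""
    return [CHAR_PRIME[ch] * rng.choice(MULTIPLIERS) for ch in text]


def char_matches(ch, enc):
    """A plain character matches an encrypted one if its prime divides it."""
    return enc % CHAR_PRIME[ch] == 0


def equals(plain, cipher):
    """Exact string match against an encrypted string."""
    return len(plain) == len(cipher) and all(
        char_matches(c, e) for c, e in zip(plain, cipher)
    )


def find_substring(pattern, cipher):
    """Naive scan: index of the first occurrence of pattern, or -1."""
    for start in range(len(cipher) - len(pattern) + 1):
        if all(char_matches(p, cipher[start + k]) for k, p in enumerate(pattern)):
            return start
    return -1


def like(pattern, cipher):
    """SQL-style LIKE over an encrypted string ('%' and '_' wildcards)."""
    if not pattern:
        return not cipher
    head, rest = pattern[0], pattern[1:]
    if head == "%":  # try every possible length for the wildcard run
        return any(like(rest, cipher[k:]) for k in range(len(cipher) + 1))
    if not cipher:
        return False
    if head == "_" or char_matches(head, cipher[0]):
        return like(rest, cipher[1:])
    return False
```

The naive scan in `find_substring` could be replaced by any standard substring algorithm, since only the character comparison changes.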
A search for patterns like ab%cd in a string s can be handled by searching for ab and then searching for cd in the part of s that follows ab. Similarly, a search for patterns like ab_cd in s can be handled by first searching for ab in s and then skipping the character following b in s. If the character at the next position in s matches c and the character following it matches d, the string s is reported as a string having the pattern ab_cd.

Normal String Matching Algorithm

    characterAt(int pos, String s) {
        return the character present at position pos in the String s
    }

    Stringmatch(String s, String t) {
        int position = 1
        for (position = 1; ; position++) {
            if (characterAt(position, s) == null and characterAt(position, t) != null)
                return "Unequal strings"
            else if (characterAt(position, s) != null and characterAt(position, t) == null)
                return "Unequal strings"
            else if (characterAt(position, s) == null and characterAt(position, t) == null)
                return "Equal strings"
            else if (characterAt(position, s) != characterAt(position, t))
                return "Unequal strings"
            else
                continue
        }
    }

3.3 Correctness of VDES

VDES requires the following sets for its operations:

Σ: the set containing all the characters in a language.
C_p: the set of characteristic primes of the characters belonging to Σ.
P_m: a set of prime numbers. This set must be disjoint from C_p, i.e. C_p ∩ P_m = ∅.
P_i: the multiplier set for the character c_i ∈ Σ, with P_i ⊆ P_m.

We have some functions as described below:

f: Σ → C_p
The function f is one-to-one and maps each character c_i ∈ Σ to p_i ∈ C_p. Hence, f(c_i) = f(c_j) ⇒ c_i = c_j.

g: a function that takes a set of elements as input and returns one element picked randomly from the input set.
h: Σ → P_m
The function h takes a character c_i as input and returns some element m from its multiplier set P_i. Hence, h(c_i) = g(P_i).

Using the above functions, we can encrypt any character c_i ∈ Σ as follows:

E(c_i) = f(c_i) × h(c_i)

Theorem 1. Encryption of a character x ∈ Σ as E(x) using VDES results in a unique mapping, and decryption of E(x) does not result in any false positive, i.e. the result of decryption is correct.

Proof. Suppose we encrypt character c_i ∈ Σ as E(c_i); then

E(c_i) = f(c_i) × h(c_i)

Assume that VDES results in a false positive, i.e. some character c_j ≠ c_i is identified as c_i. Then, by the definition of VDES, the characteristic prime p_j = f(c_j) must divide E(c_i). Since E(c_i) is a product of exactly two prime numbers, if f(c_j) divides E(c_i) then either f(c_j) = f(c_i) or f(c_j) = h(c_i).

If f(c_j) = f(c_i), then c_i = c_j because f is a one-to-one mapping. This contradicts the assumption that we had a false positive.

The case f(c_j) = h(c_i) cannot occur: we know that f(c_j) ∈ C_p and h(c_i) ∈ P_m, and C_p and P_m are disjoint. Hence, f(c_j) ≠ h(c_i).

Hence, we cannot find any c_j ∈ Σ which can act as a false positive for c_i. Thus, decryption of any encrypted character results in the correct character and never in a false character.

Algorithms that find a given substring t in a string s proceed by matching character by character. By Theorem 1, if some character is matched then it must be the correct one. So, if each character found is correct and we have matched the pattern, then the pattern match must also be correct, since we did it character by character. Hence, VDES query execution returns only correct results and no false positives. This can be stated as a lemma:

Lemma 1. Pattern matching using VDES does not result in any false positives.
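Theorem 1 can be checked exhaustively on a toy instance. The sketch below uses hypothetical small primes and per-character multiplier sets P_i; the function names are illustrative only. It also shows what goes wrong if the disjointness condition C_p ∩ P_m = ∅ is violated.

```python
# Hypothetical toy instance of the sets used in the proof.
CHAR_PRIME = {"a": 2, "b": 3, "c": 5}  # f: character -> characteristic prime
MULTIPLIER_SETS = {                    # P_i for each character, subsets of P_m
    "a": [7, 11],
    "b": [11, 13],
    "c": [7, 13, 17],
}


def candidates(enc, char_prime=CHAR_PRIME):
    """All characters whose characteristic prime divides the encrypted value."""
    return [ch for ch, p in char_prime.items() if enc % p == 0]


def all_ciphertexts():
    """Every (character, E(character)) pair the toy instance can produce."""
    return [
        (ch, CHAR_PRIME[ch] * m)
        for ch, multipliers in MULTIPLIER_SETS.items()
        for m in multipliers
    ]


def no_false_positives():
    """True if every possible cipher value decrypts to exactly its own character."""
    return all(candidates(enc) == [ch] for ch, enc in all_ciphertexts())
```

If a multiplier were taken from C_p itself, for example encrypting a as 2 × 3, then `candidates(6)` would return both a and b, which is exactly the false positive that the disjointness requirement rules out.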
3.4 Analysis of Cryptographic Attack on VDES

As the purpose of an encryption scheme is to protect sensitive data from being exposed, we must analyse the kinds of attack that are possible and how successfully they can be handled.

Let us suppose |P_m| = m and |Σ| = n. Hence, we have n characters in the language, and the number of possible multipliers for each character is m. When we encrypt a database using VDES, we can use each multiplier with each characteristic prime, resulting in some random distribution. As a result of using m multipliers for n characters, the encrypted language consists of mn numbers, and each character is encrypted with m possible numbers.

If someone knows n, he can calculate m from mn. The attacker then knows that each character is mapped to m different numbers. If he applies brute-force analysis to identify each character and its encryption set, he can proceed as follows. Of the mn numbers known to him, he selects m numbers and maps them to some character c_1. From the remaining (n−1)m numbers, he selects a further m numbers and maps them to character c_2. He continues in this way for c_3, c_4, ..., c_n. After obtaining sets for each character, he checks whether these sets work correctly. In this brute-force way, he would be required to test around

C(nm, m) × C((n−1)m, m) × C((n−2)m, m) × ... × C(2m, m) × C(m, m)
    = (nm)! / (m!)^n
    = (nm)(nm − 1)(nm − 2) ... 1 / (1·2·3 ... m)^n
    > (n − 1)^m (n − 2)^m ... 1^m
    = ((n − 1)!)^m

assignments, where C(a, b) denotes the number of ways of choosing b items out of a. Hence, we see that with m > 1 we can increase the cost of brute-force analysis significantly. We must also have m ≥ n to take care of the distribution scenario.

Cipher-Text Analysis

Another kind of attack is based on distribution analysis of the cipher-text. By distribution analysis, we mean that in general texts some characters occur much more frequently than others. Using this information, an attacker can look for the encrypted character with maximum frequency and guess the underlying character. In English, the character e is used most and hence can be recognised easily. After recognizing e, other characters can be guessed using the known distribution of the other characters. For example, if z occurs about 0.1 times as often as e, the attacker can look for the encrypted character whose frequency is roughly 0.1 times that of the most frequent character. Knowledge of the domain can help very much in such cases.
As a result, it is very important to protect the distribution of characters; this is why we came up with the idea of a varying distribution. VDES results in a distribution which gives no idea of the original distribution of characters. Hence, it is secure against the cipher-text analysis attack.

There is another kind of cipher-text analysis which looks for the most frequent digram (two characters in sequence) in the cipher-text. This is also not possible in VDES, as the same digram can be mapped in m × m ways. Hence, it is difficult to tell which digram is most frequent.

Another way to attack is to identify the set C_p. This set can be obtained only by factorising the product terms available to the attacker. This attack can be made very costly by using very big prime numbers in the sets, so that factorization becomes difficult.
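To get a feel for the brute-force count given in the analysis of section 3.4, the product of binomial coefficients, its closed form (nm)!/(m!)^n and the lower bound ((n−1)!)^m can be evaluated directly. The following sketch uses only Python's standard `math` module; the function names are illustrative.

```python
from math import comb, factorial


def brute_force_assignments(n, m):
    """Product C(nm, m) * C((n-1)m, m) * ... * C(m, m)."""
    total = 1
    for remaining in range(n, 0, -1):
        total *= comb(remaining * m, m)
    return total


def closed_form(n, m):
    """(nm)! / (m!)^n, which equals the product above."""
    return factorial(n * m) // factorial(m) ** n


def lower_bound(n, m):
    """((n-1)!)^m, the bound quoted in the analysis."""
    return factorial(n - 1) ** m
```

For instance, with n = 3 characters and m = 2 multipliers the count is C(6, 2) × C(4, 2) × C(2, 2) = 15 × 6 × 1 = 90, against a lower bound of (2!)^2 = 4. The numbers grow very quickly for realistic alphabet sizes.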
3.5 Memory Requirement

VDES protects sensitive data effectively by using very large prime numbers in C_p and P_m. However, this comes at a cost in terms of memory requirements. Any prime number p requires lg(p) + 1 bits of memory. Since we need to save the product terms, the memory for a product term is 2 lg(p) + 1 bits, where p is the largest prime number in C_p ∪ P_m. Hence, a character which required 8 bits now requires around (2 lg(p) + 1) bits.

3.6 Strength of VDES

The strength of VDES lies in the fact that it performs pattern matching over encrypted data in an elegant manner and, at the same time, the encryption scheme is strong against brute-force analysis and cipher-text analysis attacks. Although the encryption key may change, the length and the value of the decryption key remain fixed. This scheme can be readily integrated with the database software available in the market and can be used for the goals we intended to achieve:

i. To reduce the overhead of decrypting the entire database in order to perform a query.
ii. To design an encryption scheme which achieves the goal stated above while protecting the database from cipher-text analysis attacks.

4 PPES: PRODUCT OF PRIMES ENCRYPTION SCHEME

4.1 Why introduce PPES when VDES works?

As we have seen in the previous section, VDES achieves the goals we intended to achieve. The use scenario of the next scheme, the Product of Primes Encryption Scheme, is a little different from that of VDES. In VDES, the database program is required to know the decryption keys in order to answer queries. A database using PPES, on the other hand, has no knowledge of the encryption or decryption keys and can still execute queries and answer directly in encrypted form. PPES returns the correct results or at least a superset of the correct results.
The results are in encrypted form, as the database is not aware of the key. The query also carries the pattern string in encrypted form. Hence, PPES works entirely in the encrypted domain. This is an advantage when databases and query results are distributed over a network.

VDES can be used efficiently where protecting the pattern itself is not important and the program answering the query is assumed to be protected. Based on its knowledge of the keys, a VDES database executes the query and returns unencrypted answers if desired. PPES can also be used in this scenario. For this, we make the database aware of the keys; when a query arrives, we modify it to carry the encrypted pattern and then execute the modified query. With the help of the decryption key, we decrypt the result to give the final output. The two use scenarios make clear why PPES is needed.
4.2 Model of PPES

Model of PPES is shown in figure below.

4.3 Intuition behind PPES

Consider the following encryption scheme. Assume that there is a function $e : \Sigma^* \to P$, where Σ is the alphabet (the set of characters) and P is a set of prime numbers, such that e maps each string to a unique prime number. Using this mapping, we can encrypt any string s as E(s) as follows:

  $E(s) = e(t_1) \times e(t_2) \times e(t_3) \times \cdots \times e(t_n)$

where $t_1, t_2, t_3, \ldots, t_n$ are the substrings present in s. For example,

  $E(abc) = e(a) \times e(b) \times e(c) \times e(ab) \times e(bc) \times e(abc)$

With this way of encrypting, E(s) has a factor for every substring present in s. So, to look for a substring t in s, we can evaluate E(t) first and then check whether E(s) is divisible by E(t). Since every string has its own prime here, divisibility implies the presence of substring t in s. This can be stated as:

  A string s has the string t as a substring if $E(t) \mid E(s)$.
Converse of the above: if some number N divides E(s), then there exists a substring t of s such that E(t) = N. This statement may not be true. As we saw in the case of E(abc), it is divisible by e(ab) × e(bc), but there exists no substring t such that E(t) = e(ab) × e(bc). Hence, the converse is false in this case.

Although this idea can check for the presence of a substring in just one step, the number of primes required to encrypt any string increases exponentially with the length of the string. If the number of characters is n and the maximum length of any string is l, then the number of primes required is around $n^{l+1}$. If n is around 256, which is $2^8$, the number of primes required is already out of bounds at l = 5. One way of reducing the number of primes required is to allot a prime number only to those substrings which are actually present in a dictionary, and to give all remaining possible substrings a single shared prime number. This idea is again not good, as the dictionary itself may be very large.

4.4 Our approach to PPES

As we saw above, the number of primes required grows exponentially with the size of the strings handled by PPES. We therefore restructure the scheme as follows, so that PPES becomes feasible to implement. Let Σ be the alphabet and P be the set of available prime numbers. Consider a random (or private) mapping $e : \Sigma^* \to P$ that maps each string over Σ to the p-th prime number from the set P, where $1 \le p \le |P|$. Since the mapping is random (or private), we can assume that the probability that two strings map to the same prime number is $1/|P|$ (or it can be calculated from the private mapping). We can now encrypt a string s as E(s) using this mapping:

  $E(s) = e(t_1) \times e(t_2) \times e(t_3) \times \cdots \times e(t_n)$

where $t_1, t_2, t_3, \ldots, t_n$ are all the substrings present in s. The same is applied to the pattern t to get E(t). Hence, if t is a substring of s, then E(t) must divide E(s).
4.5 Performing the operations

String matching. String matching in PPES is quite different from traditional string matching techniques. To search for a string t (say, abc), we first encrypt t using the encryption function above, i.e.

  $E(t) = e(t_1) \times e(t_2) \times e(t_3) \times \cdots \times e(t_n)$

where $t_1, t_2, t_3, \ldots, t_n$ are all the substrings present in t. If E(t) divides E(s), where s is the string in which t is being searched, then t is reported as present in s, that is, as one of the substrings of s.
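The encryption and matching steps of Sections 4.4 and 4.5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the private mapping e is modelled here by a keyed SHA-256 hash that indexes into the table of primes below 100000, and the names `PRIME_TABLE`, `SECRET_KEY`, `encrypt` and `may_contain` are hypothetical, not taken from the paper.

```python
import hashlib
from math import prod


def primes_below(limit):
    """Sieve of Eratosthenes: all primes smaller than limit."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            sieve[n * n::n] = bytes(len(range(n * n, limit, n)))
    return [n for n in range(limit) if sieve[n]]


PRIME_TABLE = primes_below(100_000)   # the set P (size chosen for illustration)
SECRET_KEY = b"demo-key"              # hypothetical key behind the private mapping


def e(substring):
    """Assumed private mapping e: a keyed hash selects one prime from P."""
    digest = hashlib.sha256(SECRET_KEY + substring.encode("utf-8")).digest()
    return PRIME_TABLE[int.from_bytes(digest, "big") % len(PRIME_TABLE)]


def substrings(s):
    """Every non-empty substring of s, one per (start, end) pair."""
    return [s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)]


def encrypt(s):
    """E(s): the product of e(t) over all substring occurrences t of s."""
    return prod(e(t) for t in substrings(s))


def may_contain(enc_s, enc_t):
    """Divisibility test: False is always correct, True may be a false positive."""
    return enc_s % enc_t == 0
```

For instance, `may_contain(encrypt("indian institute"), encrypt("institute"))` holds because every factor of the pattern's encryption also appears in the text's encryption. A pattern that does not occur is rejected unless all of its factors happen to collide with primes of the text.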
In PPES, the cost of string matching depends only on the length of the string being matched, unlike traditional techniques where the cost depends on the length of the pattern as well as the length of the target string. The matching cost is just the cost of encrypting t plus the cost of one division operation. So if the pattern string is small enough, string matching can be done very fast.

4.6 Correctness of PPES: Existence of false positives

Consider a given string s of length W, so that s has W(W+1)/2 substrings, and suppose we are searching for a string t of length w. Let E(s) and E(t) be the encryptions of s and t respectively.

Theorem 2. If t is a substring of s then E(t) must divide E(s).

Proof. Let t have k substrings $t_1, t_2, \ldots, t_k$, so that

  $E(t) = e(t_1) \times e(t_2) \times e(t_3) \times \cdots \times e(t_k)$

Since t is a substring of s, all the substrings of t are among the substrings of s. So if s has n + k substrings, the encryption of s is

  $E(s) = e(t_1) \times e(t_2) \times \cdots \times e(t_k) \times e(s_1) \times e(s_2) \times \cdots \times e(s_n)$

All the factors of E(t) are present in E(s), so E(t) must divide E(s). The converse is not true.

Lemma 2. String matching in PPES returns a superset of the actual result set, i.e. false positives may exist in the result, but there are no false negatives.

Proof. Theorem 2 shows that false negatives cannot occur. Now suppose that t is not a substring of s. All we know is that t itself is absent from s; every proper substring of t may still be present in s. If, in addition, e(t) and e(s') are mapped to the same prime for some substring s' of s, then E(s) can be divisible by E(t), which leads to a false positive.

4.7 Probability analysis of false positives

In this section we derive an upper bound on the probability of a false positive.
Assume that t is a pattern string of length w, that s is the given string of length W (W ≥ w), and that E(t) divides E(s). There are w(w + 1)/2 substrings in t, and each of them is mapped to the same prime as some substring of s. In particular, the string t is mapped to e(t), and e(t) is present in E(s). We have to find an upper bound on the probability that E(t) divides E(s) although t is not a substring of s. Since E(t) divides E(s), the string t maps to e(t), which is also the image of some substring of s. There are W(W+1)/2 substrings of s, so the probability of this is bounded by $(W(W+1)/2) \times (1/|P|)$. The probability therefore decreases as $|P|$ grows.

4.8 Memory requirements

Let s be a string whose distinct substrings are $t_1, t_2, \ldots, t_k$, where substring $t_i$ occurs $e_i$ times, and let $t_i$ be mapped to the prime $p_i$, i.e. $e(t_i) = p_i$. Then the encryption of s is

  $E(s) = e(t_1)^{e_1} \times e(t_2)^{e_2} \times \cdots \times e(t_k)^{e_k}$

A number p occupies lg(p) bits, so the string s consumes lg(E(s)) bits:

  $\lg(E(s)) = e_1 \lg(e(t_1)) + e_2 \lg(e(t_2)) + \cdots + e_k \lg(e(t_k))$

If $p_{max}$ is the largest prime in the set P, then

  $\lg(E(s)) \le \lg(p_{max}) \times (e_1 + e_2 + \cdots + e_k)$

Since $(e_1 + e_2 + \cdots + e_k)$ equals W(W+1)/2, where W is the length of s,

  $\lg(E(s)) \le \lg(p_{max}) \times W(W+1)/2$

Hence the memory requirement grows quadratically with the length of the string. The scheme is therefore not suitable for large documents, but it can handle strings of moderate size such as addresses.

4.9 Cryptographic analysis of PPES

To query a PPES-encrypted database, the pattern t must be given in encrypted form, E(t). The individual factors could also be sent over the network, but E(t) is preferred, because individual factors may reveal the nature of the query to somebody who knows the domain (for example, if roll numbers are stored in the database and the attacker knows that roll numbers have the format Y****, the query can easily be broken). In the database we store the encrypted form of each string s, so factorising E(s) takes a significant amount of time. Assume that $|\Sigma| = n$ and $|P| = m$. An attacker who knows n and m could try to break the encryption as follows:

1. Factorise a reasonable number of values from the database, which is a very hard task, as a string of length l has l(l + 1)/2 prime factors.
2. Then analyse the distribution of all prime numbers, or resort to brute-force analysis, i.e. map each prime to all possible strings.
Now if n = 40 and l = 20, where l is a domain parameter stating that all strings in the database have length at most l, then the number of possible strings in that domain, $n \times (n^l - 1)/(n - 1)$, is around $40^{20}$ ($> 10^{25}$). So the brute-force approach has that many candidate mappings for a single prime. Cipher-text analysis of the data is very hard, as all the data is in the form of products of many prime numbers and factorising each one is very expensive. Hence attacks based on the distribution are not practical.

4.10 Strength of PPES

The main strength of PPES is that we can query the database over the network with high safety. The pattern matching operation (a very common operation on strings in databases) is very fast in PPES and is independent of the size of the target string.

5 Extension of substring method

Let s be a string and let n = |s|. Suppose that we store a set $\{s_1, s_2, \ldots, s_m\}$ of distinct substrings of s. Encode each of these substrings, that is, obtain $E(s_1), E(s_2), \ldots, E(s_m)$, and then multiply these m primes to obtain the encoding of the string. The number of substrings of s is $n \cdot (n+1)/2$. Therefore, this scheme saves space compared to the PPES scheme provided $m < n \cdot (n+1)/2$.

The method for checking whether a given string t is a substring of s is as follows. t is a substring of s if and only if t can be extended (perhaps on both sides) so that the extended string becomes equal to s. If t cannot be extended in this way, then t is not a substring of s. Equivalently, if t can be extended in either direction so that it becomes equal to any one of the known substrings $s_1, s_2, \ldots, s_m$, then t can also be inferred to be a substring of s. The problem is therefore the following: choose the substrings $s_1, \ldots, s_m$ of s in such a way that, for every string t, it can be efficiently inferred whether t is a substring of s.

For a given value w, a string $t'$ is said to be a w-extension of t if t is a substring of $t'$ and $|t'| - |t| = w$.
The set of all strings that are $w'$-extensions of a given string t, where $w' \le w$, defines the w-extension ball centered at t.

Suppose $t'$ lies in the w-extension ball centered at t, and an encoding $E(t')$ is available. Then some extension of t by a total of at most w characters, distributed over its two sides, has an encoding equal to the encoding of $t'$. The complexity of this operation is the number of extension strings that are checked, that is, the size of a w-extension ball. Let Σ denote the alphabet over which the strings are formed. Then the size of any w-extension ball is given by

  $\sum_{0 \le x+y \le w} |\Sigma|^{x+y} (x + y + 1) = O(w \cdot |\Sigma|^{w+2})$

The problem is to design substrings $s_1, \ldots, s_m$ such that, given any substring t of s, there exists an index r, $1 \le r \le m$, such that $s_r$ lies within a w-extension ball centered at t. Consider the following scheme. Keep substrings $s_{i,j}$, where $1 \le i \le \frac{n}{w+1}$ and $1 \le j \le 1 + \frac{n}{w+1}$. The substring $s_{i,j}$ refers to the substring of s of length $(w+1) \cdot i$ that begins at position $(w+1) \cdot (j-1) + 1$. For a fixed value of i, the set of substrings $s_{i,j}$ gives substrings of a fixed length at offsets $1, 1 + (w+1), 1 + 2 \cdot (w+1), \ldots$. The substring lengths range over $w+1, 2 \cdot (w+1), 3 \cdot (w+1), \ldots$.

5.1 Correctness of the scheme

In this section, we argue the correctness of the above scheme. We say that a substring t of s is w-covered by a substring $s_{i,j}$ provided $s_{i,j}$ is in the w-extension ball centered at t. We first show that the sets of strings that are w-covered by the $s_{i,j}$'s are all distinct. That is, if $(i, j) \ne (i', j')$, then the set of strings covered by $s_{i,j}$ and the set of strings covered by $s_{i',j'}$ have no overlap.

Lemma 3. Let t be a substring of s that is w-covered by both $s_{i,j}$ and $s_{i',j'}$. Then $i = i'$ and $j = j'$.

Proof. Suppose that $i \ne i'$, that is, the lengths are different. The lengths are distinct multiples of w + 1, so $\big||s_{i,j}| - |s_{i',j'}|\big| > w$. However, if t lies in the coverage of both substrings, then it follows that

  $w < \big||s_{i,j}| - |s_{i',j'}|\big| = \big|(|s_{i,j}| - |t|) - (|s_{i',j'}| - |t|)\big| = |w_1 - w_2| \le w$

where $w_1 = |s_{i,j}| - |t|$ and $w_2 = |s_{i',j'}| - |t|$. The last inequality holds because both $s_{i,j}$ and $s_{i',j'}$ lie in the w-extension ball centered at t, so that $0 \le w_1 \le w$ and $0 \le w_2 \le w$. This is a contradiction. Now suppose that $i = i'$, that is, the lengths of the substrings are the same, and $j \ne j'$. A similar argument can be made for this case as well.

The above argument shows that if t is any substring of s, then there is at most one $s_{i,j}$ such that $s_{i,j}$ is in the w-extension ball centered at t. We now show that for every substring t of s, there is always one $s_{i,j}$ lying within the 2w-extension ball centered at t.

Lemma 4. Let t be a substring of s. Then there exists at least one $s_{i,j}$ that $2 \cdot w$-covers t.

Proof. Suppose that t starts at position p.
Let r be the remainder and $j'$ be the quotient when p − 1 is divided by w + 1, that is,

  $p - 1 = (w+1) \cdot j' + r$, where $0 \le r < w+1$.   (1)

Let $i'$ and $r'$ be the quotient and remainder respectively when |t| + r is divided by w + 1, that is,

  $|t| + r = (w+1) \cdot i' + r'$, where $0 \le r' < w+1$.   (2)

Consider the substring $s_{i,j}$, where $j = j' + 1$, and $i = i'$ if $r' = 0$ and $i = i' + 1$ otherwise. Since $p - 1 = (w+1) \cdot j' + r$, the starting position of the string $s_{i,j}$ satisfies

  $(w+1) \cdot (j-1) + 1 = (w+1) \cdot j' + 1 \le (w+1) \cdot j' + r + 1 = p.$

Therefore, the starting position of $s_{i,j}$ lies on or before the starting position of the given substring t. The length of the string $s_{i,j}$ is $(w+1) \cdot i$. Measured from the start of $s_{i,j}$, the string t ends after |t| + r characters. If $r' = 0$, then the ending location of $s_{i,j}$ matches the ending location of t. Otherwise, $r' > 0$ and $i = i' + 1$. Therefore, the ending location of $s_{i,j}$ equals

  $(w+1) \cdot (j-1) + (w+1) \cdot i = (w+1) \cdot j' + (w+1) \cdot (i'+1)$
  $= (w+1) \cdot j' + (w+1) \cdot i' + (w+1)$
  $= (w+1) \cdot j' + |t| + r - r' + (w+1)$, by equation (2)
  $= p - 1 + |t| - r' + w + 1$, by equation (1)
  $= (p + |t| - 1) + (w + 1 - r')$
  $> (p + |t| - 1)$

Thus, $s_{i,j}$ ends on or after t ends. It follows that $s_{i,j}$ contains t.

By the above analysis, $|s_{i,j}| - |t| = (w+1) \cdot i - |t|$. There are two cases, namely $i = i'$ if $r' = 0$, and $i = i' + 1$ if $r' > 0$. Suppose that $r' = 0$. Then

  $|s_{i,j}| - |t| = (w+1) \cdot i' - |t| = |t| + r - |t|$, by equation (2)
  $= r \le w$, by equation (1).

Otherwise, assume that $0 < r' \le w$ and $i = i' + 1$. Then we have

  $|s_{i,j}| - |t| = (w+1) \cdot i - |t| = (w+1) \cdot (i'+1) - |t|$
  $= (w+1) \cdot i' + (w+1) - |t|$
  $= |t| + r - r' + (w+1) - |t|$, by equation (2)
  $= w + 1 + r - r'$
  $\le 2 \cdot w$, since $r' \ge 1$ and $r \le w$.

This worst case is attained if t is the substring s[w + 1, w + 2] of size 2. Then it is covered by the substring $s_{2,1}$ of size $2 \cdot (w+1)$. Therefore, the minimum extension necessary is $2 \cdot (w+1) - 2 = 2 \cdot w$, matching the bound in Lemma 4.

The test for whether a given string t is a substring of the string s is as follows. Enumerate the w-extension ball centered at t and check if any of its members is exactly equal to one of the $s_{i,j}$'s (or, equivalently, encode each string of the w-extension ball centered at t and check if the encoding divides the product $\prod_{i,j} E(s_{i,j})$). In view of Lemma 4, this test works if, instead of w, we use the parameter $w/2$ in the definition of the $s_{i,j}$'s.
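The construction used in the proof of Lemma 4 can be checked numerically. The sketch below follows equations (1) and (2) literally; it works on positions and lengths only and assumes that the text is long enough (or padded) so that the chosen $s_{i,j}$ does not run past the end of s. The function names are illustrative.

```python
def covering_substring(p, length, w):
    """Indices (i, j) of the stored substring s_{i,j} that the proof of Lemma 4
    selects for a substring starting at 1-based position p with the given length."""
    j_prime, r = divmod(p - 1, w + 1)              # equation (1)
    i_prime, r_prime = divmod(length + r, w + 1)   # equation (2)
    j = j_prime + 1
    i = i_prime if r_prime == 0 else i_prime + 1
    return i, j


def extension_needed(p, length, w):
    """Number of characters by which t must be extended to reach s_{i,j}."""
    i, j = covering_substring(p, length, w)
    start = (w + 1) * (j - 1) + 1        # first position of s_{i,j}
    end = start + (w + 1) * i - 1        # last position of s_{i,j}
    # s_{i,j} must contain t, as argued in the proof.
    assert start <= p and end >= p + length - 1
    return (w + 1) * i - length


w = 3
worst = max(extension_needed(p, n, w) for p in range(1, 60) for n in range(1, 40))
print(worst)  # 6, i.e. 2 * w, reached by the size-2 substring s[w + 1, w + 2]
```

The exhaustive loop confirms both halves of the argument for this value of w: containment always holds, and the extension never exceeds 2w.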
With the parameter $w/2$ in place of w, the number of strings $s_{i,j}$ is therefore given by

  $\left(\frac{n}{\frac{w}{2} + 1}\right)^2 = O\left(\frac{n^2}{w^2}\right)$

5.2 Lower bound on number of substrings required

We now consider a lower bound on the space used by any encoding algorithm that uses the above principle of matching strings from a w-extension ball centered at t.
Lemma 5. Let $s'$ be any substring of s of size at least w + 1. Then the number of substrings t such that $s'$ lies in the w-extension ball centered at t is at most $\frac{1}{2} \cdot (w+1) \cdot (w+2)$.

Proof. Let a denote the number of characters extended at the left end of t and b denote the number of characters extended at the right end of t to obtain $s'$. Each distinct choice of a and b such that $a + b \le w$ gives a distinct t such that $s'$ lies in the w-extension ball centered at t. Therefore, the number of possible such substrings is given by

  $\sum_{a=0}^{w} \sum_{b=0}^{w-a} 1 = \sum_{a=0}^{w} (w - a + 1) = \sum_{a'=1}^{w+1} a' = \frac{1}{2} \cdot (w+1) \cdot (w+2).$

Lemma 6. The number of substrings of a given string s of size n that must be stored by any scheme that works by testing the equality of one of the substrings with a member of the w-extension ball of a given string t is at least $\frac{n \cdot (n+1)}{(w+1) \cdot (w+2)}$.

Proof. The number of substrings of a given string s is $n \cdot (n+1)/2$. By Lemma 5, each substring of size at least w + 1 w-covers at most $\frac{1}{2} \cdot (w+1) \cdot (w+2)$ substrings. Substrings of size less than w + 1 w-cover even fewer strings. Therefore, in order to w-cover all substrings, the minimum number of strings that must be used is at least $\frac{n \cdot (n+1)}{(w+1) \cdot (w+2)}$.

It follows that the number of substrings used by our proposed scheme is $O\left(\frac{n^2}{w^2}\right)$, which is within a small constant factor of the lower bound, namely $\frac{n \cdot (n+1)}{(w+1) \cdot (w+2)}$, by Lemma 6.

6 A Hierarchical Scheme

In this section, we propose a two-step scheme that is based on the previous scheme. Let $w_1$ and $w_2$ be two integer parameters, where $w_1 > w_2 > 0$. Given a database string s, we first divide it into adjacent blocks of size $w_1$, that is, $s = s_1 s_2 \cdots s_k$, where each of the $s_i$'s has size $w_1$ and the last block is padded with the requisite number of null characters to ensure that its length is exactly $w_1$. The encoding of s, namely E(s), is a hierarchical data structure.
Its first field is the sequence of the encodings of the blocks, that is, $E(s_1) \circ E(s_2) \circ \cdots \circ E(s_k)$, where $\circ$ denotes the sequencing operator. We assume that $w_1$ is large enough to defeat statistical attacks that try to decode the encrypted blocks. Let t be a substring of s. Then the following two exclusive possibilities hold.

1. In a match of t with s, t spans more than one block of s. In other words, t can be written as $t = t_0 t_1 t_2 \ldots t_l$, where $l \ge 1$, and $t_1, t_2, \ldots, t_{l-1}$ are blocks of length $w_1$ each, $|t_0| \le w_1$ and $|t_l| \le w_1$. Further, there exists an index r with the property that $s_r = t_1, s_{r+1} = t_2, \ldots, s_{r+l-2} = t_{l-1}$, $t_0$ is a suffix of $s_{r-1}$, and $t_l$ is a prefix of $s_{r+l-1}$.
2. In a match of t with s, t is contained within a block of s. That is, $|t| \le w_1$ and t is a substring of $s_r$ for some index r, $1 \le r \le k$.
The second possibility can be effectively solved by storing, for each block $s_i$, $1 \le i \le k$, a set of substrings of $s_i$ in their encoded form using an extension parameter $w_2$, as detailed in Section 5. Thus, if $|t| \le w_1$ and there is a match of t with a substring of s that is completely contained within a block, then such a match can be effectively decided. The total number of encodings in this scheme is as follows.

  $O\left(k \cdot \frac{w_1^2}{w_2^2}\right) = O\left(\frac{n}{w_1} \cdot \frac{w_1^2}{w_2^2}\right) = O\left(\frac{n \cdot w_1}{w_2^2}\right)$   (3)

In order to identify a match where t is a substring of a block of s, the $w_2$-extension ball of t has to be enumerated and compared against all the encodings of the substrings of the blocks of s. Assuming that all the encodings are stored in a hash table, the dominant component of the cost of this operation is the cost of enumerating the $w_2$-extension ball centered at t. As discussed in Section 5, this cost is $O(w_2 \cdot |\Sigma|^{w_2})$.

Single Text Offset, Multiple Pattern Offsets. We now consider the scenario depicted by the first possibility, namely, that t is a substring of s and a match of t spans multiple blocks of s. The match of t with the matching portion of s may start at any position p, $1 \le p \le w_1$, within the first block of s in which the match begins. To handle this, we assume that the pattern t is encoded in $w_1$ distinct ways, obtained by shifting the starting position of the first block of t by a parameter $u = 0, -1, -2, \ldots, -(w_1 - 1)$ respectively. Thus, if u = 0, the first block contains the characters $t_1 t_2 \cdots t_{w_1}$ and the remaining blocks are constructed sequentially thereafter. If u = −1, then the first block contains the characters $t_1 t_2 \cdots t_{w_1 - 1}$, and the remaining blocks are constructed sequentially thereafter. For a general value of u, the first block contains the characters $t_1 t_2 \cdots t_{w_1 + u}$, and the remaining blocks are encoded in sequence. Thus, t is encoded $w_1$ times, with offsets ranging in magnitude from 0 to $w_1 - 1$.
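The way a pattern is cut into blocks for each offset u can be illustrated on plaintext, before any encoding is applied. The following sketch assumes Python strings for the pattern; `pattern_blocks` and `all_offsets` are hypothetical helper names, not part of the paper.

```python
def pattern_blocks(t, w1, u):
    """Split pattern t into blocks for an offset u in {0, -1, ..., -(w1 - 1)}.

    The first block holds the first w1 + u characters of t; every later block
    holds the next w1 characters (the last one may be shorter)."""
    if not -(w1 - 1) <= u <= 0:
        raise ValueError("offset must lie between -(w1 - 1) and 0")
    first = w1 + u
    blocks = [t[:first]]
    blocks += [t[k:k + w1] for k in range(first, len(t), w1)]
    return blocks


def all_offsets(t, w1):
    """Blockings of t for all w1 offsets, keyed by u."""
    return {u: pattern_blocks(t, w1, u) for u in range(0, -w1, -1)}


for u, blocks in all_offsets("abcdefghij", 4).items():
    print(u, blocks)
# 0 ['abcd', 'efgh', 'ij']
# -1 ['abc', 'defg', 'hij']
# -2 ['ab', 'cdef', 'ghij']
# -3 ['a', 'bcde', 'fghi', 'j']
```

Each of the $w_1$ blockings would then be encoded block by block, which is where the factor of $w_1$ in the encoded pattern size comes from.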
This increases the encoded size of the pattern t by a factor of $w_1$. Note that although we have assumed that the offset for the text string s is 0, it is in general not necessary to assume this. The offset for the text string can be set to any random value between 0 and $-w_1 + 1$; this has the added advantage of reducing the chance of statistical attacks.

Matching Algorithm. For a given offset position u, $-w_1 + 1 \le u \le 0$, denote the j-th block of t defined for this offset position as $t_j^{(u)}$. Let $l_u$ denote the number of blocks of t encoded with an offset of u. Assume that t is a substring of s and there is a match of t spanning multiple blocks of s. Then there exist an offset position u, $-w_1 + 1 \le u \le 0$, and an index r, $1 \le r \le k$, such that $t_1^{(u)}$ is a suffix of $s_r$, $t_j^{(u)} = s_{r+j-1}$ for $2 \le j \le l_u - 1$, and $t_{l_u}^{(u)}$ is a prefix of $s_{r+l_u-1}$. Since $t_j^{(u)} = s_{r+j-1}$, it follows that $E(t_j^{(u)}) = E(s_{r+j-1})$. Further, since there are at most $w_1$ possible non-empty prefixes and $w_1$ possible non-empty suffixes of any block, the encodings of the prefixes and suffixes of each block of the text string are also stored along with the chosen set of substrings for the block. The number of blocks is $k = \lceil n / w_1 \rceil$. The number of prefixes and suffixes of a block is $2 \cdot w_1$. Therefore, the total number of prefix and suffix encodings is

  $\left\lceil \frac{n}{w_1} \right\rceil \cdot 2 \cdot w_1 \le 2 \cdot n + 2 \cdot w_1 = O(n).$   (4)
The number of encodings required for the pattern string is $w_1 \cdot \frac{|t|}{w_1} = O(|t|)$, and is therefore linear in the size of the pattern string. The total number of encodings (space) required for the text string s is given by the sum of equations (3) and (4), which is $O\left(\frac{n \cdot w_1}{w_2^2} + n\right)$. Suppose that the database string size n is large, and $w_1$ is chosen to be, say, 64. If $w_2$ is chosen to be 4, then this reflects a substantial improvement over the $O\left(\frac{n^2}{w^2}\right)$ encodings of the scheme presented in Section 5. Specifically, if $w_2 \ge \sqrt{w_1}$, then the number of text encodings is linear in n. The time complexity of the substring matching operation is $O(w_1 \cdot n \cdot |\Sigma|^{\sqrt{w_1}})$.

Cryptographic Strength. Suppose that t is a substring of s and there is a match of t that overlaps multiple blocks of s. Then, in the encrypted string, the approximate position of t within s can be inferred. However, if s and t are both unknown to a third party (say, the data mining outfit), then this revelation is of no consequence. If t is a substring of s that is completely contained in a single block of s, then no positional information is revealed.

Another property of the hierarchical scheme is that all matches of t with s can be found. That is, if t occurs many times in s, then all occurrences of t can be found using the data structure. Once again, if this inference is being done by a third party which is not privy to either s or t, then no information is revealed.

7 Matching Patterns specified as Regular Expressions

In this section, we present a scheme for finding all occurrences of a pattern specified as a regular expression against a given encrypted database string s. We first discuss a simple principle of privacy preserving computations, then state the problem that arises in the context of privacy preserving finite automaton computations, and finally present one possible solution to it.
7.1 A principle of privacy preserving computations

Privacy cannot be preserved by computations that encrypt or decrypt using symmetric keys. Further, privacy is not preserved if text is either encrypted or decrypted within the program.

The principle states the following. Suppose that there is a privacy preserving computation being carried out within a third party, and the computation applies a symmetric key encryption (or decryption) function to some text. We can assume that the key is available to the program and therefore to the third party (using program analysis). Further, we assume that the encryption algorithm is also known. Therefore, the third party can both encrypt and decrypt the text or ciphertext, as the case may be. The second statement of the principle is more general; clearly, if a string is encrypted or decrypted, then the string is revealed. A consequence of the above discussion is that all privacy preserving computations must work on ciphertext.

7.2 Privacy preservation of state transition computations

Let D be a deterministic finite automaton that accepts a given regular expression. We note that there is a basic problem in preserving the privacy of a string accepted by a finite automaton.
Consider the state transition function of the given DFA D. We can assume that the set of states of the automaton is known to the data mining party (or can be inferred from the code available). Let t be the string seen so far by the DFA D, and let q be the current state of D. The next transition reveals the letter that extends t and the next state of D. By keeping track of the matching letters in this manner, the matching string can be known in its entirety. Therefore, the most that can be assumed is that the letters of the alphabet Σ are encrypted. This is equivalent to assuming that the alphabet Σ has been permuted, using a permutation that is not known to the data mining party. However, using statistical analysis, the data mining party can gain some information, and can therefore make a guess for the pattern string whose probability of being correct is greater than that of a random guess. Note that the problem is independent of the mechanism used for encrypting the database string s.

7.3 A weak solution

The problem outlined in Section 7.2 is inherent to finite automaton computations. A DFA extends the prefix of a matching string by a single letter to obtain a longer prefix. Thus, the only possible units of encryption are the letters of the alphabet Σ, which makes the scheme susceptible to statistical attacks.

A simple extension is to transform the given DFA D into another machine $D'$ ($D'$ is a finite state automaton with a little extra power) such that $D'$ makes its transitions on the strings of $\Sigma^w$, where w is a parameter. The final suffix of any string of the regular language whose size is not a multiple of w uses transitions from the enlarged alphabet $\cup_{w'=1}^{w-1} \Sigma^{w'}$. We choose a permutation π that maps $\Sigma^w$ to $\Sigma^w$, and ensure that $D'$ uses the transformed alphabet $\pi(\Sigma^w)$. The values of w and π are withheld from the data mining party.
The scheme is better than the previous scheme, since statistical attacks would require knowledge of the statistics of strings of size w, for an unknown (though small) value of w. This reduces the effectiveness of statistical attacks. This approach can be strengthened slightly, as demonstrated below by means of an example.

7.4 A slightly better solution

Consider the alphabet $\Sigma = \{a_1, a_2, \ldots, a_k\}$. Suppose we choose powers of 2, namely $2, 2^2, \ldots, 2^v$, where v is a parameter, $v \le k$. Partition the alphabet into v subsets $\Sigma_i$, $1 \le i \le v$, such that the subsets are pair-wise disjoint. Define the following sets.

  $\Lambda_i = \Sigma_i^{2^i}$, for $1 \le i \le v$.

That is, the set $\Lambda_i$ is the set of strings of size $2^i$ constructed from letters of $\Sigma_i$. For $1 \le i \le v - 1$, define the following set.

  $\Delta_i = \{\sigma \mid |\sigma| = 2^i$ and $\sigma \in (\cup_{j=i}^{v} \Sigma_j)^*$ and $\exists\, a, b \in \sigma$ such that $a \in \Sigma_i$ and $b \in \cup_{j=i+1}^{v} \Sigma_j \}$

The set $\Delta_i$ is the cross set between $\Sigma_i$ and the partitions with indices higher than i. That is, it is the set of all strings of size $2^i$ over the letters of partitions with indices i or above (i.e., $\Delta_i \subset (\cup_{j=i}^{v} \Sigma_j)^{2^i}$). Further, all strings in $\Delta_i$ are constrained to contain at least one occurrence from each of the sets $\Sigma_i$ and $\cup_{j=i+1}^{v} \Sigma_j$ (i.e., it is a cross string).

Let $w = 2^v$. As a consequence of the construction, the sets $\cup_{i=1}^{v} \Lambda_i$ and $\cup_{i=1}^{v-1} \Delta_i$ can generate $\Sigma^w$ uniquely. That is, each string of $\Sigma^w$ can be uniquely represented as the concatenation of strings in the Λ and Δ sets (prove!). The transformed and equivalent automaton $D'$ can be constructed so that it uses the following alphabet.

  $\Sigma'$ = alphabet of $D'$ = $E\left(\cup_{i=1}^{v} \Lambda_i \;\cup\; \cup_{i=1}^{v-1} \Delta_i\right)$

That is, the members of $\Sigma'$ are encrypted members of the $\Lambda_i$'s and $\Delta_i$'s. The database string s is encrypted accordingly.

Cryptographic strength. The privacy of the scheme draws from the fact that the partition of the original alphabet into the subsets is not known to the third party. Further, the value of v is also withheld from the third party. These two reasons reduce the effectiveness of statistical attacks. However, some information about the length of the string can be deduced.

8 Performing Aggregate Operations on Encrypted Numeric Values

In this section, we propose a scheme for performing aggregate operations like SUM, AVG or COUNT on encrypted numeric data. A lot of work has been done in the past in the field of security and mobile computing on performing arithmetic operations over encrypted numbers. These are based on encryption transformations called Privacy Homomorphisms (refer [5],[6],[7],[8],[9]). However, these techniques are not applicable to databases. Many of them are limited to operations on two numbers, and some of them require message passing for each operation (refer [13]). Recent works on performing queries over encrypted data allow comparison operations like GREATER THAN, LESS THAN, MAX or MIN (refer [4]). The scheme proposed by us is directly applicable to databases and is robust against brute-force and cipher-text analysis attacks.
8.1 Scheme 1

Our scheme is based on the fact that we can use a function, say f, to encrypt a numeric value d such that f(x, y, ..) = d. The solution < x, y, .. > can be used as an encrypted value for d. The properties which the function f should satisfy are as follows:

1. The equation f(x, y, ..) = d should have many solutions, where d can be any numeric value in the range of the data being encrypted, i.e. f is a many-to-one function.
2. The function f should have at least two arguments.
3. It should be possible to evaluate $\Sigma f(x_i, y_i, \ldots)$ given the values of $\Sigma x_i, \Sigma y_i, \ldots$
Let us consider the function f(x, y) = kx + y. This function satisfies all the properties stated above. Hence, we can encrypt any data d as < x, y > such that kx + y = d. Note that it is possible to have many solutions for the same data d. Let us assume that we have to add n numeric values $d_1, d_2, \ldots, d_n$ and we have the encrypted values $< x_1, y_1 >, < x_2, y_2 >, \ldots, < x_n, y_n >$. Given the encrypted values, we can calculate $\sum_{i=1}^{n} x_i$ and $\sum_{i=1}^{n} y_i$. From these values we can evaluate $\sum_{i=1}^{n} d_i$, as it is given by

  $\sum_{i=1}^{n} d_i = k \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i.$

So we see that we can perform the addition directly over the encrypted data. Based on the knowledge of $x_i$ and $y_i$ alone, it is not possible to get $d_i$ without the knowledge of k. It is also possible to use a function with more than two arguments.

This simple idea has the drawback that it discloses the data distribution. In the above example, the encrypted values corresponding to the same data d lie on the same line, whose slope is determined by k. Identification of any one such line gives the value of k. Drawing lines through the other points, parallel to the line corresponding to d, then reveals the distribution through the distances between the lines. This is the major limitation which needs to be eliminated.

8.2 Improving the Scheme

Based on the above idea, we take a function f which satisfies the properties stated earlier, say f(x, y) = kx + y. Now we select two prime numbers p and q. Then we find a solution < x, y > of f(x, y) = d, where d is the numeric data being encrypted. Instead of encrypting d as < x, y >, we encrypt d as < a, b > such that a mod p = x and b mod q = y. Note that the encrypted set consisting of all < a, b > with f(a mod p, b mod q) = d is spread over the entire space and not just along a line, and it is not possible to guess the value of d without knowing the prime numbers p and q. But the above scheme is prone to attack.
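A minimal sketch of f(x, y) = kx + y and of the masked variant of Section 8.2 is given below, assuming non-negative integer data. The constants `K`, `P`, `Q`, the range of the random multipliers, and all function names are illustrative choices. The masked sum decrypts correctly only while the sum of the x values stays below p and the sum of the y values stays below q, an assumption the sketch makes explicit.

```python
import random

K = 1000            # secret multiplier k (illustrative value)
P = 2**31 - 1       # secret prime p (a Mersenne prime, chosen for the demo)
Q = 2**61 - 1       # secret prime q (a Mersenne prime, chosen for the demo)


def encrypt_scheme1(d, rng):
    """Pick a random pair (x, y) with K * x + y == d and x, y >= 0."""
    x = rng.randint(0, d // K)
    return x, d - K * x


def mask(pair, rng):
    """Section 8.2: replace (x, y) by (a, b) with a mod P == x and b mod Q == y."""
    x, y = pair
    return rng.randint(1, 50) * P + x, rng.randint(1, 50) * Q + y


def decrypt_sum(sum_a, sum_b):
    """Recover the plaintext total from the column sums of masked pairs.

    Valid only while the sums of the underlying x and y values stay below
    P and Q respectively (assumption of this sketch)."""
    return K * (sum_a % P) + (sum_b % Q)


rng = random.Random(7)
values = [1234, 5678, 4321]
pairs = [encrypt_scheme1(d, rng) for d in values]
masked = [mask(pair, rng) for pair in pairs]

# Scheme 1: add the columns, then apply f once to the totals.
plain_total = K * sum(x for x, _ in pairs) + sum(y for _, y in pairs)
# Masked variant: the party adding the columns never sees x, y, P or Q.
masked_total = decrypt_sum(sum(a for a, _ in masked), sum(b for _, b in masked))
print(plain_total, masked_total)  # 11233 11233
```

The party that adds the columns needs neither k nor the primes; only the final decryption step uses them.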

The attack works as follows. Since $p$ is a large prime number and $x_1$ and $x_2$ are mapped to $mp + x_1$ and $np + x_2$ respectively, an attacker can approximate $m/n$ by computing $(mp + x_1)/(np + x_2)$, and subsequently recover $m$ and $n$, because both are integers. With $m$ and $n$ in hand, $p$ can be guessed very closely. The same approach yields $q$. Knowing $p$ and $q$, the attacker can retrieve every $\langle x, y \rangle$ tuple.

8.3 Scheme 2

Suppose the data value $d$ to be protected can be represented as an $N$-bit binary number. Identify a set of linear transformations $L$. Then proceed as follows:

1. Divide the $N$-bit binary number into $m$ consecutive unequal parts.
2. Apply linear transformations from the set $L$ to each of the $m$ parts so as to map each part to a binary number of fixed length $l$.
3. Re-order these linearly transformed parts and save the re-ordered parts in the database. The order used for shuffling the parts is fixed.

Data encrypted in the above format can be added directly without decryption. To get the sum of any $n$ numbers, we add the entries corresponding to these $n$ numbers in each of the $m$ columns. This gives the result in encrypted form. The result can be decrypted as follows:

1. As the re-ordering sequence is known to us, we restore the $m$ parts of the result to their original sequence.
2. As each part was linearly transformed, the sum of the transformed values equals the transformation of the sum of the original values, i.e. $h\left(\sum_{i=1}^{n} x_i\right) = \sum_{i=1}^{n} h(x_i)$. Since the linear transformation $h$ is known to us and we know $h\left(\sum_{i=1}^{n} x_i\right)$, we can apply the inverse function $h^{-1}$ to get $\sum_{i=1}^{n} x_i$.
3. Get the binary representation of $\sum_{i=1}^{n} x_i$ for each of the $m$ columns. The resulting $m$ parts can be used to get the final result.

The strength of the above scheme rests on the fact that the way in which the $N$-bit number is broken into $m$ unequal parts is not known to the third party performing computation on the encrypted data. The total number of ways in which an $N$-bit number can be broken into $m$ consecutive unequal parts is $\binom{N-1}{m-1} - 1$. For $N = 128$ and $m = 10$, this number is $O(10^{14})$. Also, the number of ways in which $m$ parts can be re-ordered is $m!$, which is $O(10^{6})$ for $m = 10$. So, to break the encryption, the intruder must guess the linear transformation corresponding to each part and then check $O(10^{20})$ cases in the worst case.

8.4 Final Scheme

As we saw, Scheme 1 is not strong enough to protect the data itself, but it hides the distribution very well. On the other hand, Scheme 2 is highly protective but does not hide the data distribution. Hence, we arrive at a final scheme which is a fusion of Scheme 1 and Scheme 2: compute $\langle x, y \rangle$ and the tuple $\langle a, b \rangle$ using Scheme 1, and protect $a$ and $b$ using Scheme 2. In this manner, we protect not only the data but also the distribution.

9 Formal aspects of privacy preserving computations

In this section, we formally define privacy preserving computations and partial privacy preserving computations.

Definition 7.
A language $L \subseteq \Sigma^* \times \Sigma^*$ is said to be computable in a privacy-preserving fashion if there exist computable functions $f$ and $g$ and a Turing machine $M$ such that $(x, y) \in L$ if and only if $(f(x), g(y)) \in L(M)$ and

$$\mathrm{ENTROPY}((x, y) \mid (f(x), g(y))) = \mathrm{ENTROPY}((x, y)).$$

The intended meaning of the string $(x, y)$ of $L$ is that $x$ is a database string and $y$ is an encoding of the property being checked. We present several examples later on to clarify the above definition. In general, the calculation of the conditional entropy function may prove to be difficult. In order to extend the scope of the definition, we also introduce the notion of partially privacy preserving computations and the index of privacy preservation. In this section, the letter $M$, together with subscripts and superscripts, typically denotes a Turing Machine (TM).

Definition 8. A language $L \subseteq \Sigma^* \times \Sigma^*$ is said to be computable in a partially privacy-preserving fashion if there exist computable functions $f$ and $g$ and a Turing machine $M$ such that $(x, y) \in L$ if and only if $(f(x), g(y)) \in L(M)$ and

$$\mathrm{ENTROPY}((x, y) \mid (f(x), g(y))) \le \mathrm{ENTROPY}((x, y)).$$

The index of privacy preservation is defined as

$$\frac{\mathrm{ENTROPY}((x, y) \mid (f(x), g(y)))}{\mathrm{ENTROPY}((x, y))}.$$

The formal definitions are intended for comparing competing privacy preserving schemes. We now present examples of languages that can be computed while partially preserving privacy.

Example 9. Let $L = \{(x, y) \mid y \text{ is a substring of } x\}$. Then $f(x)$ can be the PPES encoding of $x$, and $g(y)$ can be the function mapping strings to primes, using the same mapping function used by $f$. Here, the TM $M$ checks whether $g(y)$ divides $f(x)$.

Example 10. Let $L$ be any language that satisfies the following property: $|\{y \mid (x, y) \in L\}|$ is finite. Let $L_1$ denote the language $\{x \mid \exists y \text{ such that } (x, y) \in L\}$. For every $x \in \Sigma^*$, $f(x)$ is the PPES encoding of $\{y \mid (x, y) \in L\}$. Since this set is finite for all $x \in \Sigma^*$, the PPES encoding is well-defined. The function $g(y)$ is the mapping of $y$ to a prime using the same mapping function used by $f$. The mapping described is, in general, only partially privacy preserving, since for every $x \in \Sigma^* - L_1$ the PPES encoding $f(x)$ corresponds to the empty string. This reveals some information about the database strings.

The complexity of a partially privacy preserving mapping is parameterized by several measures. The database size measure is given by $|f(x)|$ as a function of $|x|$. This is a measure of the expansion in the size of the database strings. The input size measure is given by $|g(y)|$ as a function of the input size $|y|$. The time and space complexity of the input transformations $f$ and $g$ form yet another set of relevant complexity measures.
The time and space complexity of deciding the transformed language $\{(f(x), g(y)) \mid (x, y) \in \Sigma^*\}$ yields the final measure of complexity.

Example 11. The universal language $U = \{(x, \langle M \rangle) \mid x \in L(M)\}$, where $\langle M \rangle$ represents the encoding of the TM $M$. Consider the alphabet scrambling scheme discussed in Section 7.4. Let $g(\langle M \rangle) = \langle M' \rangle$, where $M'$ is the TM that is isomorphic to $M$, except that the alphabet used is the transformed alphabet $\Sigma'$, corresponding to the specific bits used by the alphabet scrambling scheme. Let $f(x)$ be the encoding of $x$ in the alphabet $\Sigma'$. Let $M$ be the universal TM over the alphabet $\Sigma'$.

The reason for calling this language the universal language for privacy preservation is that if $L$ can be computed in a privacy preserving fashion, then all recursive sets can be computed in a privacy preserving fashion. This can be argued as follows. Let $L$ be a language that is computable in a privacy preserving fashion. Then there exist computable functions $f_1$, $g_1$ and a TM $R$ such that $(x, y) \in L$ iff $(f_1(x), g_1(y)) \in L(R)$. For a given $y \in \Sigma^*$, let $\langle N(y) \rangle$ be the encoding of a Turing machine $N(y)$ that works as follows. $N(y)$ takes an input $u$ and runs $R$ on the pair $(f_1(u), g_1(y))$, accepting iff $R$ accepts. Consider the pair $(x, \langle N(y) \rangle)$. Clearly, $x \in L(N(y))$ iff $(f_1(x), g_1(y)) \in L(R)$.
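
To make the index of privacy preservation from Definition 8 concrete, the following sketch computes it for a toy setting. Several things here are assumptions made purely for illustration: the pairs $(x, y)$ are taken to be equally likely, $f$ and $g$ are treated as fixed, publicly known deterministic maps (the schemes in this paper involve secret parameters, which this toy model ignores), and the names `entropy` and `privacy_index` are hypothetical. Because the image $(f(x), g(y))$ is then a deterministic function of $(x, y)$, the conditional entropy equals the entropy of the pairs minus the entropy of the images.

```python
from collections import Counter
from math import log2


def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)


def privacy_index(pairs, f, g):
    """ENTROPY((x,y) | (f(x),g(y))) / ENTROPY((x,y)) for equally likely pairs.

    Assumes the pairs carry non-zero entropy (at least two distinct pairs).
    """
    n = len(pairs)
    h_pairs = entropy([c / n for c in Counter(pairs).values()])
    images = Counter((f(x), g(y)) for x, y in pairs)
    h_images = entropy([c / n for c in images.values()])
    # The image is determined by the pair, so H(pair | image) = H(pair) - H(image).
    return (h_pairs - h_images) / h_pairs


pairs = [("ab", "a"), ("ab", "b"), ("cd", "c"), ("cd", "d")]
print(privacy_index(pairs, lambda x: 0, lambda y: 0))  # images reveal nothing
print(privacy_index(pairs, lambda x: x, lambda y: 0))  # images reveal x only
print(privacy_index(pairs, lambda x: x, lambda y: y))  # images reveal everything
```

Under these assumptions an index of 1 matches Definition 7 (nothing is revealed), while smaller values indicate that the transformed pair leaks part of the information in $(x, y)$.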

10 Conclusions

With increasing concerns for the privacy and safety of data, it has become necessary to develop techniques for the protection of data being collected in databases. As it is possible for an intruder to get access to raw database files, there is a threat of information leakage or exposure. Encryption addresses this problem, but it renders the encrypted data useless for answering SQL queries. In this paper, we have presented schemes which can perform SQL queries directly over the encrypted data. The VDES scheme suggested in this paper can be used for protecting variable length string data and answering pattern matching queries. This scheme can be used in domains where the program that runs to answer the query is assumed to be safe. Another scheme for string pattern related queries, the Extension of PPES given in Section 5, can be used for complete security. It can also be used for third party computation without any revelation. We also provide a scheme in Section 8 that preserves the result of SUM and AVERAGE operations on numerical data types. If the above schemes are combined with the OPES scheme presented in [4], the encrypted database will be able to answer the following queries: exact string matching, finding substrings in a given set of strings, pattern matching in strings (LIKE operations), and queries related to numerical data types such as SUM, AVG, COUNT, MAX, MIN, GROUP BY and ORDER BY.

11 Future Work

In future, the work done in this paper can be extended to find a better technique for regular expression matching. Also, the techniques presented by us for answering SQL queries come with a cost in terms of higher secondary memory/storage requirements. Although secondary memory is getting cheaper, reducing its usage can improve the performance of an implementation.
Also, the scheme described in Section 8 for preserving aggregate operations on numerical data types can be extended to answer MAX or MIN queries.

Acknowledgments. We would like to thank Dr. Sumit Ganguly for his constant guidance and motivation, and for being a source of inspiration for us.

References

1. Ozsoyoglu, Singer. Anti-tamper databases: querying encrypted databases. In Proc. of 17th Annual IFIP WG11.3 Working Conference on Database and Application Security, Colorado, August 2003.
2. Song, Wagner and A. Perrig. Practical techniques for searches on encrypted data. In IEEE Symp. on Security and Privacy, Oakland, California, 2000.
3. H. Hacigumus, B.R. Iyer, C. Li and S. Mehrotra. Executing SQL over encrypted data in the database-service-provider model. In Proc. of ACM SIGMOD Conf. on Management of Data, Madison, Wisconsin, June 2002.
4. R. Agrawal, J. Kiernan, R. Srikant and Y. Xu. Order Preserving Encryption for Numeric Data. In SIGMOD 2004, Paris, France, June 2004.
5. N. Ahituv, Y. Lapid and Neumann. Processing encrypted data. Communications of the ACM, 30(9):777-780, 1987.
6. J. Domingo-Ferrer and J. Herrera-Joancomarti. A privacy homomorphism allowing field operations on encrypted data. Jornades de Matematica Discreta i Algorismica, Universitat Politecnica de Catalunya, March 1998.
7. J. Domingo-Ferrer. A new privacy homomorphism and applications. Information Processing Letters, 60(5):277-282, 1996.
8. Feigenbaum, Liberman and R.N. Wright. Cryptographic protection of databases and software. In Proc. of DIMACS Workshop on Distributed Computing and Cryptography, 1990.
9. R.L. Rivest, L. Adleman and M. Dertouzos. On data banks and privacy homomorphisms. In Foundations of Secure Computation, 169-178, 1978.
10. R. Agrawal, D. Asonov and R. Srikant. Enabling sovereign information sharing using web services. In SIGMOD 2004, Paris, France, June 2004.
11. R. Agrawal, Bayardo, Kiernan, Faloutsos, Rantzau and R. Srikant. Auditing Compliance With a Hippocratic Database. In Proceedings of VLDB Conference, Toronto, Canada, 2004.
12. R. Agrawal and R. Srikant. In Proc. of ACM SIGMOD Conference on Management of Data, 2000.
13. I-Ling Yen, Wei Li, Qingkai Ma and Farokh Bastani. Secure Computation with Low Overhead. University of Texas at Dallas, Richardson.
14. Tomas Sander and Christian F. Tschudin. Towards Mobile Cryptography. In Proceedings of the IEEE Symposium on Security and Privacy.
15. Bijit Hore, Sharad Mehrotra and Gene Tsudik. A Privacy-Preserving Index for Range Queries. In Proceedings of 30th VLDB Conference, Toronto, Canada, 2004.

Appendix

Privacy of the SUM operator

In this section, we present a solution to the following problem. Given a column of values $d_1, d_2, \ldots, d_n$, compute the sum $d_1 + d_2 + \ldots + d_n$ in a privacy preserving fashion. Assume that $M$ is an upper bound on this sum. This section should be read together with Section 8 of this paper.

Fix an element $d = d_i$ occurring in the column. Let $d$ be partitioned as the sum $d = x + y$ of two elements $x$ and $y$, where $x < U$ and $y < U$ for some upper bound $U$. Choose two distinct numbers $a$ and $b$, each larger than $M \cdot U$, such that $a$ and $b$ are both prime numbers.¹ Further, without loss of generality, assume that $a > b$.

Consider the group $G = \{0, 1, 2, \ldots, a - 1\}$ under the operation of addition mod $a$. Consider the group $H = \langle b \rangle$, that is, the group generated by $b$, i.e. the set $H = \{b, 2b, 3b, \ldots\}$ with multiples taken mod $a$. $H$ is a subgroup of $G$. By Lagrange's theorem, $|H|$ divides $|G|$. Let $|H| = h$. Since $H$ is a subgroup of $G$, it contains the zero element. Let the zero element be $kb$, where $1 \le k \le h \le a$. Thus $a \mid kb$. Since $(a, b) = 1$, it follows that $a \mid k$. Since $1 \le k \le a$, this holds iff $k = a$. This implies that $h = |G|$ and therefore $H = G$. Thus, there exists a unique index $k_1$ such that $k_1 b = 1 \bmod a$. Let $a' = a \bmod b$. In a similar way, it can be argued that there exists a unique integer $k_2$ such that $k_2 a = 1 \bmod b$.

We can now design the encoding scheme as follows. Corresponding to a given integer $d$, construct the following integer:

$$\mathrm{ENCODING}(d) = k_1 \cdot b \cdot x + k_2 \cdot a \cdot y, \quad \text{where } x + y = d,\ x < U, \text{ and } y < U.$$

Given values $d_1, d_2, \ldots, d_n$, each of the values is first transformed into $d_i = x_i + y_i$, using private bits. Subsequently, we form $\mathrm{ENCODING}(d_i)$ for $i = 1, 2, \ldots, n$. The sum is obtained by taking the sum of the encodings. That is,

$$\mathrm{ENCODING}\left(\sum_{i=1}^{n} d_i\right) = \sum_{i=1}^{n} \mathrm{ENCODING}(d_i) = k_1 \cdot b \cdot \sum_{i=1}^{n} x_i + k_2 \cdot a \cdot \sum_{i=1}^{n} y_i.$$

The decoding operation is as follows.

$$x = \mathrm{ENCODING}(d) \bmod a \quad \text{and} \quad y = \mathrm{ENCODING}(d) \bmod b.$$

The correctness of the decoding operation can be shown provided each of $a$ and $b$ is at least $M$. Since $k_1 \cdot b = 1 \bmod a$ and $x < M \le a$, we have $\mathrm{ENCODING}(d) \bmod a = k_1 \cdot b \cdot x \bmod a = x \bmod a = x$. Similarly, it can be shown that $\mathrm{ENCODING}(d) \bmod b = y$. Now $x$ and $y$ can be added to obtain $d$.

¹ The notation $(a, b)$ stands for the GCD of $a$ and $b$.
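
The encoding and decoding steps of this appendix can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than a reference implementation: the function names, the sample primes and the use of a seeded random split for $d = x + y$ (standing in for the "private bits") are all hypothetical choices made for the example. The inverses $k_1$ and $k_2$ are obtained with Python's built-in three-argument `pow`, which accepts an exponent of -1 from Python 3.8 onwards.

```python
import random


def make_keys(a, b):
    """Return (k1, k2) with k1*b = 1 mod a and k2*a = 1 mod b."""
    k1 = pow(b, -1, a)  # modular inverse, Python 3.8+
    k2 = pow(a, -1, b)
    return k1, k2


def encode(d, a, b, k1, k2, rng):
    """ENCODING(d) = k1*b*x + k2*a*y for a random split d = x + y."""
    x = rng.randint(0, d)  # assumed stand-in for the private bits
    y = d - x
    return k1 * b * x + k2 * a * y


def decode(c, a, b):
    """Recover x = c mod a and y = c mod b, then return x + y."""
    return (c % a) + (c % b)


a, b = 1013, 1009  # illustrative distinct primes with a > b
k1, k2 = make_keys(a, b)
rng = random.Random(0)

values = [12, 30, 7]
encodings = [encode(d, a, b, k1, k2, rng) for d in values]

# The untrusted party only adds encodings; the key holder decodes the total.
print(decode(sum(encodings), a, b))  # prints: 49
```

Decoding the sum is correct here because the column sums of the $x_i$ and of the $y_i$ both stay below $a$ and $b$, which is the role played by the bounds on $a$ and $b$ in the text.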