SECURING SOCIAL INFORMATION
FROM QUERY ANALYSIS
IN OUTSOURCED DATABASES	

Junpei Kawamoto and Masatoshi Yoshikawa (Kyoto University)
iDB Forum 2008	
 2008/9/22	

AGENDA
1. Security problems of outsourced databases
2. The data encryption model we employ
3. Query Generalization by Dynamic Hash
4. Result Generalization by Bloom Filter
5. Conclusion and future work


BACKGROUND OF OUR RESEARCH
• Outsourced database systems are in wide use
  - We can delegate management to service providers
  - We can save on management costs
  - We do not accept requests from outside of local networks
  - We can protect the security of local networks

[Figure: a traditional database system and an outsourced database system, each connected to the web]


PROBLEMS
• Security of users’ data
  - The service provider has total authority over the databases
  - The service provider can inspect users’ data
  Data encryption at the client side gives us a solution to this problem.
• Security of users’ social information
  Social information means the relationships among users’ accounts.

WHY IS SOCIAL INFORMATION IMPORTANT?
• Two kinds of risk emerge from a social information leak
  - Personal information could be compromised in a chain effect
  - Different online personae could be connected


WHY IS SOCIAL INFORMATION IMPORTANT?
• Personal information could be compromised in a chain effect

[Figure: linked accounts with ID: lee, ID: carol, and ID: alice_gc]
Bob did not disclose his information. However, the server can guess his name.
Alice carelessly disclosed her personal information (Name: Alice, Boss: Bob, etc.).

Through this chain effect, the server may be able to guess other users’ information.


WHY IS SOCIAL INFORMATION IMPORTANT?
• Different online personae could be connected
  The server could guess that these login names are both associated with Alice.

[Figure: ID: alice_gc (public, official account), ID: Cheshire (private account), and other people’s accounts]


OUR PROPOSED METHODS
• Query Generalization by Dynamic Hash
• Result Generalization by Bloom Filter


DATA ENCRYPTION MODEL
• We use a traditional data encryption model

Original table (not encrypted)
No.  name             begin       end         acl
1    Products Review  7/17 15:00  7/17 18:00  Alice, Bob
2    Business Trip    7/18 10:00  7/20 18:00  Alice, Bob
3    Team Meeting     7/20 15:00  7/20 18:00  Alice, Carol
4    Business Trip    7/21 12:00  7/21 17:00  Dave
(acl: who has authority)

Encrypted table (with index)
No.  etuple       Iname  Ibegin  Iend
1    5f0f1f46...  00     10      10
2    b98009af...  01     00      11
3    082ba604...  10     11      11
4    8bc546af...  01     01      01

etuple is the encrypted original tuple:
  etuple = Encrypt(name, begin, end, acl)
Iname, Ibegin, and Iend are hash indices used for query processing:
  Iname = Hash(name)
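
The index columns can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the slides do not name the hash function, so truncating SHA-256 to 2 bits is an assumption, and the resulting bit values will not match the ones on the slide. The helper name hash_index is hypothetical, and the Encrypt step is left out here.

```python
import hashlib


def hash_index(value: str, bits: int = 2) -> str:
    """Return the leading `bits` bits of SHA-256(value) as a bit string.

    Assumption: the slides do not say which hash is used; SHA-256 is a stand-in.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    as_int = int.from_bytes(digest, "big")
    return format(as_int >> (len(digest) * 8 - bits), f"0{bits}b")


# Plaintext rows from the slide: (No., name, begin, end).
rows = [
    (1, "Products Review", "7/17 15:00", "7/17 18:00"),
    (2, "Business Trip", "7/18 10:00", "7/20 18:00"),
    (3, "Team Meeting", "7/20 15:00", "7/20 18:00"),
    (4, "Business Trip", "7/21 12:00", "7/21 17:00"),
]

# Index columns the server would store next to each etuple.
index_table = [
    {
        "no": no,
        "Iname": hash_index(name),
        "Ibegin": hash_index(begin),
        "Iend": hash_index(end),
    }
    for no, name, begin, end in rows
]

for entry in index_table:
    print(entry)
```

Equal plaintext values always map to the same index value (rows 2 and 4 share Iname), which is what lets the server filter without seeing the plaintext.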

DATA ENCRYPTION MODEL
• Query processing on the data encryption model

Original query:   Name = “Business Trip” & Begin = “7/18 10:00”
Query on server:  Iname = “01” & Ibegin = “00”
Query on client:  Name = “Business Trip” & Begin = “7/18 10:00”

Rows shown from the original table
No.  name           begin       end
2    Business Trip  7/18 10:00  7/20 18:00
4    Business Trip  7/21 12:00  7/21 17:00

Rows shown from the encrypted table
etuple       Iname  Ibegin  Iend
b98009af...  01     00      11
8bc546af...  01     01      01
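
The two-step query above can be sketched as follows. This is an illustration under stated assumptions: the index values are copied from the slide tables rather than computed, the function names are hypothetical, and encrypt/decrypt are placeholders built on base64, which is an encoding and provides no secrecy. A real deployment would substitute an actual cipher.

```python
import base64
import json

# Index values copied from the slide tables (name -> Iname, begin -> Ibegin).
INAME = {"Products Review": "00", "Business Trip": "01", "Team Meeting": "10"}
IBEGIN = {"7/17 15:00": "10", "7/18 10:00": "00",
          "7/20 15:00": "11", "7/21 12:00": "01"}


def encrypt(row: dict) -> str:
    """Placeholder for Encrypt(...): base64 is an encoding, NOT encryption."""
    return base64.b64encode(json.dumps(row).encode("utf-8")).decode("ascii")


def decrypt(etuple: str) -> dict:
    """Inverse of the placeholder above."""
    return json.loads(base64.b64decode(etuple).decode("utf-8"))


plain_rows = [
    {"name": "Products Review", "begin": "7/17 15:00", "end": "7/17 18:00"},
    {"name": "Business Trip", "begin": "7/18 10:00", "end": "7/20 18:00"},
    {"name": "Team Meeting", "begin": "7/20 15:00", "end": "7/20 18:00"},
    {"name": "Business Trip", "begin": "7/21 12:00", "end": "7/21 17:00"},
]

# What the server stores: opaque etuples plus hash index columns.
server_table = [
    {"etuple": encrypt(r), "Iname": INAME[r["name"]], "Ibegin": IBEGIN[r["begin"]]}
    for r in plain_rows
]


def query_on_server(iname: str, ibegin: str) -> list:
    """The server filters on index columns only and returns matching etuples."""
    return [t["etuple"] for t in server_table
            if t["Iname"] == iname and t["Ibegin"] == ibegin]


def query_on_client(name: str, begin: str) -> list:
    """The client translates the query, decrypts candidates, and re-checks them."""
    candidates = query_on_server(INAME[name], IBEGIN[begin])
    decrypted = [decrypt(e) for e in candidates]
    return [r for r in decrypted if r["name"] == name and r["begin"] == begin]


result = query_on_client("Business Trip", "7/18 10:00")
print(result)
```

The client repeats the original predicate after decryption because different plaintext values may share a hash index value, so the server's answer can be a superset of the true result.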

QUERY GENERALIZATION
• Servers can guess users’ relationships from their queries.
  If only Alice and Bob send the same query, the server can guess that they are related.

Original queries (users: Alice, Bob, Carol)
  SELECT *
  FROM schedule
  WHERE begin = '7/15 10:00'

  SELECT *
  FROM schedule
  WHERE begin = '7/14 10:00'

Generalized query
  SELECT *
  FROM schedule
  WHERE begin = '7/15 10:00'
     OR begin = '7/14 10:00'

The generalized query is requested by at least three users, so the server cannot find group information.

HOW TO GENERALIZE
• First, queries are described by hash indices
    Begin = 7/15 10:00  ->  IBegin = 0001011
    Begin = 7/14 10:00  ->  IBegin = 0011110
• Next, the query is translated before it is sent to the DB.

[Figure: a Generalizer inside the organization’s network sits between the users and the DB, which is reached over the web]

HOW TO GENERALIZE
• The Generalizer uses a dynamic hash to translate queries

[Figure: binary tree; each node branches on the next bit (0 or 1), each leaf holds hashes; *: wild card]
  Leaf 00 (IBegin = 00*****): 0001011, 0011110
  Leaf 01 (IBegin = 01*****): 0100101, 0110110
  Leaf 1  (IBegin = 1******): 1010011, 1101101, 1100011

Translated queries
  Begin = 7/15 10:00  ->  IBegin = 00*****
  Begin = 7/14 10:00  ->  IBegin = 00*****
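
The translation step can be sketched as a prefix lookup. This is a simplified illustration that hard-codes the three leaf prefixes from the figure instead of maintaining a tree; the names LEAF_PREFIXES, generalize, and matches are hypothetical.

```python
# Leaf prefixes of the tree shown on the slide.
LEAF_PREFIXES = ["00", "01", "1"]


def generalize(hash_bits: str, prefixes=LEAF_PREFIXES) -> str:
    """Replace a hash by the wildcard pattern of the leaf that contains it."""
    for prefix in prefixes:
        if hash_bits.startswith(prefix):
            return prefix + "*" * (len(hash_bits) - len(prefix))
    raise ValueError(f"no leaf covers {hash_bits!r}")


def matches(pattern: str, hash_bits: str) -> bool:
    """Server-side check: does a stored hash satisfy a wildcard pattern?"""
    return len(pattern) == len(hash_bits) and all(
        p in ("*", h) for p, h in zip(pattern, hash_bits)
    )


print(generalize("0001011"))  # hash of Begin = 7/15 10:00 on the slide
print(generalize("0011110"))  # hash of Begin = 7/14 10:00 on the slide
```

Both example hashes fall into leaf 00, so the server sees the same pattern for the two different original queries.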

SPLITTING LEAF
• Leaves are split to keep the distribution of hashes balanced

Insert new hash: 1000110

Before the split
  Leaf 00: 0001011, 0011110
  Leaf 01: 0100101, 0110110
  Leaf 1:  1010011, 1101101, 1100011, 1000110

After the split
  Leaf 00: 0001011, 0011110
  Leaf 01: 0100101, 0110110
  Leaf 10: 1010011, 1000110
  Leaf 11: 1101101, 1100011

As a result, each leaf mixes a moderate number of hashes.
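
A leaf split of this kind can be sketched with a small binary trie. This is an illustration, not the authors' algorithm: the slides do not state when a leaf is split, so a capacity of 3 hashes per leaf is an assumption chosen because it reproduces the figure. The class names Node and DynamicHash are hypothetical.

```python
class Node:
    """A tree node; it is a leaf while `children` is None."""

    def __init__(self, prefix=""):
        self.prefix = prefix    # bits on the path from the root
        self.hashes = []        # hashes stored here (leaves only)
        self.children = None    # {"0": Node, "1": Node} once split


class DynamicHash:
    def __init__(self, capacity=3):
        self.capacity = capacity  # assumed split threshold, not from the slides
        self.root = Node()

    def _leaf_for(self, h):
        node = self.root
        while node.children is not None:
            node = node.children[h[len(node.prefix)]]
        return node

    def insert(self, h):
        leaf = self._leaf_for(h)
        leaf.hashes.append(h)
        # Split while the leaf is over capacity and bits remain to branch on.
        while len(leaf.hashes) > self.capacity and len(leaf.prefix) < len(h):
            depth = len(leaf.prefix)
            leaf.children = {bit: Node(leaf.prefix + bit) for bit in "01"}
            for item in leaf.hashes:
                leaf.children[item[depth]].hashes.append(item)
            leaf.hashes = []
            # Only the fuller child can still be over capacity.
            leaf = max(leaf.children.values(), key=lambda c: len(c.hashes))

    def leaves(self):
        """Map each leaf prefix to the hashes stored in it."""
        found, stack = {}, [self.root]
        while stack:
            node = stack.pop()
            if node.children is None:
                found[node.prefix] = list(node.hashes)
            else:
                stack.extend(node.children.values())
        return found


tree = DynamicHash(capacity=3)
for h in ["0001011", "0011110", "0100101", "0110110",
          "1010011", "1101101", "1100011"]:
    tree.insert(h)
before = tree.leaves()

tree.insert("1000110")  # the new hash from the slide
after = tree.leaves()
print(before)
print(after)
```

With this threshold, inserting 1000110 overfills leaf 1 and splits it into leaves 10 and 11, as in the figure.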

RESULT GENERALIZATION
• Servers can guess users’ relationships from query results.

[Figure: users Alice, Bob, Carol, and Dave, and the tuples (tuple1 to tuple4) that they request]
  - If only Alice and Carol request tuple2, the server can guess that there is some relationship between them.
  - If Alice and Dave never request the same tuples, the server can guess that there is no relationship between them.

To prevent the server from guessing, some irrelevant tuples are also requested.


RESULT GENERALIZATION
• Users’ queries are generalized

[Figure: original query and generalized query sent over the web to the DB; users Alice, Bob, Carol, and Dave, and tuples tuple1 to tuple4]
With generalized queries, each tuple is received by three users.
The server cannot guess relationships from the information about the result tuples.

BLOOM FILTER BASED GENERALIZATION
• Users are described by bit strings of length k + n
  (in this example, k = 2, n = 5)

  User             first k bits: (uid mod 2) + 1   last n bits: hash(uid)
  Alice (uid = 1)  1 0                             0 0 0 1 1
  Bob   (uid = 2)  0 1                             1 1 0 0 0
  Carol (uid = 3)  1 0                             1 0 1 0 0
  Dave  (uid = 4)  0 1                             0 0 1 0 1

• Access authority is the logical disjunction of the bit strings
  If Alice and Bob have authority:    1 1 1 1 0 1 1
  If Alice and Carol have authority:  1 0 1 0 1 1 1

    Alice  1 0 0 0 0 1 1
  ∨ Bob    0 1 1 1 0 0 0

QUERY PROCESSING
• When a user requests tuples, the bit strings are used

  A tuple that Alice and Bob have authority to: 1111011

  User   bit string  first k bits (server side)  last n bits (client side)
  Alice  1000011     10                          00011
  Bob    0111000     01                          11000
  Carol  1010100     10                          10100
  Dave   0100101     01                          00101

  At the server side, the first k bits are used.
  At the client side, the last n bits are used.
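
One way to read this slide is as two subset tests on the bit strings, sketched below. The exact matching rule is not spelled out on the slide, so treating "the user's bits are all set in the tuple's bits" as the test is an assumption, as are the function names.

```python
K_BITS = 2

USERS = {
    "Alice": "1000011",
    "Bob": "0111000",
    "Carol": "1010100",
    "Dave": "0100101",
}
TUPLE_ACL = "1111011"  # tuple that Alice and Bob have authority to


def covers(acl_part: str, user_part: str) -> bool:
    """True if every bit set in user_part is also set in acl_part."""
    acl, user = int(acl_part, 2), int(user_part, 2)
    return acl & user == user


def server_sends(acl: str, user: str) -> bool:
    """Server side: compare only the first k bits."""
    return covers(acl[:K_BITS], user[:K_BITS])


def client_accepts(acl: str, user: str) -> bool:
    """Client side: compare only the last n bits."""
    return covers(acl[K_BITS:], user[K_BITS:])


for name, bits in USERS.items():
    print(name, server_sends(TUPLE_ACL, bits), client_accepts(TUPLE_ACL, bits))
```

Under this reading the server would return the tuple to all four users, which hides who is actually authorized, and only Alice and Bob would pass the client-side check.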

ANONYMITY
• In the first k bits, which are used at the server side, ⌈N / k⌉ users are assigned to each bit
  (N: total number of users).
• Each tuple is received by at least ⌈N / k⌉ users.
• To avoid result analysis, we can thus introduce ⌈N / k⌉-anonymity.

  Alice (uid = 1)  1 0 | 0 0 0 1 1
                   first k bits: (uid mod 2) + 1 | last n bits: hash(uid)
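
The number of users per server-side bit can be checked with a short count. This sketch assumes uids are the consecutive integers 1 to N, which the slides do not state; under that assumption every bit gets exactly N / k users when k divides N, and otherwise the least populated bit gets ⌊N / k⌋ users while the most populated one gets ⌈N / k⌉.

```python
import math
from collections import Counter


def users_per_bit(num_users: int, k: int) -> Counter:
    """Count how many uids in 1..num_users map to each bit position."""
    return Counter((uid % k) + 1 for uid in range(1, num_users + 1))


even = users_per_bit(4, 2)    # the four users of the running example
uneven = users_per_bit(5, 2)  # k does not divide N

print(even, math.ceil(4 / 2))
print(uneven, math.floor(5 / 2), math.ceil(5 / 2))
```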

CONCLUSION AND FUTURE WORK
• We introduced a new problem for outsourced databases: the security of social information
• Social information means the relationships among users’ accounts
• To protect social information, we introduced two methods
  - Query Generalization by Dynamic Hash
  - Result Generalization by Bloom Filter
• Future work
  - Implement our methods and apply them to a real service
  - Evaluate our methods


Thank you!	
