Security of Social Information from Query Analysis in DaaS

  1. 1. Security of Social Information from Query Analysis in DaaS<br />Junpei Kawamoto and Masatoshi Yoshikawa<br />Kyoto University, Japan<br />
  2. 2. Database as a Service<br />One of the components of cloud computing<br />Data are stored and managed by service providers<br />DaaS brings a risk of compromise<br />[Figure: users Alice, Bob and Carol in Tokyo, Paris and London accessing a DaaS server]<br />
  3. 3. Database as a Service<br />Existing studies aim to guarantee safety:<br />Security of data stored on the servers<br />Preventing data from being inferred through query analysis<br />Protecting personal information (name, age, …) from query analysis<br />[Figure: a DaaS server holding personal information such as name and age]<br />Is this enough to prevent compromise?<br />
  4. 4. Overview of this presentation<br />1. We introduce a new problem, social information: relational information between users (e.g. friend, co-worker)<br />2. We discuss an attack model that extracts social information from query logs<br />3. We propose a method for protecting social information from query analysis<br />[Figure: from a query such as “What's the schedule at 3:00pm, March 6th in “room A”?”, a DaaS server infers that users such as Alice “seem to have a relation”; a conversion server rewrites the query into match(binary(hash(Where)), “01*”)]<br />
  5. 5. What is Social Information?<br />Social information is information about relations between users<br />It is NOT personal information, so it is not protected by any laws in Japan<br />Risks<br />The structure of users’ organization can be extracted<br />The strength of relations may indicate the interests of the organization<br />[Figure: relations such as friend, co-worker and executive among Alice, Bob and Carol in Tokyo, Paris and London]<br />Next, I will introduce the attack model for this social information.<br />
  6. 6. An assumption for our attack model<br />Users who send the same characteristic queries have a relation,<br />e.g. users who request an event at a particular date and time.<br />[Figure: Alice, Bob and Carol send a DaaS server “What's the schedule at 3:00pm, March 6th in “room A”?” (two of them) and “What's the schedule at 3:00pm, March 6th in “room C”?” (one of them)]<br />We presume that such users share an interest and therefore have a relation<br />
  7. 7. Attack model<br />Attackers can obtain the query log on the servers, as described in the table below<br />[Table: each query, e.g. “What's the schedule at 3:00pm, March 6th in “room A”?”, is logged as Date = 0306, Time = 1500, Where = Room A; the queries from Alice, Bob and Carol yield two entries with Where = Room A and one with Where = Room C]<br />To compute the similarity between users, attackers calculate query feature vectors in this model<br />
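A minimal sketch of how a logged condition string such as "Date = 0306, Time = 1500, Where = Room A" could be split into attribute/literal pairs before any feature vectors are computed. The log format, the helper names `parse_conditions` and `parse_log`, and the user-to-room assignment are hypothetical illustrations, not the authors' implementation; it also assumes literals contain no commas or `=` signs.

```python
from typing import Dict, List, Tuple


def parse_conditions(condition: str) -> Dict[str, str]:
    """Split 'attr = literal, attr = literal, ...' into an attribute -> literal map.

    Assumes (hypothetically) that literals contain no commas or '=' signs.
    """
    result: Dict[str, str] = {}
    for part in condition.split(","):
        attr, _, literal = part.partition("=")
        result[attr.strip()] = literal.strip()
    return result


def parse_log(entries: List[Tuple[str, str]]) -> List[Tuple[str, Dict[str, str]]]:
    """Turn (user, condition string) log entries into (user, conditions) pairs."""
    return [(user, parse_conditions(cond)) for user, cond in entries]


# Hypothetical log; which user asked for which room is illustrative only.
log = [
    ("Alice", "Date = 0306, Time = 1500, Where = Room A"),
    ("Bob", "Date = 0306, Time = 1500, Where = Room C"),
    ("Carol", "Date = 0306, Time = 1500, Where = Room A"),
]

for user, conds in parse_log(log):
    print(user, conds)
```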
  8. 8. Query feature vector<br />Calculate literal frequencies<br />Normalize: each value is divided by the user's number of requests<br />[Figure: per-user frequency table over literals such as Room A, Room B, Room C, 1500, 1600, 1700, 0306, …, normalized into the query feature vector]<br />
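The two steps on this slide (count literal frequencies, then divide by the user's number of requests) can be sketched as follows. This is an illustration under assumptions: each log entry is taken to be (user, list of literals in one request), and the function name `query_feature_vectors` and the sample log are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def query_feature_vectors(
    log: Iterable[Tuple[str, List[str]]]
) -> Dict[str, Dict[str, float]]:
    """Build a normalized literal-frequency vector for every user.

    Each log entry is (user, literals of one request). The frequency of a
    literal is divided by the user's number of requests.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    requests: Counter = Counter()
    for user, literals in log:
        requests[user] += 1           # one more request from this user
        counts[user].update(literals)  # literal frequencies
    return {
        user: {lit: n / requests[user] for lit, n in literal_counts.items()}
        for user, literal_counts in counts.items()
    }


# Hypothetical log: Alice sends two requests, Bob sends one.
log = [
    ("Alice", ["0306", "1500", "Room A"]),
    ("Alice", ["0306", "1600", "Room B"]),
    ("Bob", ["0306", "1500", "Room C"]),
]
vectors = query_feature_vectors(log)
print(vectors["Alice"])  # {'0306': 1.0, '1500': 0.5, 'Room A': 0.5, '1600': 0.5, 'Room B': 0.5}
```

Storing each vector as a sparse dict keeps only the literals a user actually sent, which matters when the literal vocabulary is large.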
  9. 9. Compute Similarities<br />We define the cosine value as the similarity: $\mathrm{sim}(u, v) = \frac{QV_u \cdot QV_v}{\lVert QV_u \rVert \, \lVert QV_v \rVert}$<br />(QV_u: query vector of user u)<br />If sim(u, v) is greater than the threshold θ, users u and v are judged to have a relation<br />[Figure: query feature vectors of Alice, Bob and Carol, and the computation of Sim(Alice, Bob)]<br />Next, I will explain the basic scenario of our approach to preventing this attack.<br />
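A minimal sketch of the similarity test: cosine similarity between two sparse query feature vectors, and the threshold rule that judges two users related when the similarity exceeds θ. The vectors and the value θ = 0.8 are hypothetical examples, not values from the experiments.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # literal -> normalized frequency


def cosine(qv_u: Vector, qv_v: Vector) -> float:
    """Cosine of the angle between two sparse query feature vectors."""
    dot = sum(w * qv_v.get(lit, 0.0) for lit, w in qv_u.items())
    norm_u = math.sqrt(sum(w * w for w in qv_u.values()))
    norm_v = math.sqrt(sum(w * w for w in qv_v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a user with no queries is treated as unrelated
    return dot / (norm_u * norm_v)


def related(qv_u: Vector, qv_v: Vector, theta: float) -> bool:
    """The attack judges u and v related when sim(u, v) exceeds theta."""
    return cosine(qv_u, qv_v) > theta


# Hypothetical vectors: one request each.
alice = {"0306": 1.0, "1500": 1.0, "Room A": 1.0}
bob = {"0306": 1.0, "1500": 1.0, "Room C": 1.0}
carol = {"0306": 1.0, "1500": 1.0, "Room A": 1.0}

print(round(cosine(alice, bob), 3))  # 0.667
print(related(alice, carol, theta=0.8), related(alice, bob, theta=0.8))  # True False
```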
  10. 10. Basic Scenario of Query Conversion<br />[Figure: Alice, Bob and Carol in Tokyo, Paris and London send queries to the DaaS server through a conversion server]<br />Next, I will introduce how one conversion server works.<br />How all servers collaborate with each other is future work.<br />To remove the characteristic features from the queries the DaaS server receives,<br /><ul><li> we place a conversion server on each trusted network
  11. 11. the conversion server sits between the users and the DaaS server</li></ul>[Figure legend: a trusted network means, e.g., a local network at a business site]<br />
  12. 12. Query Conversion Tree<br />We introduce a conversion tree to convert queries<br />It is based on extendible hashing†<br />It is a binary tree whose leaf nodes hold binary strings<br />Each edge has a label (0 or 1)<br />[Figure: a conversion tree with a root, inner node 2 and leaf nodes A, B and C; edges are labeled 0 or 1, and leaf node A holds 0010101, 000101, …]<br />†R. Fagin, J. Nievergelt, N. Pippenger, and H. R. Strong. Extendible hashing - a fast access method for dynamic files. ACM Transactions on Database Systems, 4(3):315-344, 1979.<br />
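A minimal sketch of the conversion tree as a data structure: inner nodes map edge labels "0"/"1" to children, and leaves hold binary strings. The tree shape (A = 00, B = 01, C = 1) is our reading of the slide's figure and is hypothetical; it is consistent with leaf A holding strings that start with "00". The `Node` class and `leaf_paths` helper are illustrative names, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node of the conversion tree; a node without children is a leaf."""
    name: str
    children: Dict[str, "Node"] = field(default_factory=dict)  # edge label "0"/"1" -> child
    literals: List[str] = field(default_factory=list)  # binary strings held by a leaf

    def is_leaf(self) -> bool:
        return not self.children


def leaf_paths(node: Node, prefix: str = "") -> Dict[str, str]:
    """Map every leaf's name to the edge labels on the path from the root."""
    if node.is_leaf():
        return {node.name: prefix}
    paths: Dict[str, str] = {}
    for label in sorted(node.children):
        paths.update(leaf_paths(node.children[label], prefix + label))
    return paths


# Our (hypothetical) reading of the slide's figure.
root = Node("root", children={
    "0": Node("2", children={
        "0": Node("A", literals=["0010101", "000101"]),
        "1": Node("B"),
    }),
    "1": Node("C"),
})
print(leaf_paths(root))  # {'A': '00', 'B': '01', 'C': '1'}
```

As in extendible hashing, the path to a leaf is a prefix shared by every binary string stored in that leaf.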
  13. 13. A user asks for schedules and sends a query<br />Let me show how to convert “Where = room A”<br />Conversion Process<br />1: Hash the literal of the query: hash(“room A”) = 6<br />2: Convert the hash value into a binary string:<br />binary(hash(“room A”)) = “0110”<br />3: Convert the binary string with the conversion tree<br />[Figure: Alice asks “What's the schedule at 3:00pm, March 6th in “room A”?”; the query Date = 0306, Time = 1500, Where = “room A” passes through the conversion server on its way to the DaaS server]<br />
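Steps 1 and 2 can be sketched as below. The slides do not name the hash function, so `hash_literal` is a hypothetical stand-in that takes the top `bits` bits of SHA-256; 4 bits are used only to match the slide's short example. Under this plain base-2 reading, the string “0110” corresponds to the hash value 6.

```python
import hashlib


def hash_literal(literal: str, bits: int = 4) -> int:
    """Hypothetical hash: the top `bits` bits of SHA-256 of the UTF-8 literal."""
    digest = hashlib.sha256(literal.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)


def binary(value: int, bits: int = 4) -> str:
    """Fixed-width, zero-padded binary string, e.g. binary(6) == '0110'."""
    return format(value, f"0{bits}b")


print(binary(6))  # 0110
print(binary(hash_literal("room A")))  # a 4-bit string determined by SHA-256
```

A real deployment would use many more bits so that the conversion tree can keep splitting leaves.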
  14. 14. Convert the binary string with the tree<br />1: The conversion starts from the root node<br />2: Compare the 1st character of the binary string with the edge labels<br />3: Compare the next character with the edge labels of the node reached (node 2)<br />4: Repeat step 3 until a leaf node is reached<br />[Figure: the binary string 0110 traversed on the conversion tree from the root through inner node 2 to a leaf]<br />Concatenate the edge labels from the root to the mapped leaf node: 01<br />Append a wild-card character *: 01* (the converted query pattern)<br />
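The traversal above can be sketched as a loop that follows the edge whose label equals the current character until a leaf is reached, then appends `*`. The nested-dict encoding of the tree and the example tree (A = 00, B = 01, C = 1, our reading of the figure) are hypothetical.

```python
from typing import Dict, Union

# Hypothetical encoding: an inner node is a dict from edge label ("0"/"1")
# to its child; a leaf is just its name.
Tree = Union[str, Dict[str, "Tree"]]


def convert(bits: str, tree: Tree) -> str:
    """Walk the tree along `bits` and return the leaf's path plus a wildcard."""
    node, path = tree, ""
    for ch in bits:
        if isinstance(node, str):  # a leaf was reached; ignore the rest
            break
        node = node[ch]  # follow the edge whose label equals this character
        path += ch
    if not isinstance(node, str):
        raise ValueError("binary string ended before a leaf was reached")
    return path + "*"


tree = {"0": {"0": "A", "1": "B"}, "1": "C"}
print(convert("0110", tree))  # 01*
```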
  15. 15. A user asks for schedules and sends a query<br />Let me show how to convert “Where = room A”<br />Conversion Process<br />1: Hash the literal of the query: hash(“room A”) = 6<br />2: Convert the hash value into a binary string:<br />binary(hash(“room A”)) = “0110”<br />3: Convert the binary string with the conversion tree: 01*<br />4: Finally, create the new query:<br />match(binary(hash(Where)), “01*”)<br />[Figure: the conversion server receives Alice's query Date = 0306, Time = 1500, Where = “room A” and sends match(binary(hash(Where)), “01*”) to the DaaS server]<br />
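Putting steps 1 to 4 together, a conversion server could rewrite a single condition as sketched below. The hash is passed in as a parameter because the slides do not specify it; `toy_hash` is a hypothetical lookup chosen so that the 4-bit string of “room A” is “0110”, as on the slide. The tree shape is again our reading of the figure.

```python
from typing import Callable, Dict, Union

Tree = Union[str, Dict[str, "Tree"]]  # inner node: label -> child; leaf: name


def to_bits(value: int, bits: int) -> str:
    """Zero-padded binary string of `value` with `bits` digits."""
    return format(value, f"0{bits}b")


def leaf_prefix(bits: str, tree: Tree) -> str:
    """Edge labels from the root to the leaf that `bits` is mapped to."""
    node, path = tree, ""
    for ch in bits:
        if isinstance(node, str):
            break
        node = node[ch]
        path += ch
    if not isinstance(node, str):
        raise ValueError("binary string ended before a leaf was reached")
    return path


def rewrite_condition(attr: str, literal: str, tree: Tree,
                      hash_fn: Callable[[str], int], bits: int = 4) -> str:
    """Steps 1-4: hash, binarize, map through the tree, build the match query."""
    pattern = leaf_prefix(to_bits(hash_fn(literal), bits), tree) + "*"
    return f'match(binary(hash({attr})), "{pattern}")'


def toy_hash(literal: str) -> int:
    """Hypothetical stand-in hash chosen so that binary(hash("room A")) == "0110"."""
    return {"room A": 6}[literal]


tree = {"0": {"0": "A", "1": "B"}, "1": "C"}
print(rewrite_condition("Where", "room A", tree, toy_hash))
# match(binary(hash(Where)), "01*")
```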
  16. 16. Summary of the conversion<br />The original query is “Where = room A”<br />match(binary(hash(Where)), “01*”) is the final query<br />* is a wild-card character<br />match is a function that compares binary strings with a pattern<br />Result of the conversion<br />Any query whose binary string starts with “01” is converted to “01*”:<br />binary(hash(“room A”)) = “0110”<br />binary(hash(“room X”)) = “0100”<br />binary(hash(“abc cafe”)) = “0101”<br />Hence no one can distinguish the original queries<br />Next, I will explain the method for updating the conversion tree to reduce costs.<br />
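One plausible semantics for `match`, sketched under assumptions: a trailing `*` matches any suffix, so the pattern “01*” accepts every binary string that begins with “01”. The row “room Z” with string “0010” is a hypothetical non-matching example added for contrast; the other strings come from the slide.

```python
def match(bits: str, pattern: str) -> bool:
    """True if `bits` matches `pattern`; a trailing '*' matches any suffix."""
    if pattern.endswith("*"):
        return bits.startswith(pattern[:-1])
    return bits == pattern


# Binary strings from the slide, plus one hypothetical non-matching literal.
rows = {"room A": "0110", "room X": "0100", "abc cafe": "0101", "room Z": "0010"}
print([lit for lit, bits in rows.items() if match(bits, "01*")])
# ['room A', 'room X', 'abc cafe']
```

Because all three literals produce the same pattern, the DaaS server sees identical queries and cannot tell which literal was originally requested.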
  17. 17. Updating Conversion Tree<br />The conversion makes users obtain some irrelevant data<br />We define the cost as the number of data items that user u has to obtain when s/he requests a query mapped to leaf node n<br />To keep this cost within a given cmax (the maximum allowable cost), we update the conversion tree<br />
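The slide's cost formula is not recoverable, so the sketch below shows only one possible reading of "the number of data items a user has to obtain": count the stored rows whose binary string falls under the leaf's prefix (everything a `match(…, prefix*)` query returns), and count how many of those are irrelevant to the requested literal. The row layout and helper names are hypothetical; this is not the authors' cost definition.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # hypothetical: (literal stored in the row, binary(hash(literal)))


def returned_rows(prefix: str, rows: List[Row]) -> int:
    """Rows returned by match(binary(hash(attr)), prefix + '*')."""
    return sum(1 for _, bits in rows if bits.startswith(prefix))


def irrelevant_rows(literal: str, prefix: str, rows: List[Row]) -> int:
    """Returned rows whose literal is not the one the user actually asked for."""
    return sum(1 for lit, bits in rows if bits.startswith(prefix) and lit != literal)


rows = [("room A", "0110"), ("room A", "0110"), ("room X", "0100"),
        ("abc cafe", "0101"), ("room Z", "0010")]
print(returned_rows("01", rows), irrelevant_rows("room A", "01", rows))  # 4 2
```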
  18. 18. Updating Process (1 of 2)<br />Target nodes n are chosen in order of frequency<br />The literals included in the node are divided into 2 sets:<br />the set of literals Ls is divided according to whether the d-th character is 0 or not,<br />where d is the depth of the target node (1-origin)<br />[Figure: leaf node n holds Ls = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111} (only 4 bits are shown for simplicity); it is divided into Ls0 = {1000, 1001, 1010, 1011} and Ls1 = {1100, 1101, 1110, 1111}]<br />
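The division step can be sketched as a partition on the d-th character (1-origin). Calling it with d = 2 reproduces the slide's split of Ls into 10xx and 11xx; how the slide counts the depth of node n is our interpretation, and `split_leaf` is a hypothetical helper name.

```python
from typing import List, Tuple


def split_leaf(literals: List[str], d: int) -> Tuple[List[str], List[str]]:
    """Divide a leaf's binary strings by their d-th character (1-origin)."""
    ls0 = [s for s in literals if s[d - 1] == "0"]
    ls1 = [s for s in literals if s[d - 1] != "0"]
    return ls0, ls1


ls = ["1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"]
ls0, ls1 = split_leaf(ls, d=2)
print(ls0)  # ['1000', '1001', '1010', '1011']
print(ls1)  # ['1100', '1101', '1110', '1111']
```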
  19. 19. Updating Process (2 of 2)<br />Compute the costs cost0 and cost1 of the 2 sets (Ls0 and Ls1) from Count(u, l) and total_u, where<br />Count(u, l) is the number of times user u queries with literal l<br />total_u is the total number of queries of user u<br />If cost0 or cost1 is greater than cmax (the maximum allowable cost),<br />delete the leaf node Ls, then add a new inner node and 2 new leaves (Ls0 and Ls1)<br />[Figure: leaf n:Ls under the root is replaced by a new inner node with the leaves Ls0 = {1000, 1001, 1010, 1011} and Ls1 = {1100, 1101, 1110, 1111}]<br />Next, I will talk about the evaluation.<br />
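A sketch of one update step under stated assumptions. The tree is kept, extendible-hashing style, as a dict from leaf path to the binary strings the leaf holds; a leaf is split on the character right after its path. The split condition follows the slide (split when cost0 or cost1 exceeds cmax), but the cost function is caller-supplied because the slide's formula is not recoverable: the `cost=len` used below is a hypothetical stand-in, not the authors' cost. Choosing target nodes in order of frequency is not shown.

```python
from typing import Callable, Dict, List

Leaves = Dict[str, List[str]]  # leaf path from the root -> binary strings it holds


def update_leaf(leaves: Leaves, path: str,
                cost: Callable[[List[str]], float], c_max: float) -> bool:
    """Split the leaf at `path` if cost0 or cost1 exceeds c_max; return True if split."""
    literals = leaves[path]
    d = len(path) + 1  # next character after the leaf's path (1-origin); our reading
    if any(len(s) < d for s in literals):
        return False  # strings too short to split further
    ls0 = [s for s in literals if s[d - 1] == "0"]
    ls1 = [s for s in literals if s[d - 1] != "0"]
    if cost(ls0) > c_max or cost(ls1) > c_max:
        del leaves[path]          # delete the old leaf Ls ...
        leaves[path + "0"] = ls0  # ... and hang two new leaves under a new inner node
        leaves[path + "1"] = ls1
        return True
    return False


# Hypothetical stand-in cost: the number of literals in the set.
leaves = {"0": ["0010", "0101"],
          "1": ["1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"]}
print(update_leaf(leaves, "1", cost=len, c_max=3))  # True
print(sorted(leaves))  # ['0', '10', '11']
```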
  20. 20. Evaluation Experiment<br />We selected a dataset used by Löser et al.†<br />This dataset is constructed from the Open Directory<br />It contains users’ groups and queries.<br />†Alexander Löser, Steffen Staab, and Christoph Tempich: “Semantic Methods for P2P Query Routing”, Multiagent System Technologies (MATES 2005)<br />
  21. 21. Result<br />The number of users is 133,602 and the number of groups is 6,280<br />[Figure: precision and recall of the attack]<br />The introduced attack model can extract users’ relations with high precision<br />The higher the precision, the higher the risk<br />
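For reference, precision and recall of the attack can be computed over user pairs as sketched below. The ground truth used here is an assumption: two users are taken to be truly related when they share at least one group in the dataset. The group contents and predicted pairs are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Pair = FrozenSet[str]


def true_pairs(groups: Dict[str, Set[str]]) -> Set[Pair]:
    """All unordered user pairs that share at least one group (assumed ground truth)."""
    pairs: Set[Pair] = set()
    for members in groups.values():
        for u, v in combinations(sorted(members), 2):
            pairs.add(frozenset((u, v)))
    return pairs


def precision_recall(predicted: Set[Pair], actual: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall of the predicted relation pairs."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


groups = {"g1": {"Alice", "Carol"}, "g2": {"Bob", "Dave"}}
predicted = {frozenset(("Alice", "Carol")), frozenset(("Alice", "Bob"))}
print(precision_recall(predicted, true_pairs(groups)))  # (0.5, 0.5)
```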
  22. 22. Result<br />How much does the query conversion reduce the precision?<br />Precision: 0.8 -> 0.55<br />
  23. 23. Result<br />How much does the query conversion reduce the recall?<br />
  24. 24. Summary<br />1. We introduce a new problem on DaaS: social information (e.g. friend, co-worker)<br />2. We introduce an attack model that extracts social information from query logs<br />3. We propose a method for protecting social information from query analysis<br />[Figure: from a query such as “What's the schedule at 3:00pm, March 6th in “room A”?”, a DaaS server infers that users such as Alice “seem to be closely related”; a conversion server rewrites the query into match(binary(hash(Where)), “01*”)]<br />
