Security of Social Information from Query Analysis in DaaS
Published in: Technology, Business
Transcript

  • 1. Security of Social Information from Query Analysis in DaaS
    Junpei Kawamoto and Masatoshi Yoshikawa
    Kyoto University, Japan
  • 2. Database as a Service
    One of the components of cloud computing
    Data are stored and managed by service providers
    DaaS brings with it a risk of compromise
    [Figure: users Alice, Bob, and Carol (Tokyo, Paris, London) connected to a DaaS server]
  • 3. Database as a Service
    Existing studies aim to guarantee the safety of DaaS:
    Security of data stored on the servers
    Preventing inference of data from query analysis
    Protecting personal information from query analysis
    [Figure: personal information (Name, Age, …) exchanged with the DaaS server]
    Is that enough to prevent compromise?
  • 4. Overview of this presentation
    1. We introduce a new problem, social information, which is relational information
    2. We discuss an attack model that extracts social information from the query log
    3. We propose a method that protects social information from query analysis
    [Figure: (1) users with personal information (Name, Age, …) linked by relations such as friend and co-worker; (2) the DaaS server sees queries such as "What's the schedule at 3:00pm, March 6th in “room A”?" and concludes "They seem to have a relation"; (3) a conversion server rewrites the query as match(binary(hash(Where)), “01*”)]
  • 5. What is Social Information?
    Social information
    is information about relations between users
    It is NOT personal information
    So it is not protected by any laws in Japan
    Risks
    The structure of the users’ organization can be extracted
    The strength of relations may indicate the interests of the organization
    [Figure: Alice, Bob, and Carol (Tokyo, Paris, London) linked by relations labeled friend, co-worker, and executive]
    Next, I will introduce the attack model for this social information.
  • 6. An assumption for our attack model
    Users who send the same characteristic queries have a relation.
        e.g. users who request events at a particular date and time.
    [Figure: Alice, Bob, and Carol send schedule queries to the DaaS server: "What's the schedule at 3:00pm, March 6th in “room A”?" (twice) and "What's the schedule at 3:00pm, March 6th in “room C”?" (once)]
    We presuppose that they share an interest and therefore have a relation
  • 7. Attack model
    Attackers can obtain the query log on the servers.
    The log can be described as the table below
    What's the schedule at 3:00pm, March 6th in “room A”?  ->  Date = 0306, Time = 1500, Where = Room A
    What's the schedule at 3:00pm, March 6th in “room C”?  ->  Date = 0306, Time = 1500, Where = Room C
    What's the schedule at 3:00pm, March 6th in “room A”?  ->  Date = 0306, Time = 1500, Where = Room A
    [Figure: Alice, Bob, and Carol send these queries to the DaaS server]
    To compute the similarity between users, the attacker calculates query feature vectors in this model
  • 8. Query feature vector
    Calculate literal frequencies
    Normalize:
    each value is divided by the number of requests sent by the user
    [Table: for each user, the count of each literal (0306, 1500, 1600, 1700, Room A, Room B, Room C) and the resulting normalized query feature vector]
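As a rough illustration of the two steps on this slide (count literals, then divide by the user's number of requests), here is a minimal sketch. The function name `query_feature_vector`, the literal ordering, and the sample log are assumptions made for the example, not values from the slides.

```python
from collections import Counter

def query_feature_vector(queries, literals):
    """Return normalized literal frequencies for one user's query log.

    queries  -- list of queries, each a list of literals
    literals -- fixed ordering of literals that defines the vector dimensions
    """
    counts = Counter(lit for query in queries for lit in query)
    total = len(queries)  # number of requests sent by this user
    if total == 0:
        return [0.0] * len(literals)
    return [counts[lit] / total for lit in literals]

# Assumed ordering of the literals that appear on the slide.
LITERALS = ["0306", "1500", "1600", "1700", "Room A", "Room B", "Room C"]

# Hypothetical log for one user: three requests.
alice_log = [
    ["0306", "1500", "Room A"],
    ["0306", "1600", "Room A"],
    ["0306", "1500", "Room B"],
]
alice_vec = query_feature_vector(alice_log, LITERALS)
```

With this hypothetical log, the literal 0306 appears in all three requests, so its entry is 3/3, while 1500 and Room A each get 2/3.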
  • 9. Compute Similarities
    We define the cosine value as the similarity:
    $\mathrm{sim}(u, v) = \frac{QV_u \cdot QV_v}{\lVert QV_u \rVert \, \lVert QV_v \rVert}$
    (QVu: query vector of user u)
    If sim(u, v) is greater than a threshold θ,
    users u and v are judged to have a relation
    [Figure: query feature vectors of Alice, Bob, and Carol, with a worked computation of Sim(Alice, Bob)]
    Next, I will explain the basic scenario of our approach to prevent this attack.
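The similarity test on this slide can be sketched as follows. The vectors and the threshold value are hypothetical and chosen only to make the comparison visible; the slide does not state the threshold used.

```python
import math

def cosine_similarity(qv_u, qv_v):
    """Cosine of the angle between two query feature vectors."""
    dot = sum(a * b for a, b in zip(qv_u, qv_v))
    norm_u = math.sqrt(sum(a * a for a in qv_u))
    norm_v = math.sqrt(sum(b * b for b in qv_v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no queries is similar to nobody
    return dot / (norm_u * norm_v)

def have_relation(qv_u, qv_v, theta):
    """Judge two users as related when their similarity exceeds theta."""
    return cosine_similarity(qv_u, qv_v) > theta

# Hypothetical vectors over the literals (0306, 1500, Room A, Room C).
alice = [1.0, 1.0, 1.0, 0.0]
bob = [1.0, 1.0, 0.0, 1.0]
carol = [1.0, 1.0, 1.0, 0.0]

THETA = 0.8  # assumed threshold for this example only
```

Here Alice and Carol have identical vectors (similarity close to 1), while Alice and Bob share two of three literals (similarity 2/3), so only the first pair passes the assumed threshold.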
  • 10. Basic Scenario of Query Conversion
    To remove the features from queries received by the DaaS server,
    we use a conversion server on each trusted network
    [Figure: Alice, Bob, and Carol (Tokyo, Paris, London) reach the DaaS server through a conversion server]
  • 11. The conversion server works between the users and the DaaS server
    A trusted network is, for example, a local network at a business site
    Next, I will introduce how one conversion server works.
    How all the servers collaborate with each other is future work.
  • 12. Query Conversion Tree
    We introduce a conversion tree to convert queries
    It is based on extendible hashing†
    It is a binary tree whose leaf nodes hold strings
    Each edge has a label (0 or 1)
    [Figure: example conversion tree with a root, inner node 2, and leaf nodes A, B, and C; edges are labeled 0 or 1, and node A holds 0010101, 000101, …]
    †R. Fagin, J. Nievergelt, N. Pippenger, and H. R. Strong. Extendible hashing - a fast access method for dynamic files. ACM Transactions on Database Systems, 4(3):315–344, 1979.
  • 13. Conversion Process
    A user asks for schedules and sends the query
    Let me show how to convert “Where = room A”
    1: Hash the literal of the query: hash(“room A”) = 6
    2: Convert the hash value into the binary string:
    binary(hash(“room A”)) = “0110”
    3: Convert the binary string with the conversion tree:
    [Figure: Alice asks "What's the schedule at 3:00pm, March 6th in “room A”?"; the query Date = 0306, Time = 1500, Where = “room A” passes through the conversion server on its way to the DaaS server]
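Steps 1 and 2 can be sketched as below. The slides do not say which hash function is used, so this example assumes SHA-256 reduced to 4 bits; the names `to_binary`, `hash_literal`, and `literal_to_bits` are hypothetical, and the resulting bit strings will generally differ from the values shown on the slide.

```python
import hashlib

def to_binary(value, n_bits=4):
    """Render a non-negative integer as a zero-padded binary string."""
    return format(value, f"0{n_bits}b")

def hash_literal(literal, n_bits=4):
    """Map a literal to an integer in [0, 2**n_bits) (assumed hash: SHA-256)."""
    digest = hashlib.sha256(literal.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (1 << n_bits)

def literal_to_bits(literal, n_bits=4):
    """binary(hash(literal)) as a fixed-width bit string."""
    return to_binary(hash_literal(literal, n_bits), n_bits)

bits = literal_to_bits("room A")
```

Any deterministic hash would serve for the sketch; the only property it relies on is that the same literal always yields the same bit string.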
  • 14. Convert the binary string with the tree
    Binary string: 0110
    1. The conversion starts from the root node
    2. Compare the 1st character of the binary string with the edge labels
    3. Compare the next character with the labels of the edges leaving the node reached (node #2)
    4. Repeat step 3 until reaching a leaf node
    5. Connect the labels from the root to the mapped leaf node: 01
    6. Append a wild-card character *: 01*
    Converted query: 01*
    [Figure: conversion tree with root, inner node 2, and leaf nodes A, B, and C, showing the path taken for 0110]
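The walk from the root to a leaf can be sketched as follows. The tree layout is an assumption inferred from the examples in the slides (strings starting with 00 belong to leaf A, 0110 maps to the path 01); the `Node` class and `convert` function are hypothetical names.

```python
class Node:
    """A node of a binary conversion tree; leaves have no children."""

    def __init__(self, name, zero=None, one=None):
        self.name = name
        self.children = {"0": zero, "1": one}

    def is_leaf(self):
        return self.children["0"] is None and self.children["1"] is None

def convert(bits, root):
    """Follow bits from the root until a leaf; return (pattern, leaf name).

    Assumes every inner node has both children and bits is long enough.
    """
    node = root
    path = ""
    for ch in bits:
        if node.is_leaf():
            break
        node = node.children[ch]  # follow the edge labeled with this character
        path += ch
    return path + "*", node.name  # append the wild-card character

# Assumed layout: root -0-> node 2 -0-> A, node 2 -1-> B, root -1-> C.
root = Node("root", zero=Node("2", zero=Node("A"), one=Node("B")), one=Node("C"))

pattern, leaf = convert("0110", root)
```

Under this assumed layout, the string 0110 follows edge 0 and then edge 1, stops at a leaf, and yields the pattern 01*.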
  • 15. Conversion Process
    A user asks for schedules and sends the query
    Let me show how to convert “Where = room A”
    1: Hash the literal of the query: hash(“room A”) = 6
    2: Convert the hash value into the binary string:
    binary(hash(“room A”)) = “0110”
    3: Convert the binary string with the conversion tree: 01*
    4: Finally, create the new query:
    match(binary(hash(Where)), “01*”)
    [Figure: Alice's query "What's the schedule at 3:00pm, March 6th in “room A”?" (Date = 0306, Time = 1500, Where = “room A”) is rewritten by the conversion server as match(binary(hash(Where)), “01*”) before it reaches the DaaS server]
  • 16. Summary of the conversion
    The original query is “Where = room A”
    match(binary(hash(Where)), “01*”) is the final query
    * is a wild-card character
    match is a function that compares binary strings with queries
    Result of the conversion
    Any query whose binary string starts with “01” is converted to “01*”
    No one can distinguish the original queries:
    binary(hash(“room A”)) = “0110”
    binary(hash(“room X”)) = “0100”
    binary(hash(“abc cafe”)) = “0101”
    All of these become match(binary(hash(Where)), “01*”)
    Next, I will explain the method for updating the conversion tree to reduce costs.
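To make the effect of the wild-card concrete, here is a minimal sketch of a prefix-style `match`. The three bit strings are the ones given on the slide; the fourth entry ("room Z") and the final client-side filtering step are assumptions added for illustration.

```python
def match(bits, pattern):
    """Return True if bits fits pattern; a trailing * matches any suffix."""
    if pattern.endswith("*"):
        return bits.startswith(pattern[:-1])
    return bits == pattern

# binary(hash(Where)) for stored rows; "room Z" is a hypothetical extra row.
stored = {
    "room A": "0110",
    "room X": "0100",
    "abc cafe": "0101",
    "room Z": "1000",
}

# What the DaaS server returns for the converted query.
hits = sorted(name for name, bits in stored.items() if match(bits, "01*"))

# Assumed follow-up: the trusted side keeps only the row it actually wanted.
wanted = [name for name in hits if name == "room A"]
```

The server sees the same pattern for all three literals, so the returned set contains rows the user did not ask for; that extra data is the price of hiding the original literal.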
  • 17. Updating Conversion Tree
    Some irrelevant data are obtained because of the conversion
    We define the cost as the number of data items that user u has to obtain when s/he requests a query mapped to the leaf node n
    To reduce this cost under the given cmax (the max allowable cost),
    we update the conversion tree
  • 18. Updating Process (1 of 2)
    The target node n is chosen in order of frequency
    The literals included in the node are divided into 2 sets
    The set Ls is divided according to whether the d-th character is 0 or not,
    where d is the depth of the target node (1-origin)
    Leaf node n has Ls:
    1000, 1001, 1010, 1011
    1100, 1101, 1110, 1111
    (for simplicity, let us consider only 4 bits)
    Ls0: 1000, 1001, 1010, 1011
    Ls1: 1100, 1101, 1110, 1111
    [Figure: conversion tree with root, inner node 2, and the target leaf node n holding Ls]
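The split of a leaf's literals can be sketched as below. The value d = 2 is an assumption read off the slide's example, where the literals all start with 1 and are separated by their second character; `split_leaf` is a hypothetical name.

```python
def split_leaf(literals, d):
    """Divide binary strings by their d-th character (1-origin)."""
    ls0 = [s for s in literals if s[d - 1] == "0"]
    ls1 = [s for s in literals if s[d - 1] != "0"]
    return ls0, ls1

# The 4-bit strings held by leaf node n in the slide's example.
ls = ["1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"]

ls0, ls1 = split_leaf(ls, d=2)
```

With d = 2 the result reproduces the two sets on the slide: Ls0 holds the strings beginning with 10 and Ls1 those beginning with 11.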
  • 19. Updating Process (2 of 2)
    Compute the following for the 2 sets (Ls0 and Ls1)
    [Equation: cost0 and cost1, defined in terms of Count(u, l) and totalu]
    Count(u, l) is how many times user u inquires by literal l
    totalu is the # of inquiries of user u
    If cost0 or cost1 is greater than cmax (the max allowable cost),
    delete the node Ls, then add a new inner node and 2 new leaves
    [Figure: leaf n:Ls (1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111) is replaced by a new inner node 3 with leaves Ls0 (1000, 1001, 1010, 1011) and Ls1 (1100, 1101, 1110, 1111)]
    Next, I will talk about the evaluation.
  • 20. Evaluation Experiment
    We selected the dataset used by Löser et al.†
    This dataset is constructed from the Open Directory
    It contains users’ groups and queries.
    †Alexander Löser, Steffen Staab, and Christoph Tempich: “Semantic Methods for P2P Query Routing”, Multiagent System Technologies (MATES 2005)
  • 21. Result
    The # of users is 133,602 and the # of groups is 6,280
    The precision and recall of the attack: [Figure: precision and recall of the attack]
    The introduced attack model can extract users’ relations with high precision
    The higher the precision, the higher the risk
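For readers unfamiliar with the two measures, the sketch below shows one way precision and recall could be computed for predicted user pairs. It assumes that two users count as truly related when they belong to the same group in the dataset; that reading, the function name `precision_recall`, and the sample pairs are all assumptions, not details taken from the slides.

```python
def precision_recall(predicted_pairs, actual_pairs):
    """Precision and recall of predicted relations against true relations.

    Pairs are unordered, so (u, v) and (v, u) are treated as the same pair.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    actual = {frozenset(p) for p in actual_pairs}
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical output of the attack and hypothetical ground truth.
predicted = [("alice", "carol"), ("alice", "bob")]
actual = [("carol", "alice"), ("bob", "dave")]

precision, recall = precision_recall(predicted, actual)
```

In this made-up case one of the two predicted pairs is correct and one of the two true pairs is found, so both measures are 0.5.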
  • 22. Result
    How much does the query conversion reduce the precision?
    0.8 -> 0.55
  • 23. Result
    How much does the query conversion reduce the recall?
  • 24. Summary
    1. We introduce a new problem on DaaS: social information
    2. We introduce an attack model that extracts social information from the query log
    3. We propose a method that protects social information from query analysis
    [Figure: (1) users linked by relations such as friend and co-worker; (2) the DaaS server sees queries such as "What's the schedule at 3:00pm, March 6th in “room A”?" and concludes "They seem to have a strong relation"; (3) a conversion server rewrites the query as match(binary(hash(Where)), “01*”)]
