This document discusses privacy-enhanced search techniques. It begins by motivating the need for searching encrypted or remote data without revealing sensitive information to untrusted providers. It then outlines several approaches for private information retrieval and searching directly on encrypted data, including techniques using Bloom filters and searchable symmetric encryption. Key properties and examples of each technique are provided.
4. REMOTE / UNTRUSTED
STORAGE
• What if you don’t trust the storage provider?
• Encrypt
• What if you want to use a search provider
but don’t trust them?
• What if you want to search your encrypted data?
5. WHAT IF YOU WANT TO SEARCH
YOUR ENCRYPTED DATA?
Naïve approach: Server sends you everything
12. ADDRESS BOOK MATCHING
Better approach
• Hash your data. Like WhatsApp, or Gravatar.
• Still guessable (e-mail addresses)
• Gravatar tracking
• Still pre-computable (phone numbers)
• Steal the database or match what you like
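Why plain hashing stays pre-computable can be illustrated with a minimal sketch. It assumes SHA-256 as the hash and a hypothetical number block "+1555" followed by four digits; neither detail comes from the slides. A real phone number space is larger, but still small enough to enumerate.

```python
import hashlib


def h(value: str) -> str:
    """Unsalted hash of a contact identifier (SHA-256 is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_lookup_table(prefix: str, digits: int) -> dict:
    """Hash every number in the block once; map hash -> plaintext number."""
    table = {}
    for n in range(10 ** digits):
        number = f"{prefix}{n:0{digits}d}"
        table[h(number)] = number
    return table


# What a client would upload for one contact (hypothetical number).
uploaded = h("+15550123")

# Anyone holding the hash database can invert it with the table.
table = build_lookup_table("+1555", 4)
recovered = table.get(uploaded)
```

The table is built once and reused for every hash in a stolen database, which is why hashing alone does not hide phone numbers.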
13. ADDRESS BOOK MATCHING
Hash (social) connections
• My phone number m, friend's number f
• Hash: h(min(m, f), max(m, f))
• Both ends must have the other contact in the
address book to match
• Anybody can confirm your connections
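The connection hash can be sketched as follows. SHA-256, the comma separator, and the example numbers are assumptions made for illustration; the slides only fix the form h(min(m, f), max(m, f)).

```python
import hashlib


def connection_hash(m: str, f: str) -> str:
    """h(min(m, f), max(m, f)): identical no matter which side computes it."""
    lo, hi = min(m, f), max(m, f)
    return hashlib.sha256(f"{lo},{hi}".encode("utf-8")).hexdigest()


me, friend = "+15550123", "+15550456"

mine = connection_hash(me, friend)      # uploaded by my phone
theirs = connection_hash(friend, me)    # uploaded by my friend's phone

# An outsider who knows (or guesses) both numbers computes the same value
# and can therefore confirm that the connection exists.
guess = connection_hash("+15550456", "+15550123")
```

Sorting the pair makes the hash symmetric, so a match only appears when both sides upload it. The same symmetry is what lets a third party confirm a connection.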
14. ADDRESS BOOK MATCHING
Hash (phone # | e-mail) || (first | last name)
• Common names (e.g. John) still easily retrievable
• Users have to enter their own name
(besides phone no.) for others to find them
• Contacts must contain first name & last name
15. ADDRESS BOOK MATCHING
BLOOM FILTERS
Setup
• Compute m-bit vector from k independent hash
functions with range [1…m] of all entries to match
• Hashes need not be cryptographically secure,
just independent
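A minimal sketch of this setup is shown below. It assumes the k hash functions are derived from SHA-256 by prefixing the function index, purely for convenience, since the slides note that cryptographic strength is not required. Positions run from 0 to m - 1 rather than 1 to m, and the class name and example numbers are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m = m              # number of bits in the vector
        self.k = k              # number of hash functions
        self.bits = [0] * m

    def _positions(self, item: str):
        """Yield the k positions h_1(item) ... h_k(item)."""
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # A single 0 bit proves absence; all 1 bits only suggest presence.
        return all(self.bits[pos] for pos in self._positions(item))


contacts = ["+15550123", "+15550456", "+15550789"]
bf = BloomFilter(m=1024, k=4)
for number in contacts:
    bf.add(number)
```

A lookup such as `"+15550123" in bf` returns True for every inserted entry. A True result for other entries is possible and is a false positive.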
16. ADDRESS BOOK MATCHING
BLOOM FILTERS
[Figure: an entry p is hashed by h1(p) = i1, h2(p) = i2, h3(p) = i3, h4(p) = i4; the bit at each resulting position (e.g. position i3) of the m-bit vector is set to 1]
17. ADDRESS BOOK MATCHING
BLOOM FILTERS
Properties:
• Never any false negatives
• n insertions
• Probability of bit = 0: $(1 - 1/m)^{kn}$
• False positive rate: $(1 - e^{-kn/m})^k$
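The two formulas can be evaluated directly. The sketch below uses hypothetical parameters (m = 1024 bits, k = 4 hash functions, n = 100 insertions) and relies on the standard approximation (1 - 1/m)^(kn) ≈ e^(-kn/m), which is where the exponential in the false positive rate comes from.

```python
import math


def prob_bit_zero(m: int, k: int, n: int) -> float:
    """Probability that a given bit is still 0 after n insertions."""
    return (1 - 1 / m) ** (k * n)


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Probability that all k probed bits are 1 for an entry never inserted."""
    return (1 - math.exp(-k * n / m)) ** k


m, k, n = 1024, 4, 100          # hypothetical parameters
p_zero = prob_bit_zero(m, k, n)
p_zero_approx = math.exp(-k * n / m)
fp = false_positive_rate(m, k, n)
```

For these parameters the false positive rate is roughly one percent. Growing m or shrinking n lowers it.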
18. KEYWORD SEARCH
SEARCHABLE SYMMETRIC KEY ENCRYPTION
Properties:
• Probabilistic search
• False positives with probability $1/2^m$ per word, i.e. $L/2^m$ for a document with L words
• n insertions
• Probability of bit being zero: $(1 - 1/m)^{kn}$
• False positive rate: $(1 - e^{-kn/m})^k$
19. SSKE
BASIC SCHEME
Setup
• Break document into L words W1...WL, either with
• n bits (padded; leaks word count) or
• with length information (leaks word & document lengths)
• PRG (stream cipher with key k' that only the client knows) generates
S1...SL with (n - m) bits each
• Keyed PRF Fki(x) maps (n - m) bits to m bits
[Figure: the document as a sequence of word blocks W1, W2, …, Wi, …, WL]
20. SSKE
BASIC SCHEME
Setup
• Ti := Si || Fki(Si)
• Ciphertext Ci := Wi ⊕ Ti
• Send encrypted document to server
[Figure: Wi ⊕ (Si || Fki(Si)) = Ci; the encrypted document is the sequence C1, C2, …, Ci, …, CL]
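The setup steps can be sketched as follows. The sketch makes several assumptions that are not in the slides: sizes are counted in bytes (n = 16, m = 4), HMAC-SHA256 stands in for both the stream cipher and the keyed PRF F, words are zero-padded, and the per-position keys ki are derived from a hypothetical master key.

```python
import hashlib
import hmac

N = 16  # bytes per word block (the slides' n, in bytes here)
M = 4   # bytes of check value (the slides' m, in bytes here)


def prf(key: bytes, data: bytes, out_len: int) -> bytes:
    """HMAC-SHA256 truncated to out_len bytes (stand-in for the PRG and F)."""
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad(word: str) -> bytes:
    raw = word.encode("utf-8")
    if len(raw) > N:
        raise ValueError("word longer than block size")
    return raw.ljust(N, b"\x00")


def mask(stream_key: bytes, k_i: bytes, i: int) -> bytes:
    """T_i = S_i || F_{k_i}(S_i), with S_i taken from the keyed stream."""
    s_i = prf(stream_key, i.to_bytes(4, "big"), N - M)
    return s_i + prf(k_i, s_i, M)


def encrypt(words, stream_key, keys):
    """C_i = W_i xor T_i for every word position i."""
    return [xor(pad(w), mask(stream_key, k, i))
            for i, (w, k) in enumerate(zip(words, keys))]


def decrypt(blocks, stream_key, keys):
    return [xor(c, mask(stream_key, k, i)).rstrip(b"\x00").decode("utf-8")
            for i, (c, k) in enumerate(zip(blocks, keys))]


words = ["alice", "bob", "alice"]
stream_key = b"k-prime: known only to the client"
keys = [prf(b"hypothetical master key", i.to_bytes(4, "big"), 32)
        for i in range(len(words))]
ciphertext = encrypt(words, stream_key, keys)
```

Because S_i differs per position, the two occurrences of "alice" encrypt to different blocks, so the server cannot spot repeated words from the ciphertext alone.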
22. SSKE
BASIC SCHEME
Search for keyword wj
• Server computes Ci ⊕ wj
• If Ci ⊕ wj has the form s || Fki(s) for some s, location i matches; yield s for all matching locations i
• Client can decrypt s and check for false positives
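The server-side check can be sketched as below. As before, byte-sized blocks (n = 16, m = 4), HMAC-SHA256 as the PRF, zero padding, and the key derivation are assumptions for illustration; the function names are hypothetical.

```python
import hashlib
import hmac

N, M = 16, 4  # block and check-value sizes in bytes (assumed values)


def prf(key: bytes, data: bytes, out_len: int) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(word_block: bytes, s_i: bytes, k_i: bytes) -> bytes:
    """Client side: C_i = W_i xor (S_i || F_{k_i}(S_i))."""
    return xor(word_block, s_i + prf(k_i, s_i, M))


def server_search(ciphertext, word_block, revealed_keys):
    """Server side: test each location i for which the client revealed k_i."""
    hits = []
    for i, k_i in revealed_keys.items():
        candidate = xor(ciphertext[i], word_block)
        s, check = candidate[:N - M], candidate[N - M:]
        # Match if the candidate has the structure s || F_{k_i}(s).
        if hmac.compare_digest(prf(k_i, s, M), check):
            hits.append((i, s))
    return hits


blocks = [w.ljust(N, b"\x00") for w in (b"alice", b"bob", b"alice")]
stream = [prf(b"k-prime", bytes([i]), N - M) for i in range(3)]
keys = {i: prf(b"hypothetical master key", bytes([i]), 32) for i in range(3)}
ciphertext = [encrypt_block(blocks[i], stream[i], keys[i]) for i in range(3)]

hits = server_search(ciphertext, b"alice".ljust(N, b"\x00"), keys)
```

Every true occurrence is reported. A non-matching location passes the check only if m check bits agree by chance, which is the 1/2^m false positive rate stated earlier; the client detects such a case by comparing the returned s with its own S_i.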
23. SSKE
BASIC SCHEME
Problems
• Linear search effort, inefficient for real-world
documents with different word lengths
• Client reveals ki of searched subset and wj
24. SSKE
BASIC SCHEME
Improvement
• Use PRG G to generate ki := GK(Wi), K secret key
• Does not depend on i but only on K and Wi
• Reveal wj and GK(wj) for lookup
• Still reveals keyword wj
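A sketch of this improvement follows, under the same assumptions as before (byte-sized blocks, HMAC-SHA256 standing in for G and F, zero padding, hypothetical names and keys). The point it illustrates is that one revealed pair (wj, GK(wj)) lets the server test every location without learning keys for other words.

```python
import hashlib
import hmac

N, M = 16, 4  # block and check-value sizes in bytes (assumed values)


def prf(key: bytes, data: bytes, out_len: int) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def word_key(K: bytes, word_block: bytes) -> bytes:
    """k_i := G_K(W_i); depends only on K and the word, not on position i."""
    return prf(K, word_block, 32)


def encrypt(blocks, K: bytes, stream_key: bytes):
    out = []
    for i, w in enumerate(blocks):
        s_i = prf(stream_key, i.to_bytes(4, "big"), N - M)
        out.append(xor(w, s_i + prf(word_key(K, w), s_i, M)))
    return out


def trapdoor(K: bytes, word_block: bytes):
    """What the client reveals for a lookup: w_j and G_K(w_j)."""
    return word_block, word_key(K, word_block)


def server_search(ciphertext, word_block: bytes, k_w: bytes):
    hits = []
    for i, c in enumerate(ciphertext):
        candidate = xor(c, word_block)
        if hmac.compare_digest(prf(k_w, candidate[:N - M], M),
                               candidate[N - M:]):
            hits.append(i)
    return hits


K = b"hypothetical secret key K"
blocks = [w.ljust(N, b"\x00") for w in (b"alice", b"bob", b"alice")]
ciphertext = encrypt(blocks, K, b"k-prime")

w_j, k_j = trapdoor(K, b"alice".ljust(N, b"\x00"))
hits = server_search(ciphertext, w_j, k_j)
```

The trapdoor still contains the keyword in the clear, which is the remaining weakness the slide points out.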
25. SSKE
BASIC SCHEME
Second improvement: Setup
• Encrypt all words in document xi := Esk(Wi)
• Split each word xi into Li with (n - m) and Ri with m
bits
• Now generate ki := GK(Li)
• Ci := xi ⊕ Ti
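The second improvement can be sketched as below. The slides do not name the cipher E, so a toy 4-round Feistel network over HMAC-SHA256 stands in for it here; it is a placeholder for a real deterministic block cipher, not a recommendation. Block sizes, key values, and function names are likewise assumptions. The decryption routine shows one plausible reason for deriving ki from Li: the client can recover Li from the first n - m bits and Si alone, and then rebuild ki.

```python
import hashlib
import hmac

N, M = 16, 4  # block and check-value sizes in bytes (assumed values)


def prf(key: bytes, data: bytes, out_len: int) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def E(sk: bytes, block: bytes) -> bytes:
    """Toy deterministic cipher (4-round Feistel), stand-in for E_sk."""
    half = len(block) // 2
    l, r = block[:half], block[half:]
    for rnd in range(4):
        l, r = r, xor(l, prf(sk, bytes([rnd]) + r, half))
    return l + r


def D(sk: bytes, block: bytes) -> bytes:
    """Inverse of E: undo the Feistel rounds in reverse order."""
    half = len(block) // 2
    l, r = block[:half], block[half:]
    for rnd in reversed(range(4)):
        l, r = xor(r, prf(sk, bytes([rnd]) + l, half)), l
    return l + r


def encrypt_word(word_block, i, sk, K, stream_key):
    x_i = E(sk, word_block)                    # x_i := E_sk(W_i)
    left = x_i[:N - M]                         # L_i, the first n - m part
    k_i = prf(K, left, 32)                     # k_i := G_K(L_i)
    s_i = prf(stream_key, i.to_bytes(4, "big"), N - M)
    return xor(x_i, s_i + prf(k_i, s_i, M))    # C_i := x_i xor T_i


def decrypt_word(c_i, i, sk, K, stream_key):
    s_i = prf(stream_key, i.to_bytes(4, "big"), N - M)
    left = xor(c_i[:N - M], s_i)               # recover L_i using S_i only
    k_i = prf(K, left, 32)                     # rebuild k_i from L_i
    right = xor(c_i[N - M:], prf(k_i, s_i, M)) # recover R_i
    return D(sk, left + right)


sk, K, stream_key = b"hypothetical sk", b"hypothetical K", b"k-prime"
block = b"alice".ljust(N, b"\x00")
c = encrypt_word(block, 7, sk, K, stream_key)
plain = decrypt_word(c, 7, sk, K, stream_key)
```

Since the words are encrypted before masking, a search trapdoor built from Esk(wj) no longer exposes the keyword itself to the server.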