Talk by Artem Storozhuk at NoNameCon 2019.
https://nonamecon.org
https://cfp.nonamecon.org/nnc2019/talk/NUMHDY/
Search over encrypted data is a modern cryptographic engineering problem. We will talk about existing approaches (both well-known and recent) and concentrate on a practical solution based on the blind index technique for searching data in databases. What’s inside: cryptographic and functional schemes, implementation details, and a practical security evaluation (risk modelling and potential attacks). We will show how theoretical models turn into real, usable, maintainable security tools.
Lately, most security-conscious companies store the data in their databases encrypted, but searching over encrypted data is still a challenge. Many academic solutions have been proposed over the years, such as CryptDB, Homomorphic/SSE, PEKS, and Mylar. Unfortunately, most of these approaches are far from being production-ready, usable, and maintainable.
We will show a practical solution based on a hardened version of blind indexing, a long-known technique that has several usability constraints and security caveats. There is an open-source implementation, CipherSweet, which is cryptographically pretty solid, but it stores keys on the client side, which may lead to problems in use.
Our solution doesn't share this design approach: index references are generated, and the keys for them are stored, on a separate node, away from all untrusted sides (client application, backend application, database). Our solution also enforces several limitations on data, which limits the collision risks mentioned in the original technique.
We will explain in detail how it works, show the functional and cryptographic schemes, and dig into implementation details. We will show attendees the process of building a complex security tool, from theoretical concepts (and mathematical models) to production-ready software.
10. Index-based searchable encryption
I - secure index (pointer to the encrypted message);
T - trapdoor (allows the server to identify the encrypted message without revealing its plaintext).
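A minimal sketch of how a secure index I and a trapdoor T can fit together for equality search, assuming both are computed as HMAC-SHA256 over the plaintext with a key the server never sees. The names INDEX_KEY, secure_index, trapdoor and server_search are hypothetical and are not taken from the talk; the ciphertexts are opaque placeholders.

```python
import hashlib
import hmac
import os
from typing import List, Tuple

# Hypothetical index key: held by the trusted side, never by the server.
INDEX_KEY = os.urandom(32)


def secure_index(key: bytes, plaintext: bytes) -> bytes:
    """I: keyed hash stored on the server next to the ciphertext."""
    return hmac.new(key, plaintext, hashlib.sha256).digest()


def trapdoor(key: bytes, query: bytes) -> bytes:
    """T: value the trusted side hands to the server for one equality query."""
    return hmac.new(key, query, hashlib.sha256).digest()


def server_search(rows: List[Tuple[bytes, bytes]], t: bytes) -> List[bytes]:
    """Server side: return ciphertexts whose index equals the trapdoor.

    The server compares opaque byte strings only; it holds no key
    and never sees a plaintext.
    """
    return [ct for index, ct in rows if hmac.compare_digest(index, t)]


# Server-side storage: (secure index, ciphertext) pairs.
rows = [
    (secure_index(INDEX_KEY, b"alice@example.com"), b"<ciphertext-1>"),
    (secure_index(INDEX_KEY, b"bob@example.com"), b"<ciphertext-2>"),
]

matches = server_search(rows, trapdoor(INDEX_KEY, b"alice@example.com"))
print(matches)
```

Because I and T are deterministic in this sketch, the server learns which rows share a value and which queries repeat, which is the equality and search-pattern leakage discussed on the following slides.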
13. Searchable encryption security
Information about objects that may be leaked:
1) Order
2) Equalities
3) Predicates
4) Identifiers
5) Structure
Groups of leakage:
1) Secure index metadata
2) Search pattern
3) Access pattern
14. Model of untrusted storage provider:
1) Honest-but-curious
2) Malicious
15. Searchable encryption security
Strongest security definition (Curtmola et al. 2006) [schemes exist only in theory]:
Nothing should be leaked.
Full security definition (Shen et al. 2009) [schemes exist with implementations but are inefficient in production]:
Nothing should be leaked, except the access pattern.
17. Leakage inference attacks
Count Attack: 40% keyword recovery rate with 80% of the dataset known to the attacker.
Works well if the keyword universe size is at most 5000.
18. Leakage inference attacks
Hierarchical-Search Attack: an extension of the Count Attack, with a 40% keyword recovery rate under the condition that at least 40% of the data leaks.
The attacker could inject a set of constructed records.
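To illustrate why result counts alone are dangerous, here is a simplified sketch of the first step of a count-style attack: the attacker knows (part of) the dataset, observes how many records each opaque query token matches, and maps a token to a keyword whenever that count is unique. This is an assumption-laden toy, not the published attack, which also uses co-occurrence counts to resolve ambiguous candidates. The function name and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def count_attack(known_docs: Dict[int, Set[str]],
                 observed_counts: Dict[str, int]) -> Dict[str, str]:
    """Map query tokens to keywords using result counts only.

    known_docs: attacker's knowledge, document id -> keywords it contains.
    observed_counts: leaked information, query token -> number of matches.
    Only tokens whose count identifies exactly one keyword are recovered.
    """
    keyword_counts = Counter(kw for kws in known_docs.values() for kw in kws)

    # Group keywords by how many known documents contain them.
    by_count: Dict[int, list] = {}
    for kw, count in keyword_counts.items():
        by_count.setdefault(count, []).append(kw)

    recovered = {}
    for token, count in observed_counts.items():
        candidates = by_count.get(count, [])
        if len(candidates) == 1:  # unique count: unambiguous guess
            recovered[token] = candidates[0]
    return recovered


known_docs = {
    1: {"invoice", "urgent"},
    2: {"invoice", "salary"},
    3: {"invoice", "urgent"},
    4: {"salary", "merger"},
}
# What the server sees: opaque tokens and the size of each result set.
observed_counts = {"t1": 3, "t2": 2, "t3": 1}

recovered = count_attack(known_docs, observed_counts)
print(recovered)
```

In this toy data the token with two matches stays unresolved because two keywords share that count; a larger keyword universe produces more such ties, which is consistent with the slide's remark about universe size.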
19. How did we select the SE scheme?
1. open source
2. strong & proven
3. fast & reliable
4. without security design flaws
27. CipherSweet
MAC length <==> Probability of index collision <==> Probability of “false positives” in SELECT response
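The relation between MAC length and false positives can be put into rough numbers. The sketch below assumes index values behave like uniformly random strings of mac_bits bits; under that assumption each unrelated value matches a query with probability 2^-mac_bits, and the chance of any collision among n distinct values follows the usual birthday approximation. The function names and the sample table size are illustrative only.

```python
import math


def expected_false_positives(other_values: int, mac_bits: int) -> float:
    """Expected number of non-matching values returned by one equality query.

    Each of the other_values collides with the queried index with
    probability 2**-mac_bits (uniformity assumption).
    """
    return other_values / 2 ** mac_bits


def any_collision_probability(distinct_values: int, mac_bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    pairs = distinct_values * (distinct_values - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-pairs / 2 ** mac_bits)


for bits in (16, 32, 64):
    print(bits,
          expected_false_positives(1_000_000, bits),
          any_collision_probability(1_000_000, bits))
```

A shorter index leaks less about equalities but returns more rows that the trusted side must decrypt and discard, so the MAC length is a trade-off rather than a value to maximize.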
28. CipherSweet
Application Database
FieldA FieldB
ENCRYPTED ...
ENCRYPTED ...
FieldA FieldB
0x0123456 ...
0x0125676 ...
34. AcraSE cryptographic design
INSERT query transparent mode
insert into test_table(A, B) values (<plaintext>, <plaintext>)
changed to
insert into test_table(A, B) values (<mac><ciphertext>, <mac><ciphertext>)
INSERT query standard mode
insert into test_table(A, B) values (<ciphertext>, <ciphertext>)
changed to
insert into test_table(A, B) values (<mac><ciphertext>, <mac><ciphertext>)
35. AcraSE cryptographic design
SELECT query
select * from test_table where A=<plaintext>
changed to
select * from test_table where substring("A" from 1 for MAC_BYTE_LEN)=<mac>
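The query rewriting on slides 34 and 35 can be tried out with a small sketch. It assumes HMAC-SHA256 as the MAC, uses an in-memory SQLite database instead of the database behind Acra, and replaces real encryption with random placeholder bytes, so it shows only the prefix-search mechanics. SQLite spells the prefix function substr(X, 1, N) rather than substring(... from ... for ...). INDEX_KEY, mac, placeholder_encrypt and the other helper names are hypothetical; MAC_BYTE_LEN is the name used on the slide.

```python
import hashlib
import hmac
import os
import sqlite3

MAC_BYTE_LEN = 32            # HMAC-SHA256 output length (assumed MAC)
INDEX_KEY = os.urandom(32)   # hypothetical key, kept away from the database


def mac(plaintext: bytes) -> bytes:
    return hmac.new(INDEX_KEY, plaintext, hashlib.sha256).digest()


def placeholder_encrypt(plaintext: bytes) -> bytes:
    # NOT encryption: random bytes standing in for a real ciphertext.
    return os.urandom(len(plaintext) + 16)


def rewrite_value(plaintext: bytes) -> bytes:
    """<plaintext> becomes <mac><ciphertext>, as in the INSERT rewrite."""
    return mac(plaintext) + placeholder_encrypt(plaintext)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (A BLOB, B BLOB)")
for a, b in [(b"alice", b"x"), (b"bob", b"y"), (b"alice", b"z")]:
    conn.execute("INSERT INTO test_table (A, B) VALUES (?, ?)",
                 (rewrite_value(a), rewrite_value(b)))


def search_a(plaintext: bytes) -> list:
    """WHERE A=<plaintext> becomes a comparison of the MAC prefix of A."""
    return conn.execute(
        "SELECT A, B FROM test_table WHERE substr(A, 1, ?) = ?",
        (MAC_BYTE_LEN, mac(plaintext)),
    ).fetchall()


print(len(search_a(b"alice")), len(search_a(b"carol")))
```

The database only ever compares the fixed-length prefix, so the ciphertext part of the column can be randomized per row without breaking the search.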
38. Future work
1) Secure Index truncation and false positives filtering.
2) Performance evaluation.
3) Extension of query expressiveness.
4) Data entropy learning.
github.com/cossacklabs/acra
39. Conclusions
1) Searchable encryption is modern and not completely stable.
2) There is a lack of existing SQL solutions.
3) The secure (blind) indexing approach is one of the reliable techniques for building secure SE schemes.