There are two worlds of crypto-development: industry and academia. Leading researcher Sasha Boldyreva shares her experience on how the two can have a mutually-beneficial collaboration through her work with Skyhigh Networks.
Searching Encrypted Cloud Data: Academia and Industry Done Right
1. Searching Encrypted Cloud Data:
Case Study on Academia + Industry (Done Right)
Alexandra (Sasha) Boldyreva
School of Computer Science in the College of Computing
Georgia Institute of Technology
3. TWO WORLDS OF CRYPTO DEVELOPMENT
Academia Industry
Ø Why are the two worlds so disjointed?
Ø Is this unavoidable?
4. TWO WORLDS, A CLOSER LOOK: ACADEMIA
Priorities in protocol design
Ø Competitiveness
Ø Can be published
Ø Novel
Ø Non-trivial, uses interesting ideas
Ø Provably-secure
Ø Uses novel, useful techniques
Ø Has impact on future research
5. TWO WORLDS, A CLOSER LOOK: INDUSTRY
Priorities in protocol design
Ø Competitiveness
Ø Can be sold
Ø Novel
Ø Useful
Ø Very efficient
Ø Legislation compliant
Ø Resists obvious attacks
6. TWO WORLDS, A CLOSER LOOK
Priorities in protocol design –
Let’s highlight the most important differences:
ACADEMIA
Ø Novel
Ø Non-trivial, uses interesting ideas
Ø Provably-secure
Ø Uses novel, useful techniques
Ø Has impact on future research
INDUSTRY
Ø Novel
Ø Useful
Ø Very efficient
Ø Legislation compliant
Ø Resists obvious attacks
7. TWO WORLDS, A CLOSER LOOK
They lead to differences in schemes’ common properties …
ACADEMIA
Ø Public
Ø Complex
Ø Not very efficient
Ø Rarely used
Ø Provably-secure (provide security guarantees)
INDUSTRY
Ø Often proprietary
Ø Simpler
Ø Very efficient
Ø Solve real problems
Ø Used
Ø Security is not well understood
8. TWO WORLDS, A CLOSER LOOK
How academics view crypto products from the industry …
“Practitioners should not design crypto schemes as they cannot prove their schemes secure. They should use our schemes.”
9. TWO WORLDS, A CLOSER LOOK
Provable security is a great methodology that allows us to
have schemes with security guarantees. However,
Ø the definitions (and proofs) are very often hard to understand, and it is hard to judge how well they “match” reality;
Ø it is hard to have schemes that are provably secure under well-studied assumptions for strong definitions and are also efficient.
10. TWO WORLDS, A CLOSER LOOK
How practitioners view the work produced by academia …
“Academics do not understand what is needed in practice. What they call “efficient” and “practical” are not. Their papers are hard to understand. Strong security is a hassle.”
16. PRODUCT OVERVIEW
Skyhigh Networks' product allows customers to use the
existing cloud service providers with added security and
without losing functionality, e.g. search. It employs
Ø symmetric searchable encryption (can search for an encrypted keyword or sort encrypted data),
Ø format-preserving encryption,
Ø …
17. CLOUD USE GROWING RAPIDLY
831 services used by an enterprise on average
20. IMPORTANT CONSIDERATIONS
① Which schemes do we employ and how do we shepherd an algorithm from conception to deployment?
② How do we optimize cryptographic schemes without invalidating their security proofs?
③ In what situations is it appropriate to trade security for functionality in a piece of commercial software?
These questions must be answered before a new algorithm reaches a customer.
21. GENERAL CONCERNS WITH ENCRYPTION
No secure coding guidelines - hard to know what is
acceptable and when
Fighting misinformation in the market
• Consumers don’t understand security/usability tradeoffs
• They expect full security and full functionality for their data
Weighing tradeoffs between security and functionality
• When is it appropriate to have a weakened security guarantee?
• For what kinds of data?
22. THE START OF GREAT COLLABORATION
Skyhigh reached out to me, as I was actively working on
protocols for efficiently-searchable encryption.
23. MY WORK ON SEARCHABLE ENCRYPTION
Joint effort with my colleagues: Georgios Amanatidis, Nathan Chenette, Younho Lee, Adam O’Neill
27. SECURE CLOUD STORAGE: GOALS
Three goals: security, efficiency, functionality
[Figure: a Client querying a Secure Cloud Server that holds the encrypted database]
(EncK ($72k), rec4)
(EncK ($68k), rec3)
(EncK ($95k), rec5)
(EncK ($35k), rec1)
(EncK ($50k), rec2)
Ø Security: searchable data is symmetrically encrypted
Ø Efficiency: server responds to query in sub-linear time
Ø Functionality: various query types, data updates, …
28. EFFICIENT SEARCHABLE ENCRYPTION
Ø The study of schemes balancing these goals is efficient searchable encryption (ESE)
  • Cryptographic efforts often focus on strong security
  • Practitioners wonder: how much security is possible without sacrificing efficient functionality?
Ø Efficiency, security, and functionality are at odds
  • E.g., strong encryption requires linear search time
29. PAST RESULTS IN SEARCHABLE SYMMETRIC ENCRYPTION
Scheme | Security | Efficiency | Functionality
Oblivious RAM [GO96] | Excellent | Impractical | All query types
Fully homomorphic encryption [G09] | Excellent | Impractical | All query types
Exact-match SSE [SLDHJ10,GO96,G09,33,CM05] | Great | Linear+ | Exact-match
Exact-match SSE [CGKO06,SWP00,KO12] | Great | Sub-linear | Exact-match; no dynamic updates
Range-query SSE [BW07] | Great | Linear+ | Range
Prefix-preserving encryption [KIK12,BBKN01,XFAM02] | Vulnerable | Sub-linear | Range; specialized implementation
Order-preserving encryption [AKSX04] | Undefined/Unknown | Sub-linear | Range; simple to implement
Efficient fuzzy-searchable encryption [KIK12] | Undefined/Unknown | Sub-linear | Error-tolerant
30. OUR GOALS
Provide provably-secure solutions for supporting efficient (sublinear)
Ø exact-match
Ø range
Ø error-tolerant
search on encrypted data
31. OUR RESULTS
Provide provably-secure solutions for supporting efficient (sublinear)
• exact-match: efficiently-searchable encryption [ABO07],
• range: order-preserving encryption (OPE) [BCLO09,BCO11],
• error-tolerant: fuzzy-searchable encryption [BC14]
search on encrypted data
33. ORDER-PRESERVING ENCRYPTION (OPE)
A symmetric encryption scheme is order-preserving if encryption is deterministic and strictly increasing.
[Figure: example OPE function EncK(·) for K ←$ KeyGen, mapping plaintexts to ciphertexts]
34. ORDER-PRESERVING ENCRYPTION (OPE)
A symmetric encryption scheme is order-preserving if encryption is deterministic and strictly increasing.
[Figure: example OPE function EncK(·) for K ←$ KeyGen, with plaintexts m0 and m1 marked together with their ciphertexts EncK(m0) and EncK(m1)]
35. ORIGINS OF OPE
Ø OPE has a long history in the form of one-part codes.
Ø In a one-part code, code words and translations have the same order
Ø To encrypt or decrypt requires only a single look-up table
Ø More recently, [AKSX04] suggested OPE as a protocol to support range queries for secure cloud storage.
36. EFFICIENT RANGE QUERIES VIA OPE
• Range query support is effortless using OPE [AKSX04]
• Can we make it secure?
• Actually… how to even define security?
Server (encrypted database):
(EncK ($35k), rec1)
(EncK ($50k), rec2)
(EncK ($68k), rec3)
(EncK ($72k), rec4)
(EncK ($95k), rec5)
Client query: Range($40k, $68k)
Sent to the server: Range(EncK ($40k), EncK ($68k))
Server response: {(EncK ($50k), rec2), (EncK ($68k), rec3)}
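The range-query flow on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the scheme from the talk: `make_toy_ope` is a hypothetical stand-in that stores a whole random strictly increasing table (only feasible for tiny domains), and `server_range_query` is an illustrative name. The point it shows is that the server answers the query by comparing ciphertexts only.

```python
import bisect
import random

def make_toy_ope(domain_size, range_size, seed):
    """Toy OPE (assumption, illustration only): a random strictly increasing
    table from {1..domain_size} into {1..range_size}, derived from `seed`."""
    rng = random.Random(seed)
    image = sorted(rng.sample(range(1, range_size + 1), domain_size))

    def enc(m):
        return image[m - 1]  # strictly increasing in m, deterministic

    return enc

def server_range_query(encrypted_db, c_lo, c_hi):
    """Server side: return rows whose ciphertext lies in [c_lo, c_hi].
    The server sees ciphertexts only. The key list is rebuilt here for
    brevity; a real server would keep a sorted index instead."""
    keys = [c for c, _ in encrypted_db]
    start = bisect.bisect_left(keys, c_lo)
    stop = bisect.bisect_right(keys, c_hi)
    return encrypted_db[start:stop]

# Client side: salaries in $k, as on the slide.
enc = make_toy_ope(domain_size=200, range_size=10**6, seed=2024)
salaries = {"rec1": 35, "rec2": 50, "rec3": 68, "rec4": 72, "rec5": 95}
encrypted_db = sorted((enc(s), rec) for rec, s in salaries.items())

# Range($40k, $68k) becomes Range(EncK($40k), EncK($68k)).
result = server_range_query(encrypted_db, enc(40), enc(68))
print([rec for _, rec in result])
```

Because encryption is strictly increasing, the matching records are the same for any seed; only the ciphertext values change.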
38. TOWARDS OPE SECURITY MODEL
• OPE cannot be IND-CPA because it is deterministic.
• We have to weaken IND-CPA definition.
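39. ATTEMPT #1: IND-DISTINCTCPA
• What if equality patterns of LEFT and RIGHT queries must match?
• Suitable for deterministic encryption
• Still unachievable by an OPE scheme, because order is leaked!
[Figure: the LEFT and RIGHT experiments. The adversary A sends query pairs (M0, M1) to an oracle, which returns EK(M0) in the LEFT experiment (L oracle) and EK(M1) in the RIGHT experiment (R oracle); A then outputs a guess b. A second panel plots the query pairs against the returned ciphertexts EK(Mb), with the outcomes marked “Guess b = 1” and “Guess b = 0”.]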
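Why order leakage defeats this notion can be sketched with a two-query adversary. The code below is an illustration under assumptions: `make_toy_ope` is a hypothetical table-based order-preserving function standing in for any OPE scheme, and `lr_oracle` and `order_adversary` are illustrative names. The LEFT messages are (1, 2) and the RIGHT messages are (2, 1); both sequences consist of distinct values, so the equality patterns match, yet the ciphertext order reveals the hidden bit.

```python
import random

def make_toy_ope(domain_size, range_size, seed):
    """Toy order-preserving function (assumption, illustration only)."""
    rng = random.Random(seed)
    image = sorted(rng.sample(range(1, range_size + 1), domain_size))
    return lambda m: image[m - 1]

def lr_oracle(enc, b):
    """Left-or-right oracle: encrypts m0 if b == 0, m1 if b == 1."""
    return lambda m0, m1: enc(m1 if b else m0)

def order_adversary(oracle):
    """LEFT sequence is 1, 2 and RIGHT sequence is 2, 1: same equality
    pattern (all distinct), opposite order."""
    c_first = oracle(1, 2)
    c_second = oracle(2, 1)
    # Increasing ciphertexts mean the LEFT messages were encrypted.
    return 0 if c_first < c_second else 1

enc = make_toy_ope(domain_size=10, range_size=1000, seed=1)
guesses = [order_adversary(lr_oracle(enc, b)) for b in (0, 1)]
print(guesses)
```

The adversary never needs the key and is correct for every order-preserving function, which is why the definition has to be weakened further.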
40. ATTEMPT #2: IND-ORDEREDCPA
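• What if order patterns of LEFT and RIGHT queries must match?
[Figure: the LEFT and RIGHT experiments. The adversary A sends query pairs (M0, M1) to the L oracle, which returns EK(M0), or to the R oracle, which returns EK(M1), and outputs a guess b. A second panel plots query pairs numbered 1 to 5 against the returned ciphertexts EK(Mb); a query pair that breaks the matching order pattern is marked “Not allowed!”.]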
41. ATTEMPT #2: IND-ORDEREDCPA
• In fact, there is still a general attack against any OPE scheme [BCLO09].
• Demonstrates that OPE must leak relative distance of plaintexts.
42. A DIFFERENT APPROACH TO SECURITY
• Instead of trying to relax IND-CPA further, we take an approach similar to PRF
• Require that an OPE is indistinguishable from an “ideal” object, namely a random order-preserving function (ROPF).
43. POPF-SECURITY
We call an OPE scheme PseudorandomOPF-secure if no efficient adversary can output 1 with noticeably different probabilities between the two experiments.
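The two experiments are not reproduced in this transcript. One way to write the requirement down, assuming a chosen-plaintext flavor in which the adversary A gets oracle access either to EncK(·) under a random key or to a random order-preserving function g from [M] to [N], is the advantage below. The symbol names (Adv, OPF) are assumptions made for this sketch; the formal definition in [BCLO09] may differ in details such as the oracles provided.

```latex
\mathbf{Adv}^{\mathrm{popf}}_{\mathrm{OPE}}(A)
  = \Pr\bigl[\, K \xleftarrow{\$} \mathrm{KeyGen} \;:\; A^{\mathrm{Enc}_K(\cdot)} = 1 \,\bigr]
  - \Pr\bigl[\, g \xleftarrow{\$} \mathrm{OPF}_{[M],[N]} \;:\; A^{g(\cdot)} = 1 \,\bigr]
```

POPF-security then asks that this difference be negligible for every efficient A.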
45. TOWARD A CONSTRUCTION
• It is not immediately clear how the regular building block, a blockcipher, helps.
• Solution: combinatorics and statistics!
46. OPFS AND COMBINATIONS
Ø Observation: There is a bijection between the set of OPFs from [M] to [N] and the set of M-out-of-N combinations.
Ø Example:
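The example figure did not survive in this transcript, so here is a small stand-in check of the observation, with M = 3 and N = 5 chosen arbitrarily for illustration. Sorting an M-element subset of [N] gives the values f(1) < f(2) < … < f(M) of exactly one order-preserving function, and every such function arises this way. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb

def opfs_bruteforce(M, N):
    """All strictly increasing functions [M] -> [N], written as tuples
    (f(1), ..., f(M)), found by filtering every possible function."""
    return [f for f in product(range(1, N + 1), repeat=M)
            if all(a < b for a, b in zip(f, f[1:]))]

def opf_from_combination(subset):
    """The bijection: list the chosen elements in increasing order."""
    return tuple(sorted(subset))

M, N = 3, 5
opfs = opfs_bruteforce(M, N)
from_subsets = [opf_from_combination(s)
                for s in combinations(range(1, N + 1), M)]

print(len(opfs), comb(N, M))
print(sorted(opfs) == sorted(from_subsets))
```

So choosing a random OPF is the same as choosing a random M-element subset of [N], which is what the next slides exploit.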
47. THE NHGD CONNECTION
Lazy-sampling a POPF on a message i (domain [M], range [N]) ≅ lazy-sampling the ith smallest element of a (pseudo)random M-element subset of [N].
Ø This value follows the negative hypergeometric distribution (NHGD) on parameters: range [N], domain [M], index i.
\[ \Pr[\mathrm{NHGD}([N],[M],i) = c] = \frac{\binom{c-1}{i-1}\binom{N-c}{M-i}}{\binom{N}{M}} \]
Ø Assume we have an efficient way to sample NHGD.
48. SINGLE-POINT LAZY SAMPLING
Example of lazy-sampling a single point:
Domain [5], range [9]. To encrypt only i = 3: sample NHGD([9],[5],3).
Suppose the outcome is 6. This occurs with probability
\[ \Pr[\mathrm{NHGD}([9],[5],3) = 6] = \frac{\binom{5}{2}\binom{3}{2}}{\binom{9}{5}} \approx 0.24 \]
and specifies the (incomplete) OPF and the (incomplete) 5-element subset {?,?,6,?,?}.
[Figure: plaintexts 1 to 5 and ciphertexts 1 to 9, with only the mapping 3 → 6 fixed and the other points marked “?”]
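The probability in this example can be checked numerically. The sketch below assumes the NHGD value is the ith element, in increasing order, of a uniformly random M-element subset of [N], and compares the closed form against a brute-force count over all subsets. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def nhgd_pmf(N, M, i, c):
    """Closed form: i-1 elements below c, M-i elements above c."""
    if not (i <= c <= N - (M - i)):
        return Fraction(0)
    return Fraction(comb(c - 1, i - 1) * comb(N - c, M - i), comb(N, M))

def nhgd_pmf_bruteforce(N, M, i, c):
    """Count the M-subsets of {1..N} whose i-th element in increasing
    order is c. combinations() yields each subset already sorted."""
    subsets = list(combinations(range(1, N + 1), M))
    hits = sum(1 for s in subsets if s[i - 1] == c)
    return Fraction(hits, len(subsets))

p = nhgd_pmf(9, 5, 3, 6)
print(p, float(p))
print(p == nhgd_pmf_bruteforce(9, 5, 3, 6))
```

The exact value is 30/126 = 5/21, which rounds to the 0.24 quoted on the slide.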
49. MULTI-POINT LAZY-SAMPLING
Ø For the function to be deterministic and order-preserving, lazy-sampling must take into account “existing” points when selecting new points.
Ø An inefficient method would be to remember every existing point and adjust further sampling parameters accordingly.
Ø But to make our eventual scheme stateless, we will instead take a binary search approach.
Ø For now, assume a state consisting of pre-determined random coins (bitstrings) r1,r2,…,rM and consider this as the key to our scheme
51. REMARKS ON LAZY-SAMPLING
Notice that
Ø Given random fixed coins for NHGD, we will lazily construct a (pseudo)random OPF
Ø Each encryption uses at most log2(M) calls to the NHGD sampler
Ø Efficiency: log2(M) · tNHGD
Ø The state consists only of the coins r1,r2,…,rM
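The binary-search idea can be sketched as follows. This is a minimal illustration under assumptions, not the algorithm from [BCLO09]: `nhgd_sample` is a naive inverse-CDF sampler (linear in the range size, so only usable for small parameters), the coins r1,…,rM are modeled as integer seeds indexed by domain point, and `lazy_encrypt` is a hypothetical name. At each step the middle domain point of the current window is sampled, and the search continues in the half that contains the message.

```python
import random
from math import comb

def nhgd_sample(N, M, i, coins):
    """i-th smallest element of a uniform M-subset of {1..N}, drawn by a
    naive inverse-CDF walk with exact integer weights; `coins` is the seed."""
    rng = random.Random(coins)
    target = rng.randrange(comb(N, M))
    acc = 0
    for c in range(i, N - (M - i) + 1):
        acc += comb(c - 1, i - 1) * comb(N - c, M - i)
        if target < acc:
            return c
    raise AssertionError("unreachable: the weights sum to comb(N, M)")

def lazy_encrypt(m, M, N, coins):
    """Encrypt m in {1..M} into {1..N}; returns (ciphertext, sampler calls)."""
    d_lo, d_hi = 0, M      # current domain window is d_lo+1 .. d_hi
    r_lo, r_hi = 0, N      # current range window is r_lo+1 .. r_hi
    calls = 0
    while True:
        size_d, size_r = d_hi - d_lo, r_hi - r_lo
        mid = d_lo + (size_d + 1) // 2          # middle domain point
        c = r_lo + nhgd_sample(size_r, size_d, mid - d_lo, coins[mid])
        calls += 1
        if m == mid:
            return c, calls
        if m < mid:                              # keep the lower half
            d_hi, r_hi = mid - 1, c - 1
        else:                                    # keep the upper half
            d_lo, r_lo = mid, c

M, N = 16, 1000
coins = {x: random.Random(1000 + x).getrandbits(64) for x in range(1, M + 1)}
results = [lazy_encrypt(m, M, N, coins) for m in range(1, M + 1)]
ciphertexts = [c for c, _ in results]
print(ciphertexts)
```

Each domain point is always sampled with the same window and the same coin, so the function is deterministic, and in this sketch no encryption needs more than about log2(M) sampler calls.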
52. REMOVING THE STATE
Instead of storing the random coins, we use a pseudorandom function (PRF) that takes as input the parameters to NHGD.
The secret key to our scheme is just the key K to the blockcipher
r1 ← PRFK(D1,R1,x1), used in NHGD(D1,R1,x1; r1)
r2 ← PRFK(D2,R2,x2), used in NHGD(D2,R2,x2; r2)
r3 ← PRFK(D3,R3,x3), used in NHGD(D3,R3,x3; r3)
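Deriving the coins on demand might look like the sketch below. It is an assumption-laden illustration: HMAC-SHA256 from the Python standard library stands in for the blockcipher-based PRF mentioned on the slide, and the function name `prf_coins` and the input encoding are hypothetical choices. The only property being shown is that the same key and the same NHGD parameters always yield the same coins, so nothing needs to be stored.

```python
import hashlib
import hmac

def prf_coins(key, domain, range_, x, nbytes=16):
    """Derive coins for NHGD(D, R, x; r) from the key and the parameters.
    `domain` and `range_` are (low, high) pairs describing the current
    windows; HMAC-SHA256 is a stand-in PRF (assumption)."""
    label = "D={}:{}|R={}:{}|x={}".format(
        domain[0], domain[1], range_[0], range_[1], x).encode("ascii")
    return hmac.new(key, label, hashlib.sha256).digest()[:nbytes]

key = b"\x01" * 32  # placeholder key for the example only
r1 = prf_coins(key, (1, 16), (1, 1000), 8)
r1_again = prf_coins(key, (1, 16), (1, 1000), 8)
r2 = prf_coins(key, (9, 16), (412, 1000), 12)
print(r1.hex(), r1 == r1_again, r1 != r2)
```

In a full scheme these bytes would seed the sampler at each step of the binary search, in place of the stored r1,r2,…,rM.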
53. MOVE TO HYPERGEOMETRIC
Ø There does not seem to be an efficient NHGD algorithm.
Ø Instead we use a related distribution: Hypergeometric Distribution (HGD), which can be sampled efficiently [KS85].
Ø It describes how many members of a random M-set are less than value y, for 1 ≤ y ≤ N
Ø HGD can be used if we slightly modify the algorithms.
Ø This gives rise to a POPF-secure OPE. ☺
Ø Efficiency is the same, log M · tHGD, on average.
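One way to see why HGD can stand in for NHGD: the ith smallest member of the M-set is less than y exactly when at least i members are less than y. The sketch below checks that identity with exact fractions for small parameters. The function names are illustrative, and this only verifies the relationship between the two distributions; it is not the modified algorithm itself.

```python
from fractions import Fraction
from math import comb

def nhgd_pmf(N, M, i, c):
    """Pr[i-th smallest member of a uniform M-subset of {1..N} equals c]."""
    if not (i <= c <= N - (M - i)):
        return Fraction(0)
    return Fraction(comb(c - 1, i - 1) * comb(N - c, M - i), comb(N, M))

def hgd_pmf(N, M, y, k):
    """Pr[exactly k members of a uniform M-subset of {1..N} are less than y]."""
    below = y - 1
    return Fraction(comb(below, k) * comb(N - below, M - k), comb(N, M))

def prob_ith_smallest_below(N, M, i, y):
    """NHGD side: Pr[i-th smallest member < y]."""
    return sum((nhgd_pmf(N, M, i, c) for c in range(1, y)), Fraction(0))

def prob_at_least_i_below(N, M, i, y):
    """HGD side: Pr[at least i members < y]."""
    return sum((hgd_pmf(N, M, y, k) for k in range(i, M + 1)), Fraction(0))

N, M = 9, 5
agree = all(prob_ith_smallest_below(N, M, i, y) == prob_at_least_i_below(N, M, i, y)
            for i in range(1, M + 1) for y in range(1, N + 1))
print(agree)
```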
54. RECAP OF OPE
Ø Appropriate definition of security: POPF
Ø Our later study [BCO11] helped to clarify security leakage of POPF.
Ø POPF-secure OPE construction via lazy-sampling on the HGD distribution.
56. GREAT COLLABORATION AT A GLANCE
Skyhigh and I had numerous fruitful discussions.
I was incredibly pleased with their approach and questions.
Ø They valued and wanted to understand provable security, and
wanted to employ provably secure schemes.
Ø They asked great questions and listened.
Ø They think open source is a must.
Ø They read academic papers and attended academic conferences.
Ø They hired an advisory board of crypto experts.
Ø They managed to make us think.
Ø They managed to spark new research projects.
57. CHALLENGES WITH DEPLOYING OPE
Speed of algorithm
• HGD sampling means un-optimized implementation is very slow.
• Optimization required extensive use of low-level floating point libraries to speed up HGD sampling
Ciphertext length
• Padding is required to preserve lexicographic orders. Padding plaintexts also means the ciphertexts are long.
Need to fix input and output lengths in advance
• No known secure way to use OPE like a block cipher
• Makes using OPE for different types of data (longer/shorter) difficult
What order is preserved?
• Lexicographic? Numeric? Alphabetic? ASCII-betic?
• Different orderings require different functions to encode input as integers before encryption
• Needs to be the same order as cloud application, but different apps could have different orderings
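The encoding issue can be made concrete with a small sketch. It assumes ASCII input, a fixed width chosen in advance, and zero-byte padding; `encode_lexicographic` is a hypothetical helper, not part of any product described here. It maps strings to integers so that integer order equals byte-wise lexicographic order, and it shows how that order disagrees with numeric order on the same data.

```python
def encode_lexicographic(text, width):
    """Map an ASCII string to an integer whose numeric order matches the
    byte-wise lexicographic order of the strings (assumed encoding).
    Strings are right-padded with zero bytes to a fixed width."""
    raw = text.encode("ascii")
    if len(raw) > width:
        raise ValueError("input longer than the fixed width")
    if b"\x00" in raw:
        raise ValueError("zero bytes would collide with the padding")
    return int.from_bytes(raw.ljust(width, b"\x00"), "big")

words = ["apple", "app", "banana", "Zebra", "35", "100"]
WIDTH = 8  # fixed in advance; every plaintext becomes a 64-bit integer

by_encoding = sorted(words, key=lambda w: encode_lexicographic(w, WIDTH))
print(by_encoding)

# Lexicographic and numeric orders disagree on the same strings:
lex_says = encode_lexicographic("100", WIDTH) < encode_lexicographic("35", WIDTH)
num_says = int("100") < int("35")
print(lex_says, num_says)
```

The padding is also what makes ciphertexts long: every plaintext costs the full fixed width, and the OPE range must be larger still.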
58. MORE CHALLENGES
Ø Tradeoff of security for functionality
Ø Everybody wants to search everything, all the time.
Ø Can’t have great security at the same time.
Ø What security level is appropriate?
Ø How to explain to customers the security they are getting?
Ø May be easier for exact-match queries. However,
Ø When is frequency analysis an appropriate risk?
Ø For what data?
Ø Non-trivial for OPE
New research project
White paper
59. MORE CHALLENGES
Ø Granularity of exact-match search
Ø Encrypt every word? Every line? Every paragraph? What is the appropriate tradeoff of usability for security?
Ø If OPE can be stateful, can we improve efficiency?
New research project