1. Towards a Theory of Data Entanglement
  • James Aspnes, Joan Feigenbaum, Aleksandr Yampolskiy, and Sheng Zhong (Yale University)
2. Outline
  • Motivation
  • Dagster and Tangler
  • Our model
  • Notions of entanglement
  • Possibility and impossibility results
  • Conclusion
3. Goal: Protect Remotely Stored Data from the Server
  • Question: Suppose you store your data on a remote server. How do you ensure that the server does not corrupt it?
  • Answer: Have your data entangled with some VIPs' data, so that corruption of your data ⇒ corruption of theirs.
4. Previous Work: Dagster [SW01]
  • [Diagram: a new document is combined with c randomly chosen blocks from the pool of blocks and encrypted.]
  • Analysis: Deleting a typical document ⇒ loss of O(c) documents.
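
The Dagster-style construction on this slide can be sketched roughly as follows. The XOR combination, the fixed block size, and the names `entangle` and `recover` are assumptions made for illustration, and the encryption step shown on the slide is omitted, so this is not the scheme as specified in [SW01].

```python
import secrets

BLOCK_SIZE = 16  # assumed fixed block size in bytes


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def entangle(document: bytes, pool: list, c: int):
    """Combine `document` with c randomly chosen pool blocks (assumed: XOR).

    Returns the new block and the indices of the pool blocks it depends on.
    """
    rng = secrets.SystemRandom()
    chosen = rng.sample(range(len(pool)), c)
    block = document
    for idx in chosen:
        block = xor_blocks(block, pool[idx])
    return block, chosen


def recover(block: bytes, pool: list, chosen: list) -> bytes:
    """Undo the XORs; the result is wrong if any chosen block was altered."""
    document = block
    for idx in chosen:
        document = xor_blocks(document, pool[idx])
    return document


pool = [secrets.token_bytes(BLOCK_SIZE) for _ in range(8)]
doc = b"0123456789abcdef"
new_block, deps = entangle(doc, pool, c=3)
recovered = recover(new_block, pool, deps)
```

In this sketch, altering any one of the c blocks in `deps` makes `recover` return the wrong bytes, which is the sense in which the new document is entangled with earlier ones.
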
5. Previous Work: Tangler [WM01]
  • [Diagram: a degree-2 polynomial F() is interpolated through (0, New Document) and 2 randomly chosen blocks, (x_1, F(x_1)) and (x_2, F(x_2)), from the pool of n blocks.]
  • Analysis: Deleting a typical document ⇒ loss of O((log n)/n) documents.
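
The interpolation step on this slide can be illustrated with a small sketch over a prime field. The modulus `P`, the integer encoding of documents and blocks, the function names, and the choice to publish further points on F as new blocks are all assumptions made for illustration.

```python
P = 2**61 - 1  # assumed prime modulus for the field


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def entangle(document, old_a, old_b, new_xs):
    """Interpolate F through (0, document) and two old blocks; return new points on F."""
    base = [(0, document), old_a, old_b]
    return [(x, lagrange_eval(base, x)) for x in new_xs]


def recover(three_blocks):
    """Any three distinct points on the degree-2 polynomial F determine F(0)."""
    return lagrange_eval(three_blocks, 0)


old_a, old_b = (5, 1234), (9, 777)   # two blocks drawn from the pool
doc = 424242
new_blocks = entangle(doc, old_a, old_b, new_xs=[11, 12])
```

Because three points fix a degree-2 polynomial, the document can be read back from the two old blocks plus one new block, or from one old block plus both new blocks; losing too many of them loses the document.
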
6. Our Model: Basic Framework
  • Initialization: Keys are distributed to participants.
  • Entanglement: Users' data are combined into a common store.
  • Tampering: The adversary tampers with the store before it is stored on the server.
  • [Diagram labels: initializer I; keys k_1, k_2, …, k_n, k_E; documents d_1, d_2, …, d_n; encoding E; tamperer; storage server.]
7. Our Model: Basic Framework (cont.)
  • Recovery: Users attempt to recover their data.
  • If R_i returns the original document d_i, we say that user i recovers her data.
  • [Diagram labels: keys k_1, k_2, …, k_n; storage server.]
8. Our Model: Classification
  • Question: What can the adversary do to the data store?
  • Answer: He can…
      • tamper with the store;
      • tamper with the store and distribute a new recovery algorithm to all users (upgrade attack);
      • encrypt the store and distribute his recovery algorithm only to a few select buddies (superencryption attack).
9. Our Model: Classification (cont.)
  • Classification based on recovery algorithm:
      • Standard recovery algorithm
      • Public recovery algorithm
      • Private recovery algorithm
10. Our Model: Classification (cont.)
  • Classification based on corrupting algorithm:
      • Destructive adversary, who reduces the entropy of the data store.
      • Arbitrary adversary.
  • Altogether, we have 6 (= 3 × 2) adversary classes.
11. Our Definitions
  • Fix an encoding scheme, an adversary, and recovery algorithms R_i.
  • The recovery vector summarizes which documents are recovered.
12. Our Definitions (cont.)
  • Data dependency: d_i depends on d_j if, with high probability, d_i is recovered ⇒ d_j is recovered.
  • [Diagram: documents d_1, d_2, d_3, d_4; d_1 depends on d_2.]
13. Our Definitions (cont.)
  • All-or-nothing integrity (AONI): every document depends on every other document.
  • [Diagram: documents d_1, d_2, d_3, d_4.]
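
The definitions of recovery vector, data dependency, and AONI can be made concrete with a small empirical check. Representing a recovery vector as a tuple of 0/1 entries (entry i is 1 when user i recovers her document) and estimating "with high probability" as a fraction of sampled trials are assumptions of this sketch; the slides do not fix a threshold.

```python
def depends_on(vectors, i, j):
    """Fraction of trials in which 'd_i is recovered' implies 'd_j is recovered'."""
    ok = sum(1 for r in vectors if (not r[i]) or r[j])
    return ok / len(vectors)


def aoni_fraction(vectors):
    """Fraction of trials in which either all documents or none are recovered."""
    ok = sum(1 for r in vectors if all(r) or not any(r))
    return ok / len(vectors)


# Three hypothetical trials with three users each.
trials = [(1, 1, 1), (0, 0, 0), (1, 1, 0)]
dep_0_on_1 = depends_on(trials, 0, 1)
dep_1_on_2 = depends_on(trials, 1, 2)
aoni = aoni_fraction(trials)
```

In the third trial d_1 and d_2 are recovered but d_3 is not, so that trial counts against both AONI and the dependency of d_2 on d_3 (indices 1 and 2 in the code).
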
14. Our Definitions (cont.)
  • Symmetric recovery: the adversary cannot bias which documents are recovered.
15. Possibility of AONI in Standard-Recovery Model
  • All users use the standard recovery algorithm: for all i, R_i = R.
  • When combining data, mark the data store using an unforgeable Message Authentication Code (MAC).
  • The standard recovery algorithm checks the MAC:
      • If the MAC is valid, recover the data.
      • If the MAC is invalid, refuse to recover the data.
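
The MAC-based construction on this slide might look like the sketch below. HMAC-SHA256 stands in for the unforgeable MAC, and the length-prefixed store layout, the single shared `mac_key`, and the function names are assumptions made for illustration; the slides do not say how the key is distributed.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def encode(documents, mac_key):
    """Combine documents into one store and append a MAC over the whole store."""
    body = b"".join(len(d).to_bytes(4, "big") + d for d in documents)
    tag = hmac.new(mac_key, body, hashlib.sha256).digest()
    return body + tag


def recover(store, mac_key, i):
    """Standard recovery R: return document i, or None if the MAC check fails."""
    body, tag = store[:-TAG_LEN], store[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # refuse to recover anything from a tampered store
    pos, doc = 0, None
    for _ in range(i + 1):
        n = int.from_bytes(body[pos:pos + 4], "big")
        doc = body[pos + 4:pos + 4 + n]
        pos += 4 + n
    return doc


key = b"k" * 32
store = encode([b"alpha", b"beta"], key)
tampered = bytes([store[0] ^ 1]) + store[1:]
```

Because the tag covers the entire store, any modification makes every user's recovery fail together, which gives the all-or-nothing behavior as long as everyone runs this same algorithm.
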
16. Impossibility of AONI in Public- and Private-Recovery Models
  • If any users use the adversary's recovery algorithm (for some i, R_i ≠ R), AONI cannot be achieved.
  • The adversary modifies the data store so that the old recovery algorithm does not work, and distributes a new recovery algorithm that flips a coin to decide whether or not to recover the data.
17. Impossibility of AONI in Public- and Private-Recovery Models (cont.)
  • With high probability, not all coin flips will have the same result.
  • So, with high probability, some data are recovered while others are not.
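
The "with high probability" claim can be quantified under a simple assumption: each of the n users' recovery algorithms flips an independent coin that comes up "recover" with probability p. The function names and the independence assumption are illustrative and not taken from the slides.

```python
def prob_all_same(n, p=0.5):
    """Probability that n independent coins with bias p all land the same way."""
    return p ** n + (1 - p) ** n


def prob_partial_recovery(n, p=0.5):
    """Probability that some users recover their data while others do not."""
    return 1 - prob_all_same(n, p)


all_same_10 = prob_all_same(10)
partial_10 = prob_partial_recovery(10)
```

For fair coins the all-or-nothing outcome has probability 2^(1-n), so it becomes vanishingly unlikely as the number of users grows.
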
18. Possibility of Symmetric Recovery in Public-Recovery Model
  • Every user i uses the adversary's recovery algorithm.
  • We can prevent targeted destruction of documents.
  • Documents d_1, …, d_n must appear i.i.d.
  • The encoding scheme must be symmetric.
19. Possibility of AONI for Destructive Adversaries
  • We can achieve AONI in all recovery models if the tamperer destroys entropy.
  • When combining data, interpolate a polynomial through the points (k_i, d_i). Store = polynomial.
  • AONI is achieved if sufficient entropy is removed.
  • Many stores are mapped to a single corrupted store ⇒ With high probability, cannot recover every data item.
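
The polynomial store described on this slide can be sketched as follows. The prime modulus `P`, the integer encoding of keys and documents, storing the polynomial as a coefficient list, and the function names are assumptions made for illustration.

```python
P = 2**61 - 1  # assumed prime modulus for the field


def poly_mul_linear(poly, root):
    """Multiply poly (coefficients, lowest degree first) by (x - root) mod P."""
    out = [0] * (len(poly) + 1)
    for k, a in enumerate(poly):
        out[k] = (out[k] - a * root) % P
        out[k + 1] = (out[k + 1] + a) % P
    return out


def interpolate(points):
    """Coefficients of the polynomial of degree < n through n points (Lagrange form)."""
    coeffs = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, den = [1], 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                basis = poly_mul_linear(basis, xj)
                den = den * (xi - xj) % P
        scale = yi * pow(den, -1, P) % P
        for k, a in enumerate(basis):
            coeffs[k] = (coeffs[k] + a * scale) % P
    return coeffs


def recover(store, key):
    """Evaluate the stored polynomial at the user's key k_i (Horner's rule)."""
    acc = 0
    for a in reversed(store):
        acc = (acc * key + a) % P
    return acc


keys = [3, 17, 101, 999]
docs = [11, 22, 33, 44]
store = interpolate(list(zip(keys, docs)))
tampered = [(store[0] + 1) % P] + store[1:]
```

A tampered store differs from the original by a nonzero polynomial of degree less than n, which has at most n - 1 roots in the field. If the keys k_i are unknown to the adversary and drawn from a large field, a given key is unlikely to be one of those roots, so a change to the store tends to damage every user's document at once.
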
20. Summary of Results
                        Destructive Tamperer    Arbitrary Tamperer
    Standard Recovery   all-or-nothing          all-or-nothing
    Public Recovery     all-or-nothing          symmetric recovery
    Private Recovery    all-or-nothing
21. Future Work
  • We have considered a single-round model. Allowing multiple rounds of storage/retrieval would be more realistic.
  • What if data entanglement is combined with other techniques, such as replication? Would that help to defend data against untrusted server(s)?


Editor's Notes

  • #8 How does the tamperer work and who gives out recovery algorithms?
  • #13 Replace picture and possibly drop the formal definition
  • #17 Add a picture of happy/unhappy computers
  • #18 Add a picture of happy/unhappy computers
  • #20 Emphasize that the adversary cannot guess what the k_i are. A many-to-one map can't be correct for most k_i, because of the properties of polynomials. Maybe add a picture (?)