My name is Francesco Pagano and I come from the University of Milan, Italy. Today I present the paper “Handling Confidential Data on the Untrusted Cloud: an agent-based approach”, written with Prof. Ernesto Damiani from the same university.
The agenda of my talk: First, I'll analyze the issue of privacy in cloud computing, showing some classical solutions from the literature. Then, I'll show an intrinsic problem that nullifies the effort of those solutions. After that, I'll give a detailed presentation of our solution. Finally, there will be “question time”.
In cloud computing there is a clear distinction between the Platform, hosted in the cloud, and the clients, distributed across the Internet. The clients access the outsourced data, stored in the Platform, via applications that are also hosted in the Cloud. External user identification and access control are well studied and widely deployed, so EXTERNAL malicious users are easily stopped. But what about internal access? We don't want Cloud Providers to have access to our sensitive data!
The previous techniques ensure outsourced data integrity, but this is not enough, since the data has a long way to go after the data layer. In a Java application, for example, it passes through JDBC, Hibernate, and so on, up to the presentation layer. And at that level, certainly, the data is in clear text. An attacker can target one of the weakest levels, for example by using aspect-oriented programming. <Click> So, for privacy, we have to move the “presentation layer” to the client side <click> but now data and presentation are divided. If we want performance <click> we also have to move the data to the client side <click>.
And this is our proposal. We suggest atomizing the application/database pair, providing one copy per user. Every instance runs locally and maintains only authorized data, which is replicated and synchronized among all authorized users. A centralized node hosts an untrusted Synchronizer, which never holds plain-text data.
Each user has a local copy of his data. We use the term dossier to indicate a group of related information, such as a medical record or a court file. If two users can access the same dossier, each of them has a copy of the dossier. We assume that only one user (called “the owner”) can modify the dossier.
The local nodes synchronize that data through a central repository that stores the updated records. To prevent this synchronizer from accessing the data, the data is encrypted. The decryption keys, protected in a way that we will see later, are also stored in the synchronizer.
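The role of the synchronizer described here can be sketched as a store of opaque byte strings. This is only an illustration: the class name `Synchronizer`, its methods, and the single key blob per row are assumptions, and the way the key is protected is not covered in these notes, so the sketch treats it as an opaque value.

```python
from dataclasses import dataclass, field


@dataclass
class Synchronizer:
    """Hypothetical untrusted central store: it only ever sees opaque bytes."""
    rows: dict = field(default_factory=dict)  # row_id -> encrypted record
    keys: dict = field(default_factory=dict)  # row_id -> protected decryption key

    def upload(self, row_id, encrypted_row, protected_key):
        # The owner pushes the updated (already encrypted) record and its key blob.
        self.rows[row_id] = encrypted_row
        self.keys[row_id] = protected_key

    def download(self, row_id):
        # Returns (encrypted_row, protected_key), or None for an unknown row.
        if row_id not in self.rows:
            return None
        return self.rows[row_id], self.keys[row_id]
```

Nothing in this store can decrypt a record; decryption happens only on the local nodes.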
That is all. If you have any questions...
Using In-Memory Encrypted Databases on the Cloud Francesco (and Davide) Pagano [email_address] Department of Information Technology Università degli Studi di Milano - Italy
Agenda <ul><li>Privacy issue on the cloud </li></ul><ul><li>An agent based approach </li></ul><ul><li>Database encryption </li></ul><ul><li>In Memory Databases and HyperSql </li></ul><ul><li>Our solution </li></ul><ul><li>Benchmarking </li></ul><ul><li>Conclusion </li></ul><ul><li>Question time </li></ul>
Access control problem Cloud Platform Desktop Desktop controlled accesses for external users uncensored access for cloud provider
Privacy within the cloud: on the same side of the wall Presentation Layer privacy Data Layer performance
An agent-based approach Untrusted Synchronizer never holds plaintext data Local agent with local db
Database encryption * L. Bouganim and Y. Guo, “Database encryption,” in Encyclopedia of Cryptography and Security, Springer, 2010, 2nd Edition
Granularity in database-level encryption <ul><li>database </li></ul><ul><li>tables </li></ul><ul><li>columns </li></ul><ul><li>rows </li></ul>
In Memory Databases <ul><li>“An in-memory database (IMDB also known as main memory database system or MMDB and as real-time database or RTDB) is a database management system that primarily relies on main memory for computer data storage.” </li></ul><ul><li>* Wikipedia </li></ul>
.script file of a sample database <ul><li>CREATE SCHEMA PUBLIC AUTHORIZATION DBA </li></ul><ul><li>CREATE MEMORY TABLE DOSSIER(ID INTEGER GENERATED BY DEFAULT AS IDENTITY(START WITH 0) NOT NULL PRIMARY KEY,NAME CHAR(80)) </li></ul><ul><li>CREATE MEMORY TABLE STUDENTS(ID INTEGER GENERATED BY DEFAULT AS IDENTITY(START WITH 0) NOT NULL PRIMARY KEY,NAME CHAR(80)) </li></ul><ul><li>ALTER TABLE DOSSIER ALTER COLUMN ID RESTART WITH 0 </li></ul><ul><li>ALTER TABLE STUDENTS ALTER COLUMN ID RESTART WITH 32 </li></ul><ul><li>CREATE USER SA PASSWORD "" </li></ul><ul><li>GRANT DBA TO SA </li></ul><ul><li>SET WRITE_DELAY 10 </li></ul><ul><li>SET SCHEMA PUBLIC </li></ul><ul><li>INSERT INTO STUDENTS VALUES(12,'Alice') </li></ul><ul><li>INSERT INTO STUDENTS VALUES(31,'Bob') </li></ul>
Implemented solution: client side <ul><li>On the client side, using IMDBs, we have only two interactions between each local agent and the Synchronizer </li></ul>
The modified .script file <ul><li>INSERT INTO students(id,name) VALUES(12,'Alice'); </li></ul><ul><li>INSERT INTO students(id,name) VALUES(31,'Bob'); </li></ul><ul><li>$27@5F3C25EE5738DAAAED5DA06A80F305A93C95A </li></ul><ul><li>$45@5DA67ADA06AAED580FA914BF3C953057D387F </li></ul><ul><li>INSERT INTO students(id,name) VALUES(23,'Carol'); </li></ul>Encrypted rows id_pending_row
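A reader of the modified .script file has to tell plain SQL statements apart from encrypted rows. The following sketch assumes that an encrypted line has the shape `$<id_pending_row>@<hex payload>`, which is a reading of the slide and not a documented format; the function name `classify_script_line` is hypothetical.

```python
import re

# Assumed shape of an encrypted line: "$<id_pending_row>@<hex payload>".
ENCRYPTED_LINE = re.compile(r"^\$(\d+)@([0-9A-Fa-f]+)$")


def classify_script_line(line):
    """Return ("encrypted", row_id, payload) or ("plain", statement)."""
    line = line.strip()
    match = ENCRYPTED_LINE.match(line)
    if match:
        # The payload is kept as a hex string; it is not decoded here.
        return ("encrypted", int(match.group(1)), match.group(2))
    return ("plain", line)


script = [
    "INSERT INTO students(id,name) VALUES(12,'Alice');",
    "$27@5F3C25EE5738DAAAED5DA06A80F305A93C95A",
]
parsed = [classify_script_line(line) for line in script]
```

Plain statements can be executed as usual, while encrypted entries are left pending until their key has been obtained.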
Performance <ul><li>In contrast to the usual row-level encryption, which needs encryption/decryption at every data access, our solution uses these heavy operations only when communicating with the Synchronizer, with a clear advantage, especially in the case of rarely modified databases. </li></ul>
Performance: read operations <ul><li>The system uses decryption only at start time, when records are loaded from the disk into main memory. Each row is decrypted either never (if it is owned by the local node) or just once (if it is owned by a remote node), so this is optimal for read operations. Each decryption implies an access to the remote Synchronizer to download the related decryption key and, if applicable, the modified row. </li></ul>
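The read path can be illustrated with a small loader that counts decryptions. It is a sketch under assumptions: `load_rows`, the `(owner, payload)` row layout, and the `decrypt` callable are hypothetical, and the string reversal used in the example is only a placeholder for a real decryption routine.

```python
def load_rows(stored_rows, local_node, decrypt):
    """Load rows into memory at start time, decrypting only remote-owned rows."""
    memory = []
    decryptions = 0
    for owner, payload in stored_rows:
        if owner == local_node:
            memory.append(payload)           # locally owned: already plaintext
        else:
            memory.append(decrypt(payload))  # remote-owned: decrypted once
            decryptions += 1
    return memory, decryptions


# Placeholder "decryption": reverse the string (illustration only).
stored = [("alice", "row1"), ("bob", "2wor")]
rows, count = load_rows(stored, "alice", lambda s: s[::-1])
```

After loading, every read is served from main memory, so no further decryption takes place.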
Performance: write operations <ul><li>Write operations occur when a record is inserted into or updated in the database, and add no overhead until the client, when online, explicitly synchronizes data with the central server. At that point, for each modified record, the client needs to: </li></ul><ul><li>• generate a new (symmetric) key </li></ul><ul><li>• encrypt the record </li></ul><ul><li>• dispatch the encrypted data and the decrypting key to the remote synchronizer </li></ul>
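The three steps above can be sketched as follows. The names `synchronize`, `dispatch`, and `keystream_xor` are hypothetical, and the cipher is a toy stand-in built from the standard library (a SHA-256 counter keystream), not the algorithm used in the paper and not suitable for real use. Protection of the key before dispatch is omitted.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (illustration only): XOR with a SHA-256 keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def synchronize(modified_records, dispatch):
    """For each modified record: new key, encrypt, send to the synchronizer."""
    for row_id, record in modified_records.items():
        key = secrets.token_bytes(32)            # 1. generate a new symmetric key
        ciphertext = keystream_xor(key, record)  # 2. encrypt the record
        dispatch(row_id, ciphertext, key)        # 3. dispatch data and key
```

Because the keystream is XORed with the data, applying `keystream_xor` again with the same key recovers the original record.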
Benchmark (1) <ul><li>Creation of database and sample tables </li></ul><ul><li>Population of tables with sample values </li></ul><ul><li>Sharing of a portion of data with another user </li></ul><ul><li>Receipt of shared dossiers from other users </li></ul><ul><li>Opening of the newly created (and populated) database </li></ul>
Benchmark (2) <ul><li>To minimize communication delay, the central Synchronizer and the clients ran on the same computer. For testing purposes, it was sufficient to use only two clients (to enable data sharing). The application was compared with an equivalent one having the following differences: </li></ul><ul><li>• It uses the unmodified HyperSQL driver </li></ul><ul><li>• It doesn’t share data with other clients </li></ul><ul><li>• When populating the database, it creates the same number of dossiers as the previous application; after benchmarking, however, it adds the number of shared dossiers, resulting in the same final number of dossiers. </li></ul>
Benchmark (3) <ul><li>We benchmarked the system using single-table dossiers of about 200 bytes, in two batteries of tests; the first with 20%, and the second with 40% of shared dossiers, which numbered from 1,000 to 500,000. </li></ul>
Results (1) <ul><li>Overhead when 20% of dossiers are shared </li></ul>
Results (2) <ul><li>Overhead when 40% of dossiers are shared </li></ul>
Conclusion <ul><li>In this paper, using IMDBs, we presented a simple solution to row-level encryption of databases. It can be used in the cloud to manage very granular access rights in a highly distributed database. This allows for stronger confidence in the privacy of shared sensitive data. An interesting field of application is the use in (business) cooperative environments, e.g. professional networks. In these environments, privacy is a priority, but low computing resources don't allow the use of slow and complex algorithms. IMDBs and our smart encryption, instead, achieve the goal in a more effective way. </li></ul>