Presentation cedem luca


  1. Open Access and Database Anonymization: an Open Source Procedure Based on an Italian Case Study. Danube University Krems, 21-23 May 2014. L. Leschiutta, G. Futia
  2. Introduction (1) [22nd May 2014, Giuseppe Futia – Politecnico di Torino]
       • The principal way to openly share a database is to remove all data that could lead to the identification of the subjects involved (i.e. database anonymization);
       • we describe a procedure to process and anonymize a collection of data that includes personal, sensitive and judicial data;
       • the procedure is general purpose and is implemented relying solely on common open-source software applications.
  3. Introduction (2)
       • Our study is based on a real case in which a database of car-accident data (TWIST), consisting of 352 data fields, needs to be made openly accessible;
       • this work was developed in the framework of the Open-DAI project. Open-DAI is “Opening Data Architectures and Infrastructures” for European Public Administrations. It is a project funded under the ICT Policy Support Programme as part of the Competitiveness and Innovation framework Programme (CIP) Call 2011.
  4. Non Anonymous Data
       Columns: ID1 | NID1 | ID2 | ID3 | NID2 | ID4 | NID3 | NID4
       Rows: Item 1, Item 2, Item N
  5. Ordered Non Anonymous Data
       Columns: ID1 | ID2 | ID3 | ID4 | NID1 | NID2 | NID3 | NID4
       Rows: Item 1, Item 2, Item N
  6. Ordered Non Anonymous Data including Anonymous IDs
       Columns: ID1 | ID2 | ID3 | ID4 | AID | NID1 | NID2 | NID3 | NID4
       AID values by row: Item 1: 1053; Item 2: 1001; (unlabeled row): 1057; Item N: 1133
  7. Anonymous Data
       Columns: AID | NID1 | NID2 | NID3 | NID4
       AID values by row: 1053, 1001, 1057, 1133
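The transformation shown in slides 4 to 7 can be sketched in Python as follows. This is only an illustration, not the authors' own tooling: it assumes that columns named ID* hold identifying data and columns named NID* hold non-identifying data, and the function name `anonymize` and all sample values are hypothetical.

```python
def anonymize(rows, aids):
    """Replace the identifying columns of each row with one anonymous ID (AID).

    rows: list of dicts; keys starting with "NID" are assumed non-identifying,
          every other key is assumed identifying and is dropped.
    aids: one pre-generated AID per row.
    """
    if len(aids) < len(rows):
        raise ValueError("need one AID per row")
    anonymous = []
    for row, aid in zip(rows, aids):
        # Keep only the non-identifying fields, in sorted column order.
        kept = {k: v for k, v in sorted(row.items()) if k.startswith("NID")}
        anonymous.append({"AID": aid, **kept})
    return anonymous


# Hypothetical sample data: column names follow the slides, values are made up.
rows = [
    {"ID1": "driver-A", "NID1": "car", "ID2": "plate-A", "NID2": "rain"},
    {"ID1": "driver-B", "NID1": "bike", "ID2": "plate-B", "NID2": "dry"},
]
print(anonymize(rows, [1053, 1001]))
```

Sorting the columns mirrors the "ordered" intermediate step of slide 5; dropping the ID columns corresponds to the final table of slide 7.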
  8. Random AIDs generation
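Slide 8 gives only the title, so the following is a minimal sketch of one way to generate random, non-repeating AIDs. The function name `generate_aids` and the four-digit range are assumptions (the range is chosen only because the sample AIDs in the slides are four-digit numbers).

```python
import secrets


def generate_aids(count, low=1000, high=9999):
    """Return `count` distinct random AIDs in [low, high], in random order."""
    if count > high - low + 1:
        raise ValueError("range too small for the requested number of AIDs")
    # SystemRandom draws from the operating system's entropy source, so the
    # sequence cannot be reproduced from a seed.
    return secrets.SystemRandom().sample(range(low, high + 1), count)


aids = generate_aids(5)
print(aids)
```

Sampling without replacement guarantees that no two rows receive the same AID, and the random order means an AID carries no hint about the position of the record in the original database.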
  9. Advanced techniques: repeating IDs
       IF(ISNA(VLOOKUP(C4;C$1:C3;1; ));AID.A8;VLOOKUP(C4;C$1:F3;4; ))
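The spreadsheet formula on slide 9 appears to look up the current ID among the rows above it: when the ID has already occurred, the AID assigned earlier is reused; otherwise a new AID is taken from a separate AID sheet. A minimal Python sketch of that reading (the function name `assign_aids` and the sample values are hypothetical):

```python
def assign_aids(ids, fresh_aids):
    """Give each distinct ID one AID; repeated IDs reuse the AID assigned first."""
    fresh = iter(fresh_aids)
    seen = {}      # ID -> AID already assigned
    result = []
    for value in ids:
        if value not in seen:
            seen[value] = next(fresh)   # first occurrence: take a new AID
        result.append(seen[value])
    return result


print(assign_aids(["A", "B", "A", "C", "B"], [1053, 1001, 1057, 1133]))
# -> [1053, 1001, 1053, 1057, 1001]
```

The dictionary plays the role of the VLOOKUP over the rows above the current one.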
  10. Non Unique IDs In Multiple Cells (1)
       Columns: ID1 | ID2 | ID3 | ID4 | NID1 | NID2 | NID3 | NID4
       Rows: Item 1, Item 2, Item N; the same placeholder value "Lorem ipsum" appears in each row
  11. Non Unique IDs In Multiple Cells (2)
       flag = false;
       /* look for an earlier cell holding the same ID as cell [n][m] */
       for (i = 0; i < n && !flag; i++) {
           for (j = 0; j < m; j++) {
               if (ID_Matrix[i][j] == ID_Matrix[n][m]) {
                   /* same ID seen before: reuse its AID */
                   AID_Matrix[n][m] = AID_Matrix[i][j];
                   flag = true;
                   break;
               }
           }
       }
       /* ID not seen before: assign the next unused AID */
       if (flag == false) {
           AID_Matrix[n][m] = Next_Available_AID(k);
           k++;
       }
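The pseudocode on slide 11 can be sketched as runnable Python for a whole matrix of ID cells. This sketch assumes the cells are visited in row-major order and that the same ID must map to the same AID wherever it occurs; the function name `assign_aid_matrix` and the sample values are hypothetical. A dictionary replaces the nested scan over earlier cells.

```python
def assign_aid_matrix(id_matrix, fresh_aids):
    """Map every cell of id_matrix to an AID; equal IDs share one AID."""
    fresh = iter(fresh_aids)
    seen = {}                      # ID -> AID already assigned
    aid_matrix = []
    for row in id_matrix:          # row-major order (assumption)
        aid_row = []
        for value in row:
            if value not in seen:
                seen[value] = next(fresh)   # plays the role of Next_Available_AID
            aid_row.append(seen[value])
        aid_matrix.append(aid_row)
    return aid_matrix


ids = [["A", "B"], ["B", "C"], ["A", "C"]]
print(assign_aid_matrix(ids, [1053, 1001, 1057]))
# -> [[1053, 1001], [1001, 1057], [1053, 1057]]
```

Compared with rescanning all earlier cells for every new cell, the dictionary lookup keeps the cost per cell constant, which matters for a database with hundreds of fields.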
  12. Data Wiping
       • To perform this operation on Windows, you can use the open-source program Eraser (http://eraser.heidi.ie);
       • on Linux, you can use the following commands:
           > shred NonAnonymousData.csv
           > rm NonAnonymousData.csv
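The idea behind the shred-then-remove sequence on slide 12 (overwrite the file contents before unlinking the file) can be illustrated with a short Python sketch. It is not a replacement for `shred` or Eraser: the function name `overwrite_and_delete` is hypothetical, and on SSDs or on journaling and copy-on-write file systems an in-place overwrite may not destroy the old data.

```python
import os


def overwrite_and_delete(path, passes=3, chunk=1 << 20):
    """Overwrite a file with random bytes `passes` times, then delete it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(os.urandom(n))   # random data, written in chunks
                remaining -= n
            f.flush()
            os.fsync(f.fileno())         # ask the OS to push the data to disk
    os.remove(path)
```

A call such as `overwrite_and_delete("NonAnonymousData.csv")` would correspond to the two commands on the slide.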
  13. Encrypt the file
       • On Windows, this can be achieved with the open-source 7zip program (http://www.7-zip.org/), which provides strong AES-256 encryption.
       • On Linux, you can use the following command:
           > gpg -c NonAnonymousData.csv
       The encrypted file must then be backed up to a safe location, e.g. a non-rewritable DVD or a WORM (Write Once Read Many) tape.
  14. Data Degradation (location)
  15. Data Degradation (location)
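Slides 14 and 15 contain only figures, so the method they show is not recoverable from this transcript. As a hedged illustration of what location degradation can look like, one common approach is to round coordinates to a coarser grid. The function name `degrade_location`, the choice of rounding, and the sample coordinates are all assumptions.

```python
def degrade_location(lat, lon, decimals=2):
    """Round a latitude/longitude pair to `decimals` decimal places.

    Two decimal places correspond to roughly 1.1 km in latitude.
    """
    return (round(lat, decimals), round(lon, decimals))


# Hypothetical coordinates, used only to show the effect of rounding.
print(degrade_location(45.06289, 7.66253))
# -> (45.06, 7.66)
```

Lowering `decimals` makes the published position coarser, trading precision for a lower risk of pinpointing a single accident.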
  16. Data Degradation (time)
       • 10 November 2011 at 10:25
       • 10 November 2011 between 10 and 11
       • Winter 2011
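The progressive loss of precision on slide 16 can be sketched in Python. The function name `degrade_time` and the level names are hypothetical; the sketch stops at month and year rather than season, because the slides do not say how dates are assigned to seasons.

```python
from datetime import datetime


def degrade_time(moment, level):
    """Return a coarser textual form of `moment`.

    level: "hour" (one-hour window), "month" or "year".
    Month names come from strftime and are English in the default C locale.
    """
    if level == "hour":
        return (f"{moment.day} {moment:%B %Y} "
                f"between {moment.hour} and {moment.hour + 1}")
    if level == "month":
        return f"{moment:%B %Y}"
    if level == "year":
        return str(moment.year)
    raise ValueError(f"unknown level: {level}")


accident = datetime(2011, 11, 10, 10, 25)
print(degrade_time(accident, "hour"))   # 10 November 2011 between 10 and 11
print(degrade_time(accident, "month"))  # November 2011
```

Each coarser level groups more accidents under the same published value, which is what makes a single event harder to single out.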
  17. Conclusions: de-anonymization test
       • How can we test whether a database is anonymous enough?
       • Reasonable efforts: “the means possibly required to effect identification are to be considered disproportionate compared with the (risk of) damage resulting”
       • de-anonymization test
  18. Thank you
       Luca Leschiutta (luca.leschiutta@polito.it)
       Giuseppe Futia (giuseppe.futia@polito.it)
       Nexa Center for Internet & Society (http://nexa.polito.it)
       Dept. of Computer and Control Engineering (DAUIN), Politecnico di Torino, Italy