Engineering data privacy - The ARX data anonymization tool

Website with further information: http://arx.deidentifier.org

Description of this talk: While a plethora of methods have been proposed for dealing with many aspects of de-identifying clinical data, only a few (prototypical) implementations are available. The complexity of implementing privacy technologies is an often-overlooked challenge. In this talk we will present the open source data de-identification tool ARX, which has been carefully engineered to support multiple privacy technologies for relational datasets. Our tool bridges the gap between different scientific disciplines by integrating methods developed and used by the statistics community with data anonymization techniques developed by computer scientists. ARX has been designed from the ground up to ensure scalability, and it is able to process very large datasets on commodity hardware. The software implements a large set of privacy models: (1) syntactic privacy models, such as k-anonymity, ℓ-diversity, t-closeness and δ-presence, (2) statistical models for re-identification risks, and (3) differential privacy. In the talk, we will focus on measures to reduce the uniqueness of records. ARX also supports more than ten different methods for evaluating data utility, including loss, precision, non-uniform entropy and KL divergence. In ARX, de-identification of data can be performed automatically, semi-automatically or manually, using a method that integrates global recoding, local recoding, categorization, generalization, suppression, microaggregation and top/bottom coding. All methods are accessible via a comprehensive cross-platform graphical user interface.

Technische Universität München
Fabian Prasser, Florian Kohlmayer, Klaus A. Kuhn
Chair for Biomedical Informatics
Institute for Medical Statistics and Epidemiology
University of Technology Munich (TUM)
Engineering data privacy -
The ARX data anonymization tool

What is ARX?
● ARX = methods from statistics + methods from computer science
● A tool for analyzing and reducing the uniqueness of records in a (relational) dataset
● Variety of methods
● Highly scalable
    ● Up to 50 dimensions (i.e. attributes)
    ● Millions of records
● (Semi-)automatically and/or manually
● Comprehensive graphical user interface
Images: https://commons.wikimedia.org/ (users: Ysangkok, Scarce2)
ARX | Dagstuhl Genomic Privacy Workshop 2015 | 22.10.15

Example: reduce uniqueness
● Generalization
● Suppression
● Microaggregation
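
The three transformations named on this slide can be illustrated with a small sketch. It is plain Python and does not use the ARX API; the toy records, the attribute layout (age, ZIP code, weight) and the helper names `generalize_age`, `generalize_zip` and `anonymize` are all assumptions made for illustration. Ages and ZIP codes are generalized, records that remain unique afterwards are suppressed, and the numeric weight is microaggregated to the mean of its group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (age, zip_code, weight_kg)
records = [
    (34, "81675", 71.0),
    (36, "81677", 80.0),
    (52, "80331", 65.0),
    (58, "80333", 90.0),
    (91, "80539", 58.0),
]


def generalize_age(age, width=10):
    """Generalization: replace an exact age with an interval of the given width."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Generalization: keep a prefix of the ZIP code and mask the remaining digits."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def anonymize(rows):
    """Generalize quasi-identifiers, suppress unique records, microaggregate weight."""
    generalized = [(generalize_age(a), generalize_zip(z), w) for a, z, w in rows]

    # Group the weights of records that share the same generalized values.
    groups = defaultdict(list)
    for age, zip_code, weight in generalized:
        groups[(age, zip_code)].append(weight)

    result = []
    for age, zip_code, _ in generalized:
        members = groups[(age, zip_code)]
        if len(members) == 1:
            # Suppression: the record is still unique, so it is masked entirely.
            result.append(("*", "*", "*"))
        else:
            # Microaggregation: replace the weight with the mean of its group.
            result.append((age, zip_code, mean(members)))
    return result


if __name__ == "__main__":
    for row in anonymize(records):
        print(row)
```

In this toy table the first four records fall into two groups of two, while the 91-year-old remains unique after generalization and is therefore suppressed.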

Overview of methods implemented by ARX

Measures for uniqueness
• Sample-based methods
    • Fraction of sample uniques
    • Average sample uniqueness
    • k-anonymity
• Population-based methods
    • Model by Zayatz [1]
    • Model by Hoshino [2]
    • Model by Chen et al. [3] / Rinott [4]
    • Model by Dankar et al. [5]

Coding models
• Global and local recoding
    • Can be weighted
• Methods
    • Categorization
    • Generalization
    • Cell suppression
    • Record suppression
    • Micro-aggregation
    • Top/bottom coding

Measures for utility
• Weighted and parameterized
    • Ability to control the application of different coding models
• Methods
    • AECS, Discernibility, Precision
    • (Normalized) Mean squared error
    • (Normalized) Non-uniform entropy
    • KL divergence
    • Loss

Transform, Visualize, Analyze, Adapt

[1] Zayatz, L.V.: Estimation of the percent of unique population elements on a microdata file using the sample. Statistical Research Division Report Number: Census/SRD/RR-91/08 (1991)
[2] Hoshino, N.: Applying Pitman's sampling formula to microdata disclosure risk assessment. J Off Stat 17(4), 499–520 (2001)
[3] Chen, G., Keller-McNulty, S.: Estimation of identification disclosure risk in microdata. J Off Stat 14, 79–95 (1998)
[4] Rinott, Y.: On models for statistical disclosure risk estimation. In: Proc ECE/Eurostat Work Session Stat Data Confid, p. 275–285 (2003)
[5] Dankar, F., Emam, K.E., Neisa, A., Roffey, T.: Estimating the re-identification risk of clinical data sets. BMC Med Inform Decis Mak 12(1), 66 (2012)
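
Several of the sample-based uniqueness measures and utility measures listed on this slide are simple functions of the equivalence class sizes, that is, of how many records share each combination of quasi-identifier values. The following is a minimal sketch in plain Python, not the ARX implementation. The function names and the toy rows are hypothetical. "Average sample uniqueness" is interpreted here, as an assumption, as the mean of 1/class size over all records; AECS is computed without normalizing by k; and the discernibility score ignores any extra penalty for suppressed records.

```python
from collections import Counter


def class_sizes(rows, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def fraction_of_sample_uniques(sizes):
    """Share of records that are alone in their equivalence class."""
    return sum(1 for s in sizes.values() if s == 1) / sum(sizes.values())


def average_risk(sizes):
    """Mean of 1/class size over all records (assumed reading of the slide's
    'average sample uniqueness'); equals #classes / #records."""
    return len(sizes) / sum(sizes.values())


def is_k_anonymous(sizes, k):
    """k-anonymity: every equivalence class contains at least k records."""
    return min(sizes.values()) >= k


def aecs(sizes):
    """Average equivalence class size (not normalized by k in this sketch)."""
    return sum(sizes.values()) / len(sizes)


def discernibility(sizes):
    """Discernibility: sum of squared class sizes (no suppression penalty here)."""
    return sum(s * s for s in sizes.values())


# Hypothetical, already generalized toy data.
rows = [
    {"age": "30-39", "zip": "816**"},
    {"age": "30-39", "zip": "816**"},
    {"age": "50-59", "zip": "803**"},
    {"age": "50-59", "zip": "803**"},
    {"age": "90-99", "zip": "805**"},
]

if __name__ == "__main__":
    sizes = class_sizes(rows, ["age", "zip"])
    print("fraction of sample uniques:", fraction_of_sample_uniques(sizes))
    print("average risk:", average_risk(sizes))
    print("2-anonymous:", is_k_anonymous(sizes, 2))
    print("AECS:", aecs(sizes))
    print("discernibility:", discernibility(sizes))
```

Lower uniqueness usually comes at the price of larger classes, which is why the uniqueness and utility measures are evaluated side by side when choosing a transformation.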

Screenshots

Further features offered by ARX
● Syntactic privacy models
    ● ℓ-diversity, t-closeness, δ-disclosure privacy, δ-presence
● Risk-based anonymization
● Differential privacy
    ● Truthful (ε,δ)-differentially private data release
    ● Using random sampling
● Detection of HIPAA identifiers
    ● Based on heuristics
● Import from multiple sources
    ● RDBMS, Excel, CSV
● Software library
● Open source, cross-platform
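
As an illustration of one of the syntactic privacy models listed above, the following sketch checks the simplest variant of ℓ-diversity, often called distinct ℓ-diversity: every group of records with the same quasi-identifier values must contain at least ℓ different sensitive values. It is plain Python, not ARX code, and the function name, attribute names and toy rows are hypothetical. Other ℓ-diversity variants (for example entropy-based ones) are not covered by this check.

```python
from collections import defaultdict


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Return True if every equivalence class holds at least l distinct
    values of the sensitive attribute."""
    values_per_class = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values_per_class[key].add(row[sensitive])
    return all(len(values) >= l for values in values_per_class.values())


# Hypothetical toy data: two equivalence classes of two records each.
rows = [
    {"age": "30-39", "zip": "816**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "816**", "diagnosis": "asthma"},
    {"age": "50-59", "zip": "803**", "diagnosis": "flu"},
    {"age": "50-59", "zip": "803**", "diagnosis": "flu"},
]

if __name__ == "__main__":
    # The second class contains only one distinct diagnosis, so the table
    # is 1-diverse but not 2-diverse.
    print(is_distinct_l_diverse(rows, ["age", "zip"], "diagnosis", 1))
    print(is_distinct_l_diverse(rows, ["age", "zip"], "diagnosis", 2))
```

The example also shows why k-anonymity alone can be insufficient: the table is 2-anonymous, yet anyone known to be in the second class can be linked to the diagnosis shared by both of its records.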

http://arx.deidentifier.org
