Technische Universität München
Fabian Prasser, Florian Kohlmayer, Klaus A. Kuhn
Chair for Biomedical Informatics
Institute for Medical Statistics and Epidemiology
University of Technology Munich (TUM)
Engineering data privacy -
The ARX data anonymization tool
What is ARX?
● ARX = methods from statistics + computer science
● A tool for analyzing and reducing the uniqueness of records in a (relational) dataset
  ● Variety of methods
● Highly scalable
  ● Up to 50 dimensions (i.e. attributes)
  ● Millions of records
● (Semi-)automatically and/or manually
● Comprehensive graphical user interface
ARX | Dagstuhl Genomic Privacy Workshop 2015, 22.10.15
Images: https://commons.wikimedia.org/ users: Ysangkok, Scarce2
Example
Reduce uniqueness through:
● Generalization
● Suppression
● Microaggregation
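The effect of generalization and suppression on uniqueness can be illustrated with a minimal Python sketch. The toy records, the helper names (`generalize`, `suppress_small_groups`, `count_uniques`) and the chosen coarsening rules are hypothetical and are not taken from the slides or from ARX itself.

```python
from collections import Counter

# Hypothetical toy records (age, ZIP code); invented for illustration only.
records = [
    (34, "81667"), (36, "81675"), (38, "81669"),
    (52, "80331"), (57, "80333"), (71, "85748"),
]

def generalize(record):
    """Coarsen age to a decade range and keep only the first two ZIP digits."""
    age, zip_code = record
    decade = age // 10 * 10
    return (f"{decade}-{decade + 9}", zip_code[:2] + "***")

def suppress_small_groups(rows, k):
    """Record suppression: drop every record whose group has fewer than k members."""
    sizes = Counter(rows)
    return [row for row in rows if sizes[row] >= k]

def count_uniques(rows):
    """Number of records whose attribute combination occurs exactly once."""
    sizes = Counter(rows)
    return sum(1 for row in rows if sizes[row] == 1)

generalized = [generalize(r) for r in records]
released = suppress_small_groups(generalized, k=2)

print(count_uniques(records), count_uniques(generalized), count_uniques(released))
```

In this toy dataset every original record is unique, generalization leaves a single unique record, and suppressing that record removes the last one at the cost of one lost row.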
Overview of methods implemented by ARX
Measures for uniqueness
Sample-based methods
• Fraction of sample uniques
• Average sample uniqueness
• k-anonymity
Population-based methods
• Model by Zayatz [1]
• Model by Hoshino [2]
• Model by Chen et al. [3] / Rinott [4]
• Model by Dankar et al. [5]
[1] Zayatz, L.V.: Estimation of the percent of unique population elements on a microdata file using the sample. Statistical Research Division Report Number: Census/SRD/RR-91/08 (1991)
[2] Hoshino, N.: Applying Pitman's sampling formula to microdata disclosure risk assessment. J Off Stat 17(4), 499-520 (2001)
[3] Chen, G., Keller-McNulty, S.: Estimation of identification disclosure risk in microdata. J Off Stat 14, 79-95 (1998)
[4] Rinott, Y.: On models for statistical disclosure risk estimation. In: Proc ECE/Eurostat Work Session Stat Data Confid, pp. 275-285 (2003)
[5] Dankar, F., Emam, K.E., Neisa, A., Roffey, T.: Estimating the re-identification risk of clinical data sets. BMC Med Inform Decis Mak 12(1), 66 (2012)
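Two of the sample-based measures named above, the fraction of sample uniques and k-anonymity, can be computed directly from the sizes of the groups of records that share the same attribute values. The following Python sketch uses hypothetical function names and toy data; it only illustrates the definitions and does not reproduce how ARX computes them. The population-based models [1]-[5] need statistical estimators and are not covered here.

```python
from collections import Counter

def fraction_of_sample_uniques(rows):
    """Share of records whose attribute combination occurs exactly once in the sample."""
    sizes = Counter(rows)
    return sum(1 for row in rows if sizes[row] == 1) / len(rows)

def anonymity_level(rows):
    """Largest k for which the sample is k-anonymous, i.e. the smallest group size."""
    return min(Counter(rows).values())

# Hypothetical, already generalized records (age range, ZIP prefix).
rows = (
    [("30-39", "81***")] * 3
    + [("50-59", "80***")] * 2
    + [("70-79", "85***")]
)

print(fraction_of_sample_uniques(rows))  # one unique record out of six
print(anonymity_level(rows))             # the unique record limits k to 1
print(anonymity_level(rows[:5]))         # without it, the smallest group has 2 records
```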
Coding models
Global and local recoding
• Can be weighted
Methods
• Categorization
• Generalization
• Cell suppression
• Record suppression
• Micro-aggregation
• Top/bottom coding
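Two of the listed coding methods lend themselves to a short numeric illustration. The sketch below shows top/bottom coding as clamping to a range, and a simple fixed-size univariate micro-aggregation that replaces each value by its group mean. Both functions, their names and the thresholds are assumptions chosen for illustration; ARX may implement these methods differently.

```python
from statistics import mean

def top_bottom_code(values, low, high):
    """Clamp values to [low, high] so that extreme values are no longer distinguishable."""
    return [min(max(v, low), high) for v in values]

def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k neighbours.

    Values are sorted and cut into consecutive groups of size k; a remainder
    smaller than k is merged into the last group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start + k
        if len(order) - end < k:  # too few values left for another full group
            end = len(order)
        group = order[start:end]
        average = mean(values[i] for i in group)
        for i in group:
            result[i] = average
        start = end
    return result

ages = [34, 36, 38, 52, 57, 71, 95]  # hypothetical ages
print(top_bottom_code(ages, low=35, high=70))
print(microaggregate(ages, k=3))
```

With `k=3` the seven ages fall into one group of three and one group of four, so every released value is shared by at least three records.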
Measures for utility
Weighted and parameterized
• Ability to control the application
of different coding models
Methods
• AECS, Discernibility, Precision
• (Normalized) Mean squared error
• (Normalized) Non-uniform entropy
• KL divergence
• Loss
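As an illustration of two of the utility measures above, the following Python sketch computes the average equivalence class size (AECS) and the discernibility metric using definitions that are common in the literature. The function names, the unnormalized form of AECS and the penalty for suppressed records (each one is charged the size of the whole dataset) are assumptions; the exact normalization used by ARX may differ.

```python
from collections import Counter

def average_class_size(rows):
    """AECS (unnormalized): number of records divided by number of equivalence classes."""
    return len(rows) / len(Counter(rows))

def discernibility(rows, suppressed=0):
    """Sum of squared class sizes; each suppressed record costs the full dataset size."""
    total = len(rows) + suppressed
    sizes = Counter(rows)
    return sum(size * size for size in sizes.values()) + suppressed * total

# Hypothetical generalized records (age range, ZIP prefix).
rows = (
    [("30-39", "81***")] * 3
    + [("50-59", "80***")] * 2
    + [("70-79", "85***")]
)

print(average_class_size(rows))
print(discernibility(rows))
print(discernibility(rows[:5], suppressed=1))
```

Lower values indicate smaller groups and therefore less information loss for both measures. In this example suppressing the unique record raises the discernibility value, which shows how such a measure penalizes suppression.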
Building blocks: measures for uniqueness, coding models, measures for utility
Workflow: Transform, Visualize, Analyze, Adapt
Screenshots (two slides; images not reproduced)
Further features offered by ARX
● Syntactic privacy models
  ● ℓ-diversity, t-closeness, δ-disclosure privacy, δ-presence
● Risk-based anonymization
● Differential privacy
  ● Truthful (ε,δ)-differentially private data release
  ● Using random sampling
● Detection of HIPAA identifiers
  ● Based on heuristics
● Import from multiple sources
  ● RDBMS, Excel, CSV
● Software library
● Open source, cross-platform
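To make one of the syntactic privacy models above concrete, the sketch below checks distinct ℓ-diversity, the simplest variant of the model: every group of records with identical quasi-identifiers must contain at least ℓ different sensitive values. The function name and the toy data are hypothetical, and the other variants of ℓ-diversity (entropy, recursive) are not covered.

```python
from collections import defaultdict

def distinct_l_diversity(quasi_identifiers, sensitive_values):
    """Largest l for which the data is distinct-l-diverse.

    This is the smallest number of distinct sensitive values found in any
    group of records that share the same quasi-identifier combination.
    """
    groups = defaultdict(set)
    for quasi, value in zip(quasi_identifiers, sensitive_values):
        groups[quasi].add(value)
    return min(len(values) for values in groups.values())

# Hypothetical generalized quasi-identifiers and diagnoses.
quasi = [("30-39", "81***")] * 3 + [("50-59", "80***")] * 2
diagnoses = ["flu", "flu", "asthma", "flu", "flu"]

print(distinct_l_diversity(quasi, diagnoses))
print(distinct_l_diversity(quasi, ["flu", "flu", "asthma", "flu", "diabetes"]))
```

The first dataset is 2-anonymous but only 1-diverse, because both records in the second group share the same diagnosis. Changing one diagnosis makes it 2-diverse.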
http://arx.deidentifier.org
