SlideShare a Scribd company logo
1 of 30
Download to read offline
Technische Universität München
Fabian Prasser, Florian Kohlmayer
Lehrstuhl für Medizinische Informatik
Institut für Medizinische Statistik und Epidemiologie
Klinikum rechts der Isar der TU München
ARX 3.0.0
A Comprehensive Tool for
Anonymizing Biomedical Data
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Towards useful data anonymization
• Usability has many dimensions
• Ability to balance data utility with privacy requirements
• Need to support a broad spectrum of methods
• Privacy models
• Transformation models
• Methods for analyzing data utility
• Methods for analyzing risks
• Further “non-functional” requirements
• Integrated and harmonized: ARX is not a “tool box”
• Compatibility (syntactic and semantic)
• Performance and scalability
• Intuitive visualization and parameterization
• Provide methods to end-users as well as programmers
19.03.2015 2
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Compatibility
• Built-in data import facilities
• Relational databases (MS SQL, DB2, SQLite, MySQL)
• MS Excel
• CSV (all common formats, auto-detection)
• Support for different data types and scales of measure
• Strings (with nominal and ordinal scale)
• Dates (interval scale)
• Numbers (ratio scale)
• Automatic detection of data types and formats
• Methods for handling and cleaning low-quality data
• Handles missing and invalid values correctly
• In privacy models, transformation methods, visualizations
• Manual removal of tuples, query interface, find & replace
19.03.2015 3
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Flexible transformation methods
• Global recoding
• Full-domain generalization
• Top- & bottom coding
• Local recoding
• Tuple suppression
• Fully integrated and parameterizable
• Importance of attributes, suitability of methods
• Functional representations of transformation rules
• Especially functional representations of hierarchies
• Support for categorical and continuous variables (categorization)
• Multiple methods for measuring data utility
• Parametrizable, e.g., with different aggregate functions
• Use functional representations of transformation rules
19.03.2015 4
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Multiple apriori privacy models
• k-Anonymity
• ℓ-Diversity (distinct, entropy, recursive-(c, ℓ))
• t-Closeness (equal and hierarchical ground distance)
• δ-Presence
• Multiple methods for risk-based anonymization
• Sample characteristics
• Average cell size
• Sample uniqueness
• Super-population models
• Decision rule by Dankar et al.
• Based on models by Pitman, Zayatz and the SNB model
• Support for arbitrary combinations of these models
• Optimal solution within our coding model
19.03.2015 5
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Scalability: ARX can handle large datasets (several million data
entries) on commodity hardware
• Efficient in-memory data management engine
• Works with compressed data representations
• Tight coupling between transformation operators and the „database
kernel”
• Provides a space-time trade-off
• Optimized search strategy: Based on multiple pruning strategies
• Efficient implementations of further complex tasks
• Evaluations of privacy criteria (e.g. t-closeness, δ-presence)
• Methods for solving non-linear equation systems
• Background jobs in the user interface
19.03.2015 6
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Comprehensive graphical interface
• Scalablity comparable to the ARX API
• Supports all methods provided by the ARX API
• Cross-platform (Windows, Linux/GTK, OSX) with native interfaces
• Available as binary distributions with installers
• Independent API
• User interface sits on top
of the API
• Java library
• All methods provided by ARX
are first-class citizens in both
worlds
19.03.2015 7
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Anonymization workflow
• Iterative process to successively refine transformations
• Supported by the scalability of our framework
• Three (repeating) steps, mapped to four perspectives
19.03.2015 8
- Define transformation model
- Define privacy model
- Define coding model
- Filter and analyze the
solution space
- Organize transformations
- Compare and analyze
input and output
- Regarding risks and
utility
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 9
ARX: Wizard for data import
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 10
ARX: Configuration (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 11
ARX: Configuration (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 12
ARX: Configuration (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 13
ARX: Configuration (4)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 14
ARX: Wizard for transformation rules
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 15
ARX: Further dialogs
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 16
ARX: Exploration (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 17
ARX: Exploration (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 18
ARX: Exploration (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 19
ARX: Utility analysis (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 20
ARX: Utility analysis (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 21
ARX: Utility analysis (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 22
ARX: Utility analysis (4)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 23
ARX: Utility analysis (5)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 24
ARX: Risk analysis (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 25
ARX: Risk analysis (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 26
ARX: Risk analysis (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 27
ARX: Risk analysis (4)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 28
ARX: Context-sensitive help
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
●
Current projects
●
Non-interactive Differential Privacy
●
Support for high-dimensional data: heuristic algorithms
●
Further analyses and visualizations: utility and risks
●
Support for transactional attributes: (k, km
)-anonymity
●
Integrated data masking methods
●
More flexible definition of quasi-identifiers
●
More risk models
●
Planned projects
●
Auto-detection of HIPAA identifiers
●
Implement more flexible privacy criteria
19.03.2015 29
ARX: Further developments
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
• ARX is open source software
• Contribute: feature requests, code reviews, criticism,
enhancements, questions
• Repository: https://github.com/arx-deidentifier/arx
• Further information & download: http://arx.deidentifier.org
• Get in touch
• Fabian Prasser (prasser@in.tum.de)
• Florian Kohlmayer (florian.kohlmayer@tum.de)
19.03.2015 30
• Any questions?
Thank you for your attention

More Related Content

What's hot

Five Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
Five Best and Five Worst Practices for SIEM by Dr. Anton ChuvakinFive Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
Five Best and Five Worst Practices for SIEM by Dr. Anton ChuvakinAnton Chuvakin
 
Machine Learning for Threat Detection
Machine Learning for Threat DetectionMachine Learning for Threat Detection
Machine Learning for Threat DetectionNapier University
 
How to build a cyber threat intelligence program
How to build a cyber threat intelligence programHow to build a cyber threat intelligence program
How to build a cyber threat intelligence programMark Arena
 
Her Yönü İle Siber Tehdit İstihbaratı
Her Yönü İle Siber Tehdit İstihbaratıHer Yönü İle Siber Tehdit İstihbaratı
Her Yönü İle Siber Tehdit İstihbaratıBGA Cyber Security
 
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Krishnaram Kenthapadi
 
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...Databricks
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithmRupali Bhatnagar
 
SIEM - Varolan Verilerin Anlamı
SIEM - Varolan Verilerin AnlamıSIEM - Varolan Verilerin Anlamı
SIEM - Varolan Verilerin AnlamıBGA Cyber Security
 
Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018DataLab Community
 
ウイルス検知プログラミング
ウイルス検知プログラミングウイルス検知プログラミング
ウイルス検知プログラミング黒 林檎
 
Dm from databases perspective u 1
Dm from databases perspective u 1Dm from databases perspective u 1
Dm from databases perspective u 1sakthyvel3
 
Learning Method In Data Mining
Learning Method In Data MiningLearning Method In Data Mining
Learning Method In Data Miningishaq zaman
 

What's hot (20)

Five Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
Five Best and Five Worst Practices for SIEM by Dr. Anton ChuvakinFive Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
Five Best and Five Worst Practices for SIEM by Dr. Anton Chuvakin
 
Data anonymization
Data anonymizationData anonymization
Data anonymization
 
Machine Learning for Threat Detection
Machine Learning for Threat DetectionMachine Learning for Threat Detection
Machine Learning for Threat Detection
 
How to build a cyber threat intelligence program
How to build a cyber threat intelligence programHow to build a cyber threat intelligence program
How to build a cyber threat intelligence program
 
Her Yönü İle Siber Tehdit İstihbaratı
Her Yönü İle Siber Tehdit İstihbaratıHer Yönü İle Siber Tehdit İstihbaratı
Her Yönü İle Siber Tehdit İstihbaratı
 
CrypTool: Cryptography for the masses
CrypTool: Cryptography for the massesCrypTool: Cryptography for the masses
CrypTool: Cryptography for the masses
 
Big data
Big dataBig data
Big data
 
Security of Machine Learning
Security of Machine LearningSecurity of Machine Learning
Security of Machine Learning
 
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
 
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithm
 
SIEM Architecture
SIEM ArchitectureSIEM Architecture
SIEM Architecture
 
SIEM - Varolan Verilerin Anlamı
SIEM - Varolan Verilerin AnlamıSIEM - Varolan Verilerin Anlamı
SIEM - Varolan Verilerin Anlamı
 
Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018
 
ウイルス検知プログラミング
ウイルス検知プログラミングウイルス検知プログラミング
ウイルス検知プログラミング
 
Cloud Summit Canada com Rodrigo Montoro
Cloud Summit Canada com Rodrigo MontoroCloud Summit Canada com Rodrigo Montoro
Cloud Summit Canada com Rodrigo Montoro
 
Dm from databases perspective u 1
Dm from databases perspective u 1Dm from databases perspective u 1
Dm from databases perspective u 1
 
Learning Method In Data Mining
Learning Method In Data MiningLearning Method In Data Mining
Learning Method In Data Mining
 
Text Mining
Text MiningText Mining
Text Mining
 
Analysis of database tampering
Analysis of database tamperingAnalysis of database tampering
Analysis of database tampering
 

Similar to ARX - A Comprehensive Tool for Anonymizing Biomedical Data

ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataarx-deidentifier
 
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...Juan Antonio Vizcaino
 
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)ijccmsjournal
 
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuantUniversity
 
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platformibemam
 
Overview of the altmetrics landscape
Overview of the altmetrics landscapeOverview of the altmetrics landscape
Overview of the altmetrics landscapeRichard Cave
 
TranSMART Roadmap Presentation Amsterdam 2015
TranSMART Roadmap Presentation Amsterdam 2015TranSMART Roadmap Presentation Amsterdam 2015
TranSMART Roadmap Presentation Amsterdam 2015Kees van Bochove
 
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)ijccmsjournal
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Sanjay Padhi, Ph.D
 
Towards batch one size with industrial semantics email
Towards batch one size with industrial semantics emailTowards batch one size with industrial semantics email
Towards batch one size with industrial semantics emailPaulo Zanini
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondonPaul Agapow
 
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health SystemsDr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health SystemsInnovation Agency
 

Similar to ARX - A Comprehensive Tool for Anonymizing Biomedical Data (20)

ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
 
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
 
Chemistry data delivery from the US-EPA to support environmental chemistry
Chemistry data delivery from the US-EPA to support environmental chemistryChemistry data delivery from the US-EPA to support environmental chemistry
Chemistry data delivery from the US-EPA to support environmental chemistry
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
Principles and risk assessment of managing distributed ontologies hosted by e...
Principles and risk assessment of managing distributed ontologies hosted by e...Principles and risk assessment of managing distributed ontologies hosted by e...
Principles and risk assessment of managing distributed ontologies hosted by e...
 
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
 
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
 
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platform
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
Overview of the altmetrics landscape
Overview of the altmetrics landscapeOverview of the altmetrics landscape
Overview of the altmetrics landscape
 
TranSMART Roadmap Presentation Amsterdam 2015
TranSMART Roadmap Presentation Amsterdam 2015TranSMART Roadmap Presentation Amsterdam 2015
TranSMART Roadmap Presentation Amsterdam 2015
 
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021
 
Towards batch one size with industrial semantics email
Towards batch one size with industrial semantics emailTowards batch one size with industrial semantics email
Towards batch one size with industrial semantics email
 
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
 
CIKB poster_2015
CIKB poster_2015CIKB poster_2015
CIKB poster_2015
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, London
 
1 3 introduction to open_ehr
1 3 introduction to open_ehr1 3 introduction to open_ehr
1 3 introduction to open_ehr
 
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health SystemsDr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
 

Recently uploaded

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 

Recently uploaded (20)

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 

ARX - A Comprehensive Tool for Anonymizing Biomedical Data

  • 1. Technische Universität München Fabian Prasser, Florian Kohlmayer Lehrstuhl für Medizinische Informatik Institut für Medizinische Statistik und Epidemiologie Klinikum rechts der Isar der TU München ARX 3.0.0 A Comprehensive Tool for Anonymizing Biomedical Data
  • 2. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Towards useful data anonymization • Usability has many dimensions • Ability to balance data utility with privacy requirements • Need to support a broad spectrum of methods • Privacy models • Transformation models • Methods for analyzing data utility • Methods for analyzing risks • Further “non-functional” requirements • Integrated and harmonized: ARX is not a “tool box” • Compatibility (syntactic and semantic) • Performance and scalability • Intuitive visualization and parameterization • Provide methods to end-users as well as programmers 19.03.2015 2
  • 3. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Compatibility • Built-in data import facilities • Relational databases (MS SQL, DB2, SQLite, MySQL) • MS Excel • CSV (all common formats, auto-detection) • Support for different data types and scales of measure • Strings (with nominal and ordinal scale) • Dates (interval scale) • Numbers (ratio scale) • Automatic detection of data types and formats • Methods for handling and cleaning low-quality data • Handles missing and invalid values correctly • In privacy models, transformation methods, visualizations • Manual removal of tuples, query interface, find & replace 19.03.2015 3
  • 4. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Flexible transformation methods • Global recoding • Full-domain generalization • Top- & bottom coding • Local recoding • Tuple suppression • Fully integrated and parameterizable • Importance of attributes, suitability of methods • Functional representations of transformation rules • Especially functional representations of hierarchies • Support for categorical and continuous variables (categorization) • Multiple methods for measuring data utility • Parametrizable, e.g., with different aggregate functions • Use functional representations of transformation rules 19.03.2015 4
  • 5. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Multiple apriori privacy models • k-Anonymity • ℓ-Diversity (distinct, entropy, recursive-(c, ℓ)) • t-Closeness (equal and hierarchical ground distance) • δ-Presence • Multiple methods for risk-based anonymization • Sample characteristics • Average cell size • Sample uniqueness • Super-population models • Decision rule by Dankar et al. • Based on models by Pitman, Zayatz and the SNB model • Support for arbitrary combinations of these models • Optimal solution within our coding model 19.03.2015 5
  • 6. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Scalability: ARX can handle large datasets (several million data entries) on commodity hardware • Efficient in-memory data management engine • Works with compressed data representations • Tight coupling between transformation operators and the „database kernel” • Provides a space-time trade-off • Optimized search strategy: Based on multiple pruning strategies • Efficient implementations of further complex tasks • Evaluations of privacy criteria (e.g. t-closeness, δ-presence) • Methods for solving non-linear equation systems • Background jobs in the user interface 19.03.2015 6
  • 7. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Comprehensive graphical interface • Scalablity comparable to the ARX API • Supports all methods provided by the ARX API • Cross-platform (Windows, Linux/GTK, OSX) with native interfaces • Available as binary distributions with installers • Independent API • User interface sits on top of the API • Java library • All methods provided by ARX are first-class citizens in both worlds 19.03.2015 7
  • 8. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Anonymization workflow • Iterative process to successively refine transformations • Supported by the scalability of our framework • Three (repeating) steps, mapped to four perspectives 19.03.2015 8 - Define transformation model - Define privacy model - Define coding model - Filter and analyze the solution space - Organize transformations - Compare and analyze input and output - Regarding risks and utility
  • 9. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 9 ARX: Wizard for data import
  • 10. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 10 ARX: Configuration (1)
  • 11. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 11 ARX: Configuration (2)
  • 12. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 12 ARX: Configuration (3)
  • 13. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 13 ARX: Configuration (4)
  • 14. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 14 ARX: Wizard for transformation rules
  • 15. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 15 ARX: Further dialogs
  • 16. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 16 ARX: Exploration (1)
  • 17. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 17 ARX: Exploration (2)
  • 18. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 18 ARX: Exploration (3)
  • 19. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 19 ARX: Utility analysis (1)
  • 20. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 20 ARX: Utility analysis (2)
  • 21. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 21 ARX: Utility analysis (3)
  • 22. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 22 ARX: Utility analysis (4)
  • 23. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 23 ARX: Utility analysis (5)
  • 24. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 24 ARX: Risk analysis (1)
  • 25. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 25 ARX: Risk analysis (2)
  • 26. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 26 ARX: Risk analysis (3)
  • 27. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 27 ARX: Risk analysis (4)
  • 28. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 28 ARX: Context-sensitive help
  • 29. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ● Current projects ● Non-interactive Differential Privacy ● Support for high-dimensional data: heuristic algorithms ● Further analyses and visualizations: utility and risks ● Support for transactional attributes: (k, km )-anonymity ● Integrated data masking methods ● More flexible definition of quasi-identifiers ● More risk models ● Planned projects ● Auto-detection of HIPAA identifiers ● Implement more flexible privacy criteria 19.03.2015 29 ARX: Further developments
  • 30. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data • ARX is open source software • Contribute: feature requests, code reviews, criticism, enhancements, questions • Repository: https://github.com/arx-deidentifier/arx • Further information & download: http://arx.deidentifier.org • Get in touch • Fabian Prasser (prasser@in.tum.de) • Florian Kohlmayer (florian.kohlmayer@tum.de) 19.03.2015 30 • Any questions? Thank you for your attention