SlideShare a Scribd company logo
1 of 9
Download to read offline
Fabian Prasser1
, James Gaupp2
, Zhiyu Wan2
,
Weiyi Xia2
, Yevgeniy Vorobeychik2
,
Murat Kantarcioglu3
, Klaus Kuhn1
, Brad Malin2
1
Technical University of Munich, Munich, Germany
2
Vanderbilt University, Nashville, Tennessee, USA
3
University of Texas at Dallas, Richardson, Texas, USA
An Open Source Tool for Game Theoretic
Health Data De-Identification
Session #35: Oral Presentations - Privacy & De-identification
Motivation
Modern approaches to medical research
• Big data: high case numbers, detailed characterizations
• Re-use and sharing of data
Initiatives to improve reproducibility of research and availability of data
• NIH Statement on Sharing Research Data, Notice NOT-OD-03-032; 2003
• NIH Genomic Data Sharing Policy, Notice NOT-OD-14-124; 2014
• EMA Policy 0070 on Publication of Clinical Data for Medicinal Products for
Human Use; 2014
De-identification is complex; involves many aspects
• Comprehensive and versatile tools are needed
2AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
De-Identification of Structured Data
3
Generalization
Deletion
Aggregation
Sampling
Categorization
Masking
AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
The Risk-Utility Trade-Off
Two conflicting objectives need to be balanced
• Minimization of re-identification risks
• Minimization of reductions in data utility
Objectives need to be quantified
• Risk: several models, e.g. based on uniqueness
• Utility: several models, e.g. based on entropy
Traditional solution: simplification
• Privacy model: implements risk threshold
• Maximization of output data utility
4
Re-identification risk
Original data
Highest risk
No data
Zero risk
AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
Q
uality
assessm
ent
ARX Data Anonymization Tool
5
C
onfiguration
Exploration
R
isk
analysis
Comprehensive
• Many risk and
transformation
models
• (Almost) arbitrary
combinations
Scalable
• Millions of records
Cross-platform
• GUI and library
Open source
• Apache 2
Semi-automated
• Four perspectives
AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
Game Theoretic Data De-Identification
6
Optimization function Re-identification risk = success probability
Re-identification risk = success probability
Stackelberg game models re-identification from an economic perspective
• Publishers gain benefit from sharing data at a certain degree of utility
• Publishers lose money for successful re-identification attacks
• Attackers gain benefit from re-identifying records
• Attackers need to bear costs for launching a re-identification attack
Publisher payout can be maximized by assuming a rational adversary
AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
[Wan, et al. PLOS ONE. 2015]
Information loss = reduction in utility
Objectives, Challenges and Results
Objective: to integrate the game theoretic model into ARX
Challenge 1: to transform the approach into ARX’s de-identification model
Challenge 2: to achieve scalability
Result: full integration into ARX
Enables new methods of solving the game
7AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
Captures profitability: introduces
a threshold at zero payout
Summarizes overall payout
excluding non-profitable records
Privacy model Utility model
Also: Specific variants of the game can be solved by using traditional privacy models
Solutions can be pruned from the search process by exploiting properties of
publisher payout under the assumption that the adversary never attacks
Evaluation: Overall Publisher Payout
8
Full-domain
generalization
Record-level
generalization
Realistic
Results
●
The game theoretic model
typically outperforms HIPAA
Safe Harbor
●
Implementation details are less
important when (a) a population
table is used or (b) the benefit of
publishing is significantly higher
than the potential loss
AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
Setup
●
~32k U.S. Census records
●
Demographic attributes
●
Adversary cost: $4
●
Publisher benefit: $1,200
Population-based risk Dataset-based risk Safe Harbor
Disadvantageous scenario
Thank you!
fabian.prasser@tum.de
arx.deidentifier.org

More Related Content

What's hot

Analyzing data (chapter 9)
Analyzing data (chapter 9)Analyzing data (chapter 9)
Analyzing data (chapter 9)
Humbertovsky
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
Michael Atkins
 
Data mining course learning outcomes,Data Mining CMAP
Data mining course learning outcomes,Data Mining CMAPData mining course learning outcomes,Data Mining CMAP
Data mining course learning outcomes,Data Mining CMAP
jaya lakshmi
 

What's hot (19)

Analyzing data (chapter 9)
Analyzing data (chapter 9)Analyzing data (chapter 9)
Analyzing data (chapter 9)
 
Real life application of statistics in engineering
Real life application of statistics in engineeringReal life application of statistics in engineering
Real life application of statistics in engineering
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
 
Cancer detection using data mining
Cancer detection using data miningCancer detection using data mining
Cancer detection using data mining
 
Image Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical PublicationsImage Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical Publications
 
Multiple imputation of missing data
Multiple imputation of missing dataMultiple imputation of missing data
Multiple imputation of missing data
 
CV
CVCV
CV
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
 
Data Science and Data Visualization (All about Data Analysis) by Pooja Ajmera
Data Science and Data Visualization (All about Data Analysis) by Pooja AjmeraData Science and Data Visualization (All about Data Analysis) by Pooja Ajmera
Data Science and Data Visualization (All about Data Analysis) by Pooja Ajmera
 
Data mining course learning outcomes,Data Mining CMAP
Data mining course learning outcomes,Data Mining CMAPData mining course learning outcomes,Data Mining CMAP
Data mining course learning outcomes,Data Mining CMAP
 
Finding and Accessing Diagrams in Biomedical Publications
Finding and Accessing Diagrams in Biomedical PublicationsFinding and Accessing Diagrams in Biomedical Publications
Finding and Accessing Diagrams in Biomedical Publications
 
Data Mining Techniques In Computer Aided Cancer Diagnosis
Data Mining Techniques In Computer Aided Cancer DiagnosisData Mining Techniques In Computer Aided Cancer Diagnosis
Data Mining Techniques In Computer Aided Cancer Diagnosis
 
Data Collection
Data CollectionData Collection
Data Collection
 
Data mining techniques
Data mining techniquesData mining techniques
Data mining techniques
 
Qualitative Data Analysis
Qualitative Data AnalysisQualitative Data Analysis
Qualitative Data Analysis
 
Chapter 12 outlier
Chapter 12 outlierChapter 12 outlier
Chapter 12 outlier
 
2015: Distance based classifiers: Basic concepts, recent developments and app...
2015: Distance based classifiers: Basic concepts, recent developments and app...2015: Distance based classifiers: Basic concepts, recent developments and app...
2015: Distance based classifiers: Basic concepts, recent developments and app...
 
QuahogLife | Solutions and Services
QuahogLife | Solutions and ServicesQuahogLife | Solutions and Services
QuahogLife | Solutions and Services
 

Similar to An Open Source Tool for Game Theoretic Health Data De-Identification

Draft AMCP 2006 Model Quality 4-4-06
Draft AMCP 2006 Model Quality 4-4-06Draft AMCP 2006 Model Quality 4-4-06
Draft AMCP 2006 Model Quality 4-4-06
Joe Gricar, MS
 
iHT² Health IT Summit Seattle 2013 - Josephine Briggs, MD, National Center fo...
iHT² Health IT Summit Seattle 2013 - Josephine Briggs, MD, National Center fo...iHT² Health IT Summit Seattle 2013 - Josephine Briggs, MD, National Center fo...
iHT² Health IT Summit Seattle 2013 - Josephine Briggs, MD, National Center fo...
Health IT Conference – iHT2
 
4222020 Originality Reporthttpsucumberlands.blackboar.docx
4222020 Originality Reporthttpsucumberlands.blackboar.docx4222020 Originality Reporthttpsucumberlands.blackboar.docx
4222020 Originality Reporthttpsucumberlands.blackboar.docx
taishao1
 

Similar to An Open Source Tool for Game Theoretic Health Data De-Identification (20)

EY Drug R&D: Big DATA for big returns
EY Drug R&D: Big DATA for big returnsEY Drug R&D: Big DATA for big returns
EY Drug R&D: Big DATA for big returns
 
The Role of Data Lakes in Healthcare
The Role of Data Lakes in HealthcareThe Role of Data Lakes in Healthcare
The Role of Data Lakes in Healthcare
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
Draft AMCP 2006 Model Quality 4-4-06
Draft AMCP 2006 Model Quality 4-4-06Draft AMCP 2006 Model Quality 4-4-06
Draft AMCP 2006 Model Quality 4-4-06
 
iHT² Health IT Summit Seattle 2013 - Josephine Briggs, MD, National Center fo...
iHT² Health IT Summit Seattle 2013 - Josephine Briggs, MD, National Center fo...iHT² Health IT Summit Seattle 2013 - Josephine Briggs, MD, National Center fo...
iHT² Health IT Summit Seattle 2013 - Josephine Briggs, MD, National Center fo...
 
Big data, RWE and AI in Clinical Trials made simple
Big data, RWE and AI in Clinical Trials made simpleBig data, RWE and AI in Clinical Trials made simple
Big data, RWE and AI in Clinical Trials made simple
 
Enabling Analytics on Sensitive Medical Data with Secure Multiparty Computation
Enabling Analytics on Sensitive Medical Data with Secure Multiparty ComputationEnabling Analytics on Sensitive Medical Data with Secure Multiparty Computation
Enabling Analytics on Sensitive Medical Data with Secure Multiparty Computation
 
Machine Learning Algorithms for Predictive Analytics in Precision Medicine
Machine Learning Algorithms for Predictive Analytics in Precision MedicineMachine Learning Algorithms for Predictive Analytics in Precision Medicine
Machine Learning Algorithms for Predictive Analytics in Precision Medicine
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...
 
Data Mining in Health Care
Data Mining in Health CareData Mining in Health Care
Data Mining in Health Care
 
Analytics in Action - Health
Analytics in Action - HealthAnalytics in Action - Health
Analytics in Action - Health
 
Growing Health Analytics Without Hiring new Staff
Growing Health Analytics Without Hiring new StaffGrowing Health Analytics Without Hiring new Staff
Growing Health Analytics Without Hiring new Staff
 
Analytics_Wealth of Infomation
Analytics_Wealth of InfomationAnalytics_Wealth of Infomation
Analytics_Wealth of Infomation
 
TiReX pitch deck
TiReX pitch deckTiReX pitch deck
TiReX pitch deck
 
4222020 Originality Reporthttpsucumberlands.blackboar.docx
4222020 Originality Reporthttpsucumberlands.blackboar.docx4222020 Originality Reporthttpsucumberlands.blackboar.docx
4222020 Originality Reporthttpsucumberlands.blackboar.docx
 
Prediction of Diabetes using Probability Approach
Prediction of Diabetes using Probability ApproachPrediction of Diabetes using Probability Approach
Prediction of Diabetes using Probability Approach
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MINING
 
Data mining applications
Data mining applicationsData mining applications
Data mining applications
 
7 Steps for Boosting R&D Outcomes
7 Steps for Boosting R&D Outcomes7 Steps for Boosting R&D Outcomes
7 Steps for Boosting R&D Outcomes
 
Trustworthiness of AI based predictions Aachen 2024
Trustworthiness of AI based predictions Aachen 2024Trustworthiness of AI based predictions Aachen 2024
Trustworthiness of AI based predictions Aachen 2024
 

Recently uploaded

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 

Recently uploaded (20)

Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodology
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
 

An Open Source Tool for Game Theoretic Health Data De-Identification

  • 1. Fabian Prasser1 , James Gaupp2 , Zhiyu Wan2 , Weiyi Xia2 , Yevgeniy Vorobeychik2 , Murat Kantarcioglu3 , Klaus Kuhn1 , Brad Malin2 1 Technical University of Munich, Munich, Germany 2 Vanderbilt University, Nashville, Tennessee, USA 3 University of Texas at Dallas, Richardson, Texas, USA An Open Source Tool for Game Theoretic Health Data De-Identification Session #35: Oral Presentations - Privacy & De-identification
  • 2. Motivation Modern approaches to medical research • Big data: high case numbers, detailed characterizations • Re-use and sharing of data Initiatives to improve reproducibility of research and availability of data • NIH Statement on Sharing Research Data, Notice NOT-OD-03-032; 2003 • NIH Genomic Data Sharing Policy, Notice NOT-OD-14-124; 2014 • EMA Policy 0070 on Publication of Clinical Data for Medicinal Products for Human Use; 2014 De-identification is complex; involves many aspects • Comprehensive and versatile tools are needed 2AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
  • 3. De-Identification of Structured Data 3 Generalization Deletion Aggregation Sampling Categorization Masking AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
  • 4. The Risk-Utility Trade-Off Two conflicting objectives need to be balanced • Minimization of re-identification risks • Minimization of reductions in data utility Objectives need to be quantified • Risk: several models, e.g. based on uniqueness • Utility: several models, e.g. based on entropy Traditional solution: simplification • Privacy model: implements risk threshold • Maximization of output data utility 4 Re-identification risk Original data Highest risk No data Zero risk AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
  • 5. Q uality assessm ent ARX Data Anonymization Tool 5 C onfiguration Exploration R isk analysis Comprehensive • Many risk and transformation models • (Almost) arbitrary combinations Scalable • Millions of records Cross-platform • GUI and library Open source • Apache 2 Semi-automated • Four perspectives AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification
  • 6. Game Theoretic Data De-Identification 6 Optimization function Re-identification risk = success probability Re-identification risk = success probability Stackelberg game models re-identification from an economic perspective • Publishers gain benefit from sharing data at a certain degree of utility • Publishers lose money for successful re-identification attacks • Attackers gain benefit from re-identifying records • Attackers need to bear costs for launching a re-identification attack Publisher payout can be maximized by assuming a rational adversary AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification [Wan, et al. PLOS ONE. 2015] Information loss = reduction in utility
  • 7. Objectives, Challenges and Results Objective: to integrate the game theoretic model into ARX Challenge 1: to transform the approach into ARX’s de-identification model Challenge 2: to achieve scalability Result: full integration into ARX Enables new methods of solving the game 7AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification Captures profitability: introduces a threshold at zero payout Summarizes overall payout excluding non-profitable records Privacy model Utility model Also: Specific variants of the game can be solved by using traditional privacy models Solutions can be pruned from the search process by exploiting properties of publisher payout under the assumption that the adversary never attacks
  • 8. Evaluation: Overall Publisher Payout 8 Full-domain generalization Record-level generalization Realistic Results ● The game theoretic model typically outperforms HIPAA Safe Harbor ● Implementation details are less important when (a) a population table is used or (b) the benefit of publishing is significantly higher than the potential loss AMIA 2017 | Prasser et al.: An Open Source Tool for Game Theoretic Health Data De-Identification Setup ● ~32k U.S. Census records ● Demographic attributes ● Adversary cost: $4 ● Publisher benefit: $1,200 Population-based risk Dataset-based risk Safe Harbor Disadvantageous scenario