Libra: High-Utility Anonymization of Event Logs for Process Mining via Subsampling
Gamal Elkoumy and Marlon Dumas
University of Tartu
ICPM 22
This Photo by Unknown author is licensed under CC BY-ND.
Event Log
ID  Name            Activity     Timestamp         Age  Sex  Zip    Disease
1   Marco Montali   Register     07.01.2020-08:30  37   M    13053  Flu
1   Marco Montali   Visit        07.01.2020-08:45  37   M    13053  Flu
2   Fabrizio Maggi  Register     07.01.2020-08:46  35   M    51009  Infection
1   Marco Montali   Blood Test   07.01.2020-08:57  37   M    13053  Flu
1   Marco Montali   Discharge    07.01.2020-08:58  37   M    13053  Flu
2   Fabrizio Maggi  Hospitalize  07.01.2020-09:01  35   M    51009  Infection
2   Fabrizio Maggi  Blood Test   07.01.2020-10:30  35   M    51009  Infection
2   Fabrizio Maggi  Visit        07.02.2020-09:35  35   M    51009  Infection
2   Fabrizio Maggi  Discharge    07.02.2020-14:00  35   M    51009  Infection
Motivation: GDPR
Singling Out
Marco Montali checked in at 8:30 AM, and his blood test took 1 minute.
10 patients took blood tests that day, and on average a test takes 2 minutes.
Releasing The Log
Attack Model
The attacker’s goal h(L) is to infer information from a log L. We consider the following goals:
• h₁: Determine whether an individual is in the log from their execution flow.
• h₂: Determine the execution time of an activity.
Differential Privacy
• Mitigates linkage attacks, including those that draw on past, present, and future releases of datasets.
• Quantifies the privacy loss.
• Supports composing multiple simple mechanisms into a larger one.
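For reference, these properties rest on the standard definition of ε-differential privacy. Here we assume that two logs L₁ and L₂ are neighbors when they differ in the trace of a single individual, and S ranges over sets of possible outputs of a randomized mechanism 𝓜. The second line is basic sequential composition: running k mechanisms that are ε₁-DP through ε_k-DP on the same log is at most (Σ εᵢ)-DP.

```latex
\begin{aligned}
&\Pr[\mathcal{M}(L_1) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(L_2) \in S]
  && \text{for all neighboring } L_1, L_2 \text{ and all output sets } S,\\
&\varepsilon_{\text{total}} \le \sum_{i=1}^{k} \varepsilon_i
  && \text{(sequential composition of } k \text{ mechanisms).}
\end{aligned}
```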
Research Question
• Given an event log L in which each trace contains private information about an individual (e.g., a customer),
• and given a privacy budget ε,
• generate an anonymized event log L′ that provides an ε-differential privacy guarantee to each individual represented in the log.
Proposal: Privacy Amplification
• Differential privacy guarantees can be amplified by applying a DP mechanism to a small random subsample of the records.
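A common way to quantify the amplification effect is the standard bound for Poisson subsampling (not necessarily the exact accounting used in Libra): if a mechanism is ε-DP, running it on a subsample that keeps each record independently with probability q gives ε′ = ln(1 + q(e^ε − 1)), assuming neighboring logs differ by adding or removing one individual's trace. The sketch below computes this bound and its inverse; both function names are hypothetical. The inverse shows why amplification reduces noise: to meet a target ε′, the mechanism applied to the subsample may use a larger ε, i.e., add less noise.

```python
import math


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Guarantee of an epsilon-DP mechanism run on a Poisson subsample with rate q.

    Standard bound: epsilon' = ln(1 + q * (e^epsilon - 1)), assuming the
    add/remove-one-individual neighboring relation.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must be in [0, 1]")
    # log1p/expm1 keep the computation accurate for small epsilon and q
    return math.log1p(q * math.expm1(epsilon))


def base_epsilon_for_target(target_epsilon: float, q: float) -> float:
    """Largest epsilon the inner mechanism may use so that, after
    subsampling with rate q, the overall guarantee equals target_epsilon."""
    if not 0.0 < q <= 1.0:
        raise ValueError("sampling rate q must be in (0, 1]")
    return math.log1p(math.expm1(target_epsilon) / q)
```

For example, with q = 0.1 and a target ε′ = 1, `base_epsilon_for_target` returns roughly 2.9, so the mechanism applied to the subsample could add noticeably less noise than a direct ε = 1 release on the whole log.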
Proposal
• We use privacy amplification to achieve lower utility loss than classic DP anonymization techniques at the same level of privacy.
• We use the composition property of differential privacy to combine the separately anonymized subsamples into the final anonymized log.
Approach
• We filter out rare trace variants.
• Rare trace variants are variants executed by only a small group of individuals.
• Observing such traces may increase the attacker’s confidence about this group of individuals.
• We remove trace variants that occur fewer than C times.
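A minimal sketch of the filtering step, assuming (hypothetically) that the log is held as a dictionary from case ID to its trace variant, i.e., the tuple of activity labels in execution order. The threshold C from the slide is the parameter `c`; the function name is illustrative.

```python
from collections import Counter
from typing import Dict, Tuple

Trace = Tuple[str, ...]  # hypothetical representation: activity labels in order


def filter_rare_variants(cases: Dict[str, Trace], c: int) -> Dict[str, Trace]:
    """Drop every case whose trace variant occurs fewer than c times in the log."""
    variant_counts = Counter(cases.values())
    return {case_id: trace for case_id, trace in cases.items()
            if variant_counts[trace] >= c}
```

With traces modeled on the event log example above, a variant followed by only one patient disappears as soon as c is 2 or more.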
• We perform Poisson subsampling to achieve privacy amplification.
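Poisson subsampling keeps each case independently with probability q, so the size of each subsample is itself random. The sketch below assumes the same hypothetical case-ID-to-variant dictionary; `split_into_subsamples` and its `n` parameter are illustrative, reflecting that Libra anonymizes several subsamples separately and composes them afterwards.

```python
import random
from typing import Dict, List, Optional, Tuple

Trace = Tuple[str, ...]  # hypothetical representation: activity labels in order


def poisson_subsample(cases: Dict[str, Trace], q: float,
                      rng: Optional[random.Random] = None) -> Dict[str, Trace]:
    """Keep each case independently with probability q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1): q = 0 keeps nothing, q = 1 keeps everything
    return {cid: trace for cid, trace in cases.items() if rng.random() < q}


def split_into_subsamples(cases: Dict[str, Trace], q: float, n: int,
                          seed: Optional[int] = None) -> List[Dict[str, Trace]]:
    """Draw n independent Poisson subsamples (hypothetical helper)."""
    rng = random.Random(seed)
    return [poisson_subsample(cases, q, rng) for _ in range(n)]
```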
• Any DP anonymization approach from the literature can be used in this step.
• We use the approach presented at ICPM21: “Mine Me but Don’t Single Me Out: Differentially Private Event Logs for Process Mining”.
• We anonymize the case variants of the log by means of over- and under-sampling.
• We anonymize the start time of each case by shifting it earlier or later by noise drawn from a Laplace distribution.
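The exact calibration of the ICPM21 method is not given on these slides, so the following only sketches the two ideas with the Laplace mechanism. Variant frequencies receive Laplace noise with scale 1/ε, assuming each individual contributes one trace (count sensitivity 1); a positive perturbation corresponds to oversampling a variant and a negative one to undersampling it. Start times are shifted by Laplace noise whose scale uses a hypothetical time sensitivity `sensitivity_s` in seconds. For simplicity, only variants already present in the log are perturbed. Python's `random` module has no Laplace sampler, so one is built as the difference of two i.i.d. exponentials.

```python
import random
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Tuple

Trace = Tuple[str, ...]  # hypothetical representation: activity labels in order


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. Exp(1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_variant_counts(cases: Dict[str, Trace], epsilon: float,
                         rng: random.Random) -> Dict[Trace, int]:
    """Perturb each variant's frequency with Laplace(1/epsilon) noise.

    Positive noise = add copies of the variant (oversampling);
    negative noise = drop copies (undersampling). Counts are clipped at 0.
    """
    counts = Counter(cases.values())
    return {variant: max(0, round(count + laplace(1.0 / epsilon, rng)))
            for variant, count in counts.items()}


def shift_start_time(start: datetime, epsilon: float, sensitivity_s: float,
                     rng: random.Random) -> datetime:
    """Move a case's start time earlier or later by Laplace noise in seconds."""
    return start + timedelta(seconds=laplace(sensitivity_s / epsilon, rng))
```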
• We select the statistically significant traces from the anonymized subsamples to increase the utility of the DP event log.
• We adopt the statistically significant sampling method presented by Bauer et al. (2018).
• Note: DP guarantees are still preserved, because this selection is differentially private post-processing.
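The sketch below does not reproduce Bauer et al.'s statistical test. It uses a hypothetical stand-in criterion (keep a variant if it has a positive count in at least a share `min_share` of the anonymized subsamples) purely to show why this step keeps the DP guarantee: the function reads only the anonymized subsamples and never the original log, so it is post-processing.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Trace = Tuple[str, ...]  # hypothetical representation: activity labels in order


def select_recurring_variants(anonymized_subsamples: List[Dict[Trace, int]],
                              min_share: float) -> Set[Trace]:
    """Hypothetical stand-in selection rule (NOT Bauer et al.'s test)."""
    # Count in how many anonymized subsamples each variant has a positive count.
    presence = Counter(variant
                       for sub in anonymized_subsamples
                       for variant, count in sub.items() if count > 0)
    needed = min_share * len(anonymized_subsamples)
    # Only anonymized outputs are read here, so DP is preserved by post-processing.
    return {variant for variant, seen in presence.items() if seen >= needed}
```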
• We combine the anonymized subsamples to construct the anonymized log.
• Note: DP quantifies the privacy guarantee that holds after composition.
• We use Rényi DP to estimate the privacy guarantee of the composition.
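As a sketch of Rényi DP accounting, assume each subsample is released with a Laplace mechanism of sensitivity 1 and parameter ε. The Rényi divergence of order α for that mechanism has a closed form (Mironov, 2017); RDP values add up under composition, and the total is converted to an (ε, δ) guarantee by taking the best order α. This sketch deliberately ignores the extra amplification from subsampling, which a full accountant would include; the α grid and function names are illustrative.

```python
import math
from typing import Iterable

DEFAULT_ALPHAS = (1.5, 2, 3, 4, 8, 16, 32, 64, 128, 256, 512, 1000)


def laplace_rdp(alpha: float, epsilon: float) -> float:
    """RDP of order alpha > 1 of a Laplace mechanism with sensitivity 1
    and scale b = 1/epsilon (closed form from Mironov, 2017)."""
    b = 1.0 / epsilon
    x = math.log(alpha / (2 * alpha - 1)) + (alpha - 1) / b
    y = math.log((alpha - 1) / (2 * alpha - 1)) - alpha / b
    m = max(x, y)  # log-sum-exp avoids overflow for large alpha
    return (m + math.log(math.exp(x - m) + math.exp(y - m))) / (alpha - 1)


def composed_epsilon(epsilon: float, k: int, delta: float,
                     alphas: Iterable[float] = DEFAULT_ALPHAS) -> float:
    """(epsilon, delta)-DP guarantee of k composed Laplace mechanisms.

    RDP composes additively at each order alpha; the total is converted via
    eps = rdp + log(1/delta) / (alpha - 1), keeping the tightest order.
    """
    return min(k * laplace_rdp(a, epsilon) + math.log(1 / delta) / (a - 1)
               for a in alphas)
```

For instance, composing 100 such mechanisms with ε = 1 and δ = 10⁻⁵ yields a guarantee of roughly 73, below the naive sum of 100 from basic composition.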
Empirical Evaluation
• We measure the distance between the anonymized log and the original log.
• The distance is quantified as the Earth mover’s distance between the directly-follows graph (DFG) of the anonymized log and the DFG of the original log.
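A minimal sketch of the distance measure. The DFG counts how often activity b directly follows activity a across all traces. The slides do not state the ground distance used for the Earth mover’s distance, so this sketch assumes the simplest one: moving probability mass between any two distinct edges costs 1. Under that assumption, the EMD between the normalized edge-frequency distributions reduces to the total variation distance, which is what the code computes; the paper's exact formulation may differ.

```python
from collections import Counter
from typing import Iterable, Tuple

Trace = Tuple[str, ...]  # hypothetical representation: activity labels in order


def dfg(traces: Iterable[Trace]) -> Counter:
    """Count each directly-follows pair (a, b) over all traces."""
    edges: Counter = Counter()
    for trace in traces:
        edges.update(zip(trace, trace[1:]))
    return edges


def emd_unit_ground_distance(p_counts: Counter, q_counts: Counter) -> float:
    """EMD between normalized DFG edge distributions when moving mass between
    any two distinct edges costs 1 (assumed ground distance); in that case the
    EMD equals the total variation distance."""
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both DFGs need at least one edge")
    edges = set(p_counts) | set(q_counts)
    return 0.5 * sum(abs(p_counts[e] / p_total - q_counts[e] / q_total)
                     for e in edges)
```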
Empirical Evaluation
• We evaluate the approach on 8 real-world event logs.
• We compare the approach to the state of the art.
• All anonymized logs are publicly available at https://doi.org/10.5281/zenodo.6376761.
Summary
• In this paper, we used several properties of differential privacy to enable high-utility anonymization.
• We used privacy amplification to provide the same privacy guarantees with less noise.
• We used differentially private post-processing to select the statistically significant traces, which increased utility.
• We used the composition property to combine the anonymized subsamples.
Thank you for attending!
Questions?

Editor's Notes

  1. The empirical evaluation shows that the privacy amplification effect leads to significant reductions in utility loss, particularly when anonymizing the frequency distribution of case variants.