Mine Me but Don’t Single Me out: Differentially Private Event Logs for Process Mining
Gamal Elkoumy, Alisa Pankova, Marlon Dumas
University of Tartu and Cybernetica
Estonia
Event Log
Privacy Threats
John Doe checked in at 8:33 AM and had surgery from 8:35 AM to 9:25 AM.
On average, the activity "surgery" happens between 8 AM and 10 AM with an execution time between 30 minutes and 2 hours.
GDPR
Motivation
• Masking (pseudonymization) is not enough.
• An attacker can use prefixes or suffixes of John’s trace to identify him.
• The attacker can also use the event timestamps to identify John.
Attack Model
The attacker has a goal h(L) that captures their interest in an event log L. We specifically consider the following attacker goals:
• h1: Has the individual been through a specific sub-trace (prefix or suffix)? The output is a bit with a value ∈ {0, 1} that represents yes or no.
• h2: What is the execution time of a particular activity that has been executed for the individual? The output is a real value to be guessed with a given precision.
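To make the two goals concrete, here is a minimal Python sketch on a toy pseudonymized log. The log contents, the case identifiers, and the function names `h1_has_prefix` and `h2_execution_time` are illustrative assumptions and are not taken from the slides.

```python
from datetime import datetime

# Toy event log (hypothetical data): case id -> list of (activity, start, end).
LOG = {
    "case-1": [("check-in", "08:33", "08:35"), ("surgery", "08:35", "09:25")],
    "case-2": [("check-in", "09:10", "09:15"), ("x-ray", "09:20", "09:40")],
}

def _minutes(start, end):
    """Duration in minutes between two HH:MM clock times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

def h1_has_prefix(trace, prefix):
    """Goal h1: did the case go through the given prefix? Returns 1 (yes) or 0 (no)."""
    activities = [activity for activity, _, _ in trace]
    return int(activities[:len(prefix)] == list(prefix))

def h2_execution_time(trace, activity):
    """Goal h2: execution time in minutes of the first occurrence of an activity."""
    for name, start, end in trace:
        if name == activity:
            return _minutes(start, end)
    return None  # the activity was never executed for this case
```

An attacker who knows that John had surgery lasting about 50 minutes could run both checks against every pseudonymized case and keep the cases that match, which is why masking identifiers alone does not prevent singling out.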
Motivation
• We do not want the attacker to be able to single out John Doe once the event log is released.
• More precisely, the attacker’s probability of guessing correctly should not increase by more than a given amount after the event log release.
• We use this bound on the guessing advantage (δ) to anonymize event logs.
Problem Statement
• Given an event log L and a maximum level of acceptable guessing advantage δ, generate an anonymized event log L' such that the probability of singling out an individual after publishing L' does not increase by more than δ.
Differential Privacy
ε
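For reference, the standard definition of ε-differential privacy can be written as below. This is general background rather than a formula reproduced from the slide. The symbols are assumptions for this note: M is a randomized mechanism, L_1 and L_2 are any two logs that differ in the data of one individual, and S is any set of possible outputs.

```latex
\Pr[\, M(L_1) \in S \,] \;\le\; e^{\varepsilon} \cdot \Pr[\, M(L_2) \in S \,]
```

A smaller ε makes the two output distributions harder to tell apart, so the released log reveals less about any single individual, at the cost of more noise.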
Approach
Approach – DAFSA
• Group events that go through the same prefixes/suffixes.
• Minimal grouping of shared prefixes/suffixes.
• Lossless representation of the event log.
• Oversampling size estimation.
• Timestamp anonymization.
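The grouping idea can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it builds only a prefix tree, whereas a real DAFSA would additionally merge states that share the same suffixes to obtain a minimal automaton. The function name `annotate_with_prefix_states` and the input shape (case id mapped to a list of activity names) are assumptions.

```python
from collections import defaultdict

def annotate_with_prefix_states(log):
    """Group events by the prefix of activities that leads to them.

    Returns the state ids (one per distinct prefix) and, for every
    transition (source state, activity, target state), the set of cases
    that pass through it. Suffix merging, which a DAFSA performs, is omitted.
    """
    state_ids = {(): 0}                      # () is the initial state
    cases_per_transition = defaultdict(set)
    for case_id, trace in log.items():
        prefix = ()
        for activity in trace:
            source = state_ids[prefix]
            prefix = prefix + (activity,)
            # Reuse the state if this prefix was seen before, else create it.
            target = state_ids.setdefault(prefix, len(state_ids))
            cases_per_transition[(source, activity, target)].add(case_id)
    return state_ids, cases_per_transition
```

Transitions that carry only one case are the ones where a prefix singles out an individual, so they are the natural place to estimate how much oversampling is needed.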
Pipeline (figure): Annotate Event Log with DAFSA states; ε estimation for relative time; ε estimation for trace variants; Oversampling Cases; Noise Injection to Timestamps.
Approach – Event Log Annotation
Approach – ε estimation
• The effect of ε differs based on the range of values.
• Two ε values:
  • ε for trace variant anonymization.
  • ε for timestamp anonymization.
• We estimate a different ε for each event based on the distribution of values.
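One generic way to relate a guessing advantage δ to ε is the odds bound implied by ε-differential privacy: the attacker's posterior odds are at most e^ε times the prior odds. The Python sketch below solves that bound for ε. It is an assumption-laden illustration, not the estimation procedure from the slides: the function name `epsilon_from_delta` and the `prior` parameter (the attacker's success probability before the release) are hypothetical, and it is not claimed to reproduce the figures shown in the deck.

```python
import math

def epsilon_from_delta(prior, delta):
    """Largest epsilon such that posterior success probability <= prior + delta.

    Uses the generic bound  posterior_odds <= exp(epsilon) * prior_odds.
    `prior` is an assumed attacker success probability before the release.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    posterior = prior + delta
    if not prior < posterior < 1.0:
        raise ValueError("prior + delta must lie strictly between prior and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = posterior / (1.0 - posterior)
    return math.log(posterior_odds / prior_odds)
```

For example, `epsilon_from_delta(0.5, 0.2)` evaluates ln(7/3), roughly 0.847. Because the result depends on the prior, events whose values are distributed differently end up with different ε for the same δ.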
ε estimation – Personalized Differential Privacy
Input: δ=0.2
ε= 0.81 for δ=0.2
Approach - Oversampling
• The set of case variants is the main input used by process mining techniques, e.g., conformance checking.
• In this setting, preserving the same set of case variants is critical.
• Adding new case variants increases false positives.
• Removing existing case variants increases false negatives.
• We adopt oversampling over the DAFSA transitions to prevent singling out an individual using their prefix/suffix.
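The constraint above (never add a new variant, never remove an existing one) can be illustrated with a small Python sketch that only ever adds copies of existing variants. This is an assumption, not the authors' mechanism: it works on whole variants rather than DAFSA transitions, it draws Laplace noise with scale 1/ε, and it takes the absolute value of the noise to keep the sketch short, which by itself does not establish a formal privacy guarantee. The name `oversample_variants` is hypothetical.

```python
import math
import random

def oversample_variants(variant_counts, epsilon, rng=None):
    """Return noisy case counts per variant, only ever adding cases.

    variant_counts: mapping from a variant (e.g. a tuple of activities)
    to the number of cases that follow it.
    """
    rng = rng or random.Random()
    noisy = {}
    for variant, count in variant_counts.items():
        # The difference of two Exp(epsilon) samples is Laplace with scale 1/epsilon.
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        extra_cases = math.ceil(abs(noise))  # simplification: non-negative noise only
        noisy[variant] = count + extra_cases
    return noisy
```

The set of keys in the result is identical to the input, so a downstream technique such as conformance checking sees exactly the same variants, only with different frequencies.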
Approach – Time Noise Injection
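The slide itself only shows a figure for this step. As a rough illustration, the Python sketch below perturbs relative times (for example, minutes since the previous event) with Laplace noise. The choice of the Laplace mechanism, the `sensitivity` parameter, the clamping at zero, and the name `add_laplace_noise` are all assumptions made for this sketch and are not confirmed by the slides.

```python
import random

def add_laplace_noise(relative_times, epsilon, sensitivity, rng=None):
    """Add Laplace(sensitivity / epsilon) noise to each relative time.

    relative_times: non-negative durations, e.g. minutes since the previous event.
    sensitivity: assumed maximum influence of one individual on a value.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    noisy = []
    for t in relative_times:
        # The difference of two exponentials with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy.append(max(0.0, t + noise))  # clamp: a relative time cannot be negative
    return noisy
```

Working on relative times rather than absolute timestamps keeps the order of events within a case intact, since the noisy timestamps can be rebuilt by accumulating the non-negative noisy gaps.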
Anonymized Log
Empirical Evaluation
We selected 14 publicly available event logs.
• What is the effect of choosing a privacy level δ on the time dilation of the anonymized event log?
• What is the effect of choosing a privacy level δ on the case variant distribution of the anonymized event log?
Conclusion and Future Work
• We proposed the concept of a differentially private event log and a mechanism to compute such logs.
• A differentially private event log limits the increase in the probability that an attacker may single out an individual based on the prefixes or suffixes of the traces in the log and the timestamps of each event.
• A limitation is that the proposed method introduces high levels of noise in the presence of unique traces or temporal outliers. We plan to investigate an approach where high-risk traces are suppressed so that less noise needs to be injected into the remaining traces.
• A second avenue for future research is to anonymize other columns in the event log, e.g., resources.
Questions

Mine me but don't single me out ICPM21
