Process mining is a family of techniques for analyzing business
processes based on event logs extracted from information systems. Mainstream
process mining tools are designed for intra-organizational settings,
insofar as they assume that an event log is available for processing as a
whole. The use of such tools for inter-organizational process analysis is
hampered by the fact that such processes involve independent parties who
are unwilling, or sometimes legally unable, to share detailed
event logs with each other. In this setting, this video proposes an approach
for constructing and querying a common artefact used for process mining,
namely the frequency and time-annotated Directly-Follows Graph (DFG),
over multiple event logs belonging to different parties, in such a way that
the parties do not share the event logs with each other. The proposal
leverages an existing platform for secure multi-party computation, namely
Sharemind. Since a direct implementation of DFG construction in Sharemind
suffers from scalability issues, we propose to rely on vectorization
of event logs and to employ a divide-and-conquer scheme for parallel
processing of sub-logs. The video reports on experiments that evaluate
the scalability of the approach on real-life logs.
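The divide-and-conquer scheme works because directly-follows counts are additive over disjoint sets of cases: the DFG of a whole log equals the sum of the DFGs of its sub-logs. The following is a minimal plaintext sketch of that idea only, assuming traces are lists of activity labels and using Python threads as a stand-in for parallel secure computations; the function names and chunk size are hypothetical, and nothing here is secret-shared.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_dfg(traces):
    """Directly-follows frequencies for one sub-log (plaintext, no privacy)."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def chunked_dfg(traces, chunk_size):
    """Split the log by case into chunks, process them in parallel, and sum the partial DFGs."""
    chunks = [traces[i:i + chunk_size] for i in range(0, len(traces), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(partial_dfg, chunks):
            total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


log = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
print(chunked_dfg(log, chunk_size=2))
```

Time annotations can be combined the same way, provided each chunk keeps the sum of durations and the count per arc rather than a per-chunk average.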
The source code, installation guide, and demo examples can be found in the GitHub repository:
https://github.com/Elkoumy/shareprom
The paper is available at:
https://link.springer.com/chapter/10.1007/978-3-030-49418-6_11
Secure multi party computation for inter-organizational process mining
1. Secure Multi-Party Computation for Inter-Organizational Process Mining
Gamal Elkoumy¹, Stephan A. Fahrenkrog-Petersen², Marlon Dumas¹,
Peeter Laud³, Alisa Pankova³, and Matthias Weidlich²
¹University of Tartu, Tartu, Estonia
²Humboldt-Universität zu Berlin, Berlin, Germany
³Cybernetica, Tartu, Estonia
gamal.elkoumy@ut.ee
4. Multi-Party Computation based Process Mining
[Diagram: the Airport and the Airline Company send secret shares to three compute nodes; a query engine connects the compute nodes to process mining.]
5. Privacy-Preserving Process Mining
• Pika et al. 2019 discussed the need for privacy-preserving process mining, driven
by legal developments such as the GDPR.
• Existing techniques anonymize event data to achieve privacy-preserving process
mining; for example, algorithms based on k-anonymity and t-closeness have been
published (Fahrenkrog-Petersen et al. 2019 and Sweeney et al. 2002).
• Other techniques incorporate privacy considerations into the process mining
techniques themselves.
• Tools such as ELPaaS (Bauer et al. 2019) have been presented.
• Tillem et al. 2017 addressed the inter-organizational setting using encryption, but
assumed the existence of a trusted third party.
6. Inter-Organizational Process Mining
• Approaches for the automated discovery of process models in an
inter-organizational setting have been considered without addressing privacy
concerns [Schulz et al. 2004, and Zeng et al. 2013].
• Techniques to compare executions of the same process across multiple
organizations have been presented without considering privacy requirements
[Buijs et al. 2011, and Aksu et al. 2016].
• Liu et al. 2019 proposed privacy-preserving inter-organizational process
mining, assuming that confidential information is shared with a trusted
third party.
7. Secure Multi-Party Computation (MPC)
• Secure Multi-Party Computation is a
cryptographic functionality that allows n
parties to cooperatively evaluate a
function without any party, or any allowed
coalition of parties, learning anything
beyond their own inputs and outputs.
https://sunfish-platform-documentation.readthedocs.io/en/latest/smc.html
8. Secure Multi-Party Computation (MPC)
• Homomorphic secret sharing [Shamir et
al. 1979] is a common basis for MPC
protocols.
• In such protocols, the arithmetic or
Boolean circuit representing the
functionality is evaluated gate-by-gate,
constructing secret-shared outputs of
gates from their secret-shared inputs.
https://sunfish-platform-documentation.readthedocs.io/en/latest/smc.html
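To make gate-by-gate evaluation concrete, here is a minimal sketch of three-party additive secret sharing modulo 2^32, simulated in a single process; the modulus and all function names are assumptions for illustration. Addition gates are evaluated locally on each party's shares. The multiplication gate uses Beaver triples from a hypothetical trusted dealer, a textbook technique chosen only for brevity: it is not Sharemind's multiplication protocol, and a real deployment would not rely on such a dealer.

```python
import secrets

MOD = 2 ** 32  # arithmetic in the ring Z_(2^32); assumed word size for this sketch
N_PARTIES = 3


def share(x):
    """Split x into N_PARTIES random additive shares that sum to x modulo MOD."""
    parts = [secrets.randbelow(MOD) for _ in range(N_PARTIES - 1)]
    parts.append((x - sum(parts)) % MOD)
    return parts


def reconstruct(shares):
    """Recombine shares; in practice this is done only for values the parties agree to reveal."""
    return sum(shares) % MOD


def add_gate(xs, ys):
    """Addition gate: purely local, each party adds its own two shares."""
    return [(x + y) % MOD for x, y in zip(xs, ys)]


def beaver_triple():
    """Hypothetical trusted dealer: shares of random a and b, and of c = a * b."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def mul_gate(xs, ys):
    """Multiplication gate via a Beaver triple; requires opening d and e."""
    a_sh, b_sh, c_sh = beaver_triple()
    # The opened values d = x - a and e = y - b are masked by the random a and b.
    d = reconstruct([(x - a) % MOD for x, a in zip(xs, a_sh)])
    e = reconstruct([(y - b) % MOD for y, b in zip(ys, b_sh)])
    zs = [(c + d * b + e * a) % MOD for a, b, c in zip(a_sh, b_sh, c_sh)]
    zs[0] = (zs[0] + d * e) % MOD  # exactly one party adds the public term d * e
    return zs


# Evaluate the circuit (x + y) * z gate by gate on secret-shared inputs.
x_sh, y_sh, z_sh = share(3), share(4), share(5)
out_sh = mul_gate(add_gate(x_sh, y_sh), z_sh)
print(reconstruct(out_sh))  # → 35
```

The multiplication works because c + d·b + e·a + d·e = x·y, so the output shares sum to the product even though no party ever sees x, y, or z in the clear.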
9. MPC based Process Mining
• In this paper, we build on top of
Sharemind (Bogdanov et al. 2008),
whose main protocol set is based on
secret-sharing.
• The Sharemind framework provides
its own programming language,
namely the SecreC language.
10. Security Model
• In this paper, we use the three-party MPC protocol set of
Sharemind, which is secure against honest-but-curious
adversaries.
• This means that as long as the parties follow
the protocols honestly and do not collude, none of
them learns anything beyond the size of the data.
• We assume that the input parties share with each
other the number of activities and the maximum
trace length in their event logs, which is needed
for preprocessing.
11. Security Model
• Even with encrypted data, contextual knowledge
might lead to leakage of some data (Majid et al.
2018):
• An adversarial party might learn the shortest or
the longest trace and, using domain knowledge,
infer the actual activities.
• To prevent this, we pad the logs so that all
traces have the same length, namely the maximum
trace length.
https://www.pngitem.com/middle/ioRihxT_cyber-attack-clipart-hd-png-download/
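A minimal sketch of the padding step, assuming traces are lists of integer activity IDs and that a reserved dummy ID (the hypothetical PAD = -1) marks padding events. After padding, every trace has the maximum length, so trace lengths no longer distinguish cases.

```python
PAD = -1  # hypothetical reserved ID for dummy (padding) events


def pad_traces(traces, max_len):
    """Right-pad every trace with PAD so that all traces have length max_len."""
    padded = []
    for trace in traces:
        if len(trace) > max_len:
            raise ValueError(f"trace longer than max_len={max_len}: {trace}")
        padded.append(list(trace) + [PAD] * (max_len - len(trace)))
    return padded


traces = [[0, 1, 2], [0, 3], [4]]
# The maximum trace length is one of the statistics the input parties share.
max_len = max(len(t) for t in traces)
print(pad_traces(traces, max_len))  # → [[0, 1, 2], [0, 3, -1], [4, -1, -1]]
```

The padding events must then contribute nothing to the DFG; one option, assumed here, is to encode them as all-zero vectors when the DFG is computed.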
12. Security Model
• Even with encrypted data, contextual knowledge
might lead to leakage of some data:
• Leakage might also result from frequent
pattern mining or access-pattern attacks.
• To prevent such attacks, we use the
privacy-preserving quicksort algorithm (Hamada
et al. 2012). We also use one-hot encoding and a
privacy-preserving outer product to update the
DFG matrix (Laud et al. 2017).
https://www.pngitem.com/middle/ioRihxT_cyber-attack-clipart-hd-png-download/
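The one-hot/outer-product update can be illustrated in plaintext. For two consecutive events with one-hot vectors u and v, the outer product u vᵀ has a single 1 at (source, target), so adding it to the DFG matrix increments exactly one cell, while the update itself touches every cell regardless of which activities occurred. In the secure setting the vectors would be secret-shared; this sketch shows only the arithmetic, and mapping padding events (the hypothetical PAD = -1) to the all-zero vector is an assumption made here.

```python
PAD = -1  # hypothetical padding ID; one_hot maps it to the all-zero vector


def one_hot(activity, n):
    """One-hot vector of length n; any ID outside 0..n-1 (such as PAD) gives all zeros."""
    return [1 if i == activity else 0 for i in range(n)]


def outer(u, v):
    """Outer product u vᵀ as a list of rows."""
    return [[a * b for b in v] for a in u]


def dfg_matrix(traces, n):
    """DFG frequency matrix built by summing outer products of consecutive one-hot vectors."""
    dfg = [[0] * n for _ in range(n)]
    for trace in traces:
        vectors = [one_hot(a, n) for a in trace]
        for u, v in zip(vectors, vectors[1:]):
            update = outer(u, v)
            # Every cell is updated for every pair of events, whatever the activities are.
            for i in range(n):
                for j in range(n):
                    dfg[i][j] += update[i][j]
    return dfg


traces = [[0, 1, 2], [0, 2, PAD]]
print(dfg_matrix(traces, 3))  # → [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```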
13. Model for Inter-Organizational Process Mining
Airport Event Log
Case ID   Activity                       Timestamp
1         Check-In of Passengers         02/01/20xx 12:31:57
1         Security Check                 02/01/20xx 13:02:09
1         Boarding                       02/01/20xx 14:22:45
2         Check-In of Passengers         02/01/20xx 12:55:43
2         Processing Luggage             02/01/20xx 14:21:56

Airline Event Log
Case ID   Activity                       Timestamp
1         Close Doors                    02/01/20xx 15:00:00
1         Aircraft ready for take-off    02/01/20xx 15:07:00
3         Calculate Fuel demand          03/01/20xx 10:24:45
3         Preparing Aircraft             03/01/20xx 12:44:23
3         Welcome Passengers             03/01/20xx 13:32:12
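For reference, the following plaintext sketch computes the frequency- and time-annotated DFG for the two example logs by merging them by case ID and ordering each case's events by time. This is precisely the computation the parties cannot perform in the clear, since it pools both logs. Annotating each arc with the mean elapsed time in seconds is an assumption made here, and dates are omitted (the slide elides the year) because every case in the example occurs within a single day.

```python
from collections import defaultdict

# Events from the two example logs: (case ID, activity, time of day).
airport_log = [
    (1, "Check-In of Passengers", "12:31:57"),
    (1, "Security Check", "13:02:09"),
    (1, "Boarding", "14:22:45"),
    (2, "Check-In of Passengers", "12:55:43"),
    (2, "Processing Luggage", "14:21:56"),
]
airline_log = [
    (1, "Close Doors", "15:00:00"),
    (1, "Aircraft ready for take-off", "15:07:00"),
    (3, "Calculate Fuel demand", "10:24:45"),
    (3, "Preparing Aircraft", "12:44:23"),
    (3, "Welcome Passengers", "13:32:12"),
]


def seconds(hms):
    """Convert an HH:MM:SS time of day into seconds since midnight."""
    h, m, s = map(int, hms.split(":"))
    return 3600 * h + 60 * m + s


def annotated_dfg(*logs):
    """Map (a, b) -> (frequency, mean seconds from a to b) over the merged logs."""
    traces = defaultdict(list)
    for log in logs:
        for case_id, activity, time in log:
            traces[case_id].append((seconds(time), activity))
    freq = defaultdict(int)
    total = defaultdict(int)
    for events in traces.values():
        events.sort()  # order each case's events by time
        for (t1, a), (t2, b) in zip(events, events[1:]):
            freq[(a, b)] += 1
            total[(a, b)] += t2 - t1
    return {arc: (freq[arc], total[arc] / freq[arc]) for arc in freq}


for arc, (count, mean_s) in annotated_dfg(airport_log, airline_log).items():
    print(arc, count, mean_s)
```

The arc from "Boarding" to "Close Doors" crosses the boundary between the two organizations, so it can only be discovered by combining both logs.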
17. Source Code
A GitHub repository with the source
code, installation steps, and example
event logs is available at:
https://github.com/Elkoumy/shareprom
18. Research Questions
• RQ1: How do the characteristics of the input event logs influence the
performance of the secure multi-party computation of the DFG?
• RQ2: What is the effect of increasing the number of parallel chunks on
the performance of the multi-party computation of the DFG?
19. Event Logs
Event Log            # Events   # Cases   # Activities   # Events per Case
                                                         Avg     Max   Min
BPIC 2013               6,660     1,432              6   4,478   35    1
Credit Requirement     50,525    10,034              8   15      15    15
Traffic Fines         561,470   150,370             11   3.73    20    2
23. Threats to validity
• The evaluation has the following limitations:
• The event logs used in the evaluation are intra-organizational event logs,
which we split into separate logs to simulate the inter-organizational
setting.
• As a result, they may not capture the communication patterns of
inter-organizational processes.
• The number of event logs is small, which limits the generalizability of the
conclusions.
• The proposed technique can handle small-to-medium-sized logs with
relatively short traces.
24. Conclusion
• This paper introduced a framework that enables two or more parties to
perform basic process mining operations over the partial logs of an
inter-organizational process that each of them holds.
• The framework reveals only the outputs of the queries that the parties opt
to disclose and three high-level log statistics: the number of traces per log,
the number of event types, and the maximum trace length.
• An evaluation using real-world event logs shows that the DFG can be
computed with execution times that make this technique usable in
practice.
25. Future Work
• In future work, we will combine the proposed approach with differential
privacy to add noise to the DFG and to the other outputs of the framework.
• Another avenue for future work is to apply the framework to business
process benchmarking, where an organization wants to compare its
performance with industry standards and with other performers.