Conformance checking belongs to the emerging field of process mining and aims to compare event logs (reality) with a process model to reveal desirable or undesirable deviations. Its potential has proved to be huge, but the available resources are limited. This project aims to compare the tools and techniques applied in conformance checking in the healthcare context.
1. Conformance Checking for Medical Training Process
Name: Giulia Alessandrelli
Supervisor: An Nguyen
Machine Learning and Data Analytics for Industry 4.0
Final Presentation
Machine Learning and Data Analytics (MaD) Lab
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
July 10, 2019
2. 10.07.2019 | Giulia Alessandrelli | MaD Lab | Machine Learning and Data Analytics for Industry 4.0
Summary
• Introduction
• Goal of the project
• Exploratory Data Analysis
• Methods
• Demo
• Results
• Bibliography
4. Context
[Diagram labels: Process Mining (Process Discovery, Conformance Checking, Predictive Analytics); Business Process Management; Data Science]
5. Conformance Checking
Definition
Certification or confirmation that a good, service, or conduct
meets the requirements of legislation, accepted practices,
prescribed rules and regulations, specified standards, or terms of
a contract.
Fields of Application:
• Business Application
• Auditing
[Diagram: Event Logs + Process Model → Algorithm → Global conformance measures and local diagnostics]
6. Motivation
• To improve how well the model captures reality
• To expose undesirable deviations
• To reveal desirable deviations
[Diagram labels: Real Process, Event Data, Process Model; arrows: Record, Process Discovery, Conformance Checking]
7. Motivation
[Chart axes: Basic to Powerful; Business User to Specialist / Engineers. Tools shown: Process Mining*; Excel; Report, Dashboard; Hadoop, R, Python]
*Fluxicon, Disco
8. State of the Art
Publications in the Process Mining field:
Process Mining in Primary Care: 7
Process Mining: 136
Publications according to geographical area:
Europe: 68
North America: 31
Asia: 22
South America: 8
• Originally applied to business processes over 20 years ago
• Has recently been applied to healthcare because:
  • It can reveal insights into clinical care pathways
  • It can inform the redesign of healthcare services
• No significant literature exists on conformance checking applied to medical training processes
9. State of the Art
[Chart: Diffusion according to medical field. Categories: Oncology, Cardiology, Emergency care, Stroke, Surgery, Diabetes, Asthma, Others. Values as extracted: 33, 13, 11, 10, 866, 61]
11. Preliminaries
Event Logs
A collection of events recorded during the execution of a process.
Logs may be stored in different formats, but they should be of good quality.
Process Model
• Descriptive Model → It shows whether the model's capture of reality needs to be improved
• Normative Model → It exposes undesirable deviations or reveals desirable ones
[Diagram: Event Logs + Process Model → Algorithm → Global conformance measure]
12. Where do the data come from?
[Diagram: Central Venous Catheter (CVC) Installation. 10 students → Event Logs. 13 experts → Delphi Panel → Clinical consensus for the CVC procedure → Model]
13. Question
What are the commonalities and discrepancies between the modeled behavior and the observed behavior?
• What is the performance of the students?
• How does a student's performance change between a first pre-test and a final post-test?
• What can instructors learn from an aggregated analysis of the whole course?
Comparison between the tools and the techniques applied in
conformance checking
15. Data - Model in Petri Net notation
• It's a directed bipartite graph
• Nodes represent transitions and places
• Arcs connect places to transitions and transitions to places, and have an associated weight
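The definition above can be sketched as a small data structure. This is only an illustration, not the representation used by ProM or PM4Py; the class name `PetriNet`, its methods and the place and transition names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Hypothetical minimal Petri net: two node types plus weighted arcs."""
    places: set = field(default_factory=set)
    transitions: set = field(default_factory=set)
    # arcs maps (source, target) -> weight
    arcs: dict = field(default_factory=dict)

    def add_arc(self, source, target, weight=1):
        # Bipartite: an arc may only link a place to a transition or vice versa.
        place_to_transition = source in self.places and target in self.transitions
        transition_to_place = source in self.transitions and target in self.places
        if not (place_to_transition or transition_to_place):
            raise ValueError(f"arc {source!r} -> {target!r} is not bipartite")
        self.arcs[(source, target)] = weight

    def preset(self, transition):
        """Input places of a transition, with arc weights."""
        return {s: w for (s, t), w in self.arcs.items() if t == transition}

    def postset(self, transition):
        """Output places of a transition, with arc weights."""
        return {t: w for (s, t), w in self.arcs.items() if s == transition}


# Invented example net: start -> a -> p1 -> b -> end
net = PetriNet(places={"start", "p1", "end"}, transitions={"a", "b"})
net.add_arc("start", "a")
net.add_arc("a", "p1")
net.add_arc("p1", "b")
net.add_arc("b", "end")
```

Rejecting place-to-place and transition-to-transition arcs in `add_arc` is what keeps the graph bipartite.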
16. Data - Event Logs
• 20 traces (2 traces for every student)
• 1394 events
• 29 event classes (activities)
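The three figures above are the kind of summary that can be computed directly from a log. The sketch below shows how, on a toy log; the case identifiers and activity names are invented for illustration and are not taken from the CVC data set.

```python
# Toy log: each trace is the ordered list of activities of one case.
# Case ids and activity names are hypothetical.
toy_log = {
    "student1_pre": ["prepare", "puncture", "insert_guidewire"],
    "student1_post": ["prepare", "check_anatomy", "puncture", "insert_guidewire"],
}


def log_statistics(log):
    """Return (number of traces, number of events, number of event classes)."""
    n_traces = len(log)
    n_events = sum(len(trace) for trace in log.values())
    event_classes = {activity for trace in log.values() for activity in trace}
    return n_traces, n_events, len(event_classes)


stats = log_statistics(toy_log)
print(stats)  # (2, 7, 4)
```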
18. Overview
Tools:
• ProM
• Python (PM4Py)
Methods applied in conformance checking:
• Token-Based Replay
• Alignment-Based Replay
Main parameter used in conformance checking:
• Fitness: ability to explain observed behavior
19. Token-Based Algorithm
• It aims to match a trace against a Petri net model to discover:
  • which transitions are executed
  • in which places there are remaining or missing tokens for the given process instance
[Figure: example Petri net, with legend: Transition (activity), Place]
20. Token-Based Algorithm
• It's based on counting the number of produced, consumed, missing and remaining tokens
• A trace fits the model if, during its execution, the transitions can be fired without the need to insert any missing token
[Figure labels: Produced Token, Consumed Token]
21. Token-Based Algorithm
• Another rule to properly count tokens:
  • In the beginning a token is produced for the source place: p = 1
  • At the end a token is consumed from the sink place: c' = c + 1
• A rule to check the counting:
  • At any time: p + m ≥ c ≥ m
  • At the end: r = p + m - c
[Figure labels: p (produced), c (consumed), m (missing), r (remaining)]
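The counting rules can be sketched in a few lines of Python. This is a simplified illustration, not the ProM or PM4Py implementation: it assumes arc weights of 1, that every activity in the trace maps to exactly one transition, and a hypothetical sequential net (start → a → p1 → b → p2 → c → end) that is not the model from the slides.

```python
from collections import Counter


def token_replay(trace, pre, post, source="start", sink="end"):
    """Replay one trace and return the counters (p, c, m, r).

    pre[t] / post[t] list the input / output places of transition t.
    """
    marking = Counter()
    p = c = m = 0

    # Rule: in the beginning a token is produced for the source place.
    marking[source] += 1
    p += 1

    def consume(place):
        nonlocal c, m
        if marking[place] == 0:
            # Token is missing: insert it artificially so replay can continue.
            m += 1
            marking[place] += 1
        marking[place] -= 1
        c += 1

    for activity in trace:
        for place in pre[activity]:
            consume(place)
        for place in post[activity]:
            marking[place] += 1
            p += 1
        assert p + m >= c >= m  # check that holds at any time

    # Rule: at the end a token is consumed from the sink place.
    consume(sink)

    r = sum(marking.values())
    assert r == p + m - c  # check that holds at the end
    return p, c, m, r


# Hypothetical sequential net with transitions a, b, c
pre = {"a": ["start"], "b": ["p1"], "c": ["p2"]}
post = {"a": ["p1"], "b": ["p2"], "c": ["end"]}

fitting = token_replay(["a", "b", "c"], pre, post)  # (4, 4, 0, 0)
deviating = token_replay(["a", "c"], pre, post)     # (3, 3, 1, 1)
```

In the deviating trace, skipping b leaves one token behind in p1 (remaining) and forces one token to be inserted in p2 (missing).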
22. Token-Based Algorithm Example
Given the trace: { a, b, e, g }
Given the model: [Figure: Petri net annotated with m = 1 and r = 1]
Produced Tokens: 6
Consumed Tokens: 6
Missing Tokens: 1
Remaining Tokens: 1
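The four counters are usually condensed into a single fitness value. The slides do not state the formula; the sketch below assumes the standard token-replay definition from van der Aalst's Process Mining book, fitness = 1/2 (1 - m/c) + 1/2 (1 - r/p), and applies it to the counts of this example. The function name is an assumption.

```python
def token_replay_fitness(p, c, m, r):
    """Token-replay fitness, assuming the textbook definition.

    p, c, m, r: produced, consumed, missing and remaining tokens.
    """
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)


fitness = token_replay_fitness(p=6, c=6, m=1, r=1)
print(round(fitness, 3))  # 0.833
```

A value of 1 would mean that the trace can be replayed with no missing and no remaining tokens.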
23. Alignment-based Replay
It aims to provide the closest matching path through the process model for any trace in the event log.
For each trace, the output of an alignment is a list of pairs:
• First element is an event (of the trace)
• Second element is a transition (of the model)
From the event log: a b >> d e g
From the model:     a >> c d e g
(>> marks a skipped step)
24. Alignment-based Replay
Each pair can be classified as one of the following:
• Sync move: both the trace and the model advance in the same way during the replay
• Move on log: there is a replay move in the trace that is not mimicked in the model
• Move on model: there is a replay move in the model that is not mimicked in the trace
From the event log: a b >> d e g
From the model:     a >> c d e g
Sync moves: (a, a), (d, d), (e, e), (g, g)
Move in log only: (b, >>)
Move in model only: (>>, c)
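The classification can be written as a small function over the list of pairs. This is a minimal sketch: the `>>` skip symbol follows the slides, while the function and variable names are assumptions.

```python
SKIP = ">>"


def classify_move(log_step, model_step):
    """Classify one (event, transition) pair of an alignment."""
    if log_step == SKIP and model_step == SKIP:
        raise ValueError("a pair cannot skip on both sides")
    if model_step == SKIP:
        return "move on log"
    if log_step == SKIP:
        return "move on model"
    if log_step != model_step:
        raise ValueError("a sync move needs matching labels")
    return "sync move"


# Alignment from the slide, read column by column
log_row = ["a", "b", SKIP, "d", "e", "g"]
model_row = ["a", SKIP, "c", "d", "e", "g"]
alignment = list(zip(log_row, model_row))

moves = [classify_move(event, transition) for event, transition in alignment]
deviations = sum(move != "sync move" for move in moves)
print(deviations)  # 2
```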
25. Alignment-based Replay Example
[Figure: Petri net with transitions a, b, c, d, e, g, h]
# | Trace
1 | adegh
… | …
… | …
From the event log: a >> d e g h
From the model:     a b d e g >>
Move in model only: (>>, b)
Move in log only: (h, >>)
Sync moves: all other pairs
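The alignment in this example can be reproduced with a small dynamic program. The sketch makes a strong simplifying assumption: the model run (a, b, d, e, g) is given as input, whereas real alignment algorithms search over all runs of the Petri net. Every move on log or move on model costs 1 and sync moves cost 0; the function name is hypothetical.

```python
SKIP = ">>"


def align(trace, model_run):
    """Cheapest alignment of a trace with ONE given model run."""
    n, m = len(trace), len(model_run)
    # cost[i][j]: fewest non-sync moves aligning trace[:i] with model_run[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(cost[i - 1][j], cost[i][j - 1]) + 1
            if trace[i - 1] == model_run[j - 1]:
                best = min(best, cost[i - 1][j - 1])
            cost[i][j] = best

    # Walk back from the end to recover the list of pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and trace[i - 1] == model_run[j - 1]
                and cost[i][j] == cost[i - 1][j - 1]):
            pairs.append((trace[i - 1], model_run[j - 1]))  # sync move
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((trace[i - 1], SKIP))              # move on log
            i -= 1
        else:
            pairs.append((SKIP, model_run[j - 1]))          # move on model
            j -= 1
    pairs.reverse()
    return pairs, cost[n][m]


pairs, total_cost = align(list("adegh"), list("abdeg"))
print(total_cost)  # 2
```

For trace adegh this yields one move on model (b) and one move on log (h), matching the two rows above.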
28. Token-Based Replay by definition
29. Token-Based Replay (PM4Py)
30. Alignment-Based Replay (PM4Py)
31. Alignment-Based Replay (ProM)
• The log is decomposed into sublogs
• The net is decomposed into subnets
• Every sublog is replayed on the corresponding subnet
• The resulting subalignments are merged into a single pseudo-alignment
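The first step, building sublogs, can be sketched as a projection of each trace onto the activities of one subnet. This assumes the usual decomposition scheme, in which a sublog keeps only the events whose activity occurs in the corresponding subnet; the slides do not describe ProM's internals. The subnet activity sets and the function name are invented for illustration.

```python
def project_log(log, activities):
    """Keep, in order, only the events whose activity belongs to one subnet."""
    return [[a for a in trace if a in activities] for trace in log]


log = [["a", "b", "d", "e", "g"], ["a", "d", "e", "g", "h"]]

# Hypothetical activity sets of two subnets; in practice subnets
# typically share the activities on their border.
subnet_activities = [{"a", "b", "c", "d"}, {"d", "e", "g", "h"}]

sublogs = [project_log(log, acts) for acts in subnet_activities]
# sublogs[0] == [["a", "b", "d"], ["a", "d"]]
# sublogs[1] == [["d", "e", "g"], ["d", "e", "g", "h"]]
```

Each sublog would then be replayed on its own subnet, and the resulting subalignments merged.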
32. Demo (ProM)
33. Conclusion
CONS
• The output changes considerably depending on the algorithm and the software used
• It is difficult to obtain valid quantitative measures
• The algorithm needs to be customized to the process
PROS
• It can improve how well the model captures reality, expose undesirable deviations and reveal desirable ones
• Potential
• Recently applied outside the business field
34. Bibliography
• Munoz-Gama, J.; De La Fuente, R.; Sepúlveda, M.; Fuentes, R.
Conformance Checking Challenge 2019
4TU.Centre for Research Data, 2019
• Williams, R.; Rojas, E.; Peek, N. & Johnson, O. A.
Process Mining in Primary Care: A Literature Review
Studies in Health Technology and Informatics, IOS Press, 2018, 247, 376-380
• Van Der Aalst, W.
Process Mining: Data Science in Action
Springer Berlin Heidelberg, 2016
• Documentation at http://pm4py.org/