Mining Project-Oriented Business Processes
Saimir Bala, Cristina Cabanillas, Jan Mendling,
Andreas Rogge-Solti, Axel Polleres
Motivation
Imagine a train crashes because of an engineering error and many people are injured
You are a national railway system administrator, say ABC
You might be in trouble!
Agenda
Problem
Project-Oriented Business Processes
Approach
Conclusion
Who is responsible?
Are you, as ABC, responsible for the accident?
Show that your work complies with safety regulations
E.g., in the railway domain: EN 50126, EN 50128, EN 50129
How to provide evidence of compliance?
Analyze the work in retrospect
The company does not use a BPM engine to execute its processes:
No process is designed a priori
Instead, the project is handled ad hoc by engineers
An expert (auditor) analyzes the existing documentation and manually checks whether everything was done properly
Documentation includes spreadsheets, word-processor files, diagrams, and version control system (VCS) data
Idea: mine project-oriented business processes
Does the accident have something to do with the software?
Project-Oriented Business Processes
Classic business process      Project-oriented business process
Engine                        No engine
Recursive, cyclic             One time with fixed goals and resources
Many instances                One prototype/product
Process model (e.g. BPMN)     Plan (e.g. GANTT chart)
Activities                    Work packages
Subprocesses                  Sub-work packages
Process mining
State of the art: reduction to process mining
Mining a process from software repositories (Kindler et al., 2006)
State of the art: visualization I
Dotted chart (Song & van der Aalst, 2007)
State of the art: visualization II
Storylines (Ogawa & Ma, 2010)
Mining VCS logs
Input: VCS logs (e.g., from Git or Subversion)
Output: GANTT chart
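
The input side can be illustrated with a minimal sketch. It assumes the log was exported with `git log --name-status --pretty=format:'%H|%an|%aI'`; the `Event` record, the `parse_git_log` function, and the sample log are hypothetical names and data chosen for illustration, not taken from the authors' tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One file-level change taken from a commit (hypothetical record type)."""
    commit: str
    author: str
    time: datetime
    status: str  # first letter of the git status code, e.g. A, M, D, R
    path: str


def parse_git_log(text: str) -> list[Event]:
    """Turn `git log --name-status --pretty=format:'%H|%an|%aI'` output into events."""
    events: list[Event] = []
    header = None  # (commit, author, time) of the commit currently being read
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator between commits
        if "\t" in line:
            if header is None:
                continue  # file line without a preceding commit header
            status, *paths = line.split("\t")
            # Renames are listed as "old<TAB>new"; keep the new path.
            events.append(Event(*header, status[:1], paths[-1]))
        else:
            commit, rest = line.split("|", 1)
            author, stamp = rest.rsplit("|", 1)
            header = (commit, author, datetime.fromisoformat(stamp))
    return events


# Made-up sample log: two commits, three file events.
SAMPLE_LOG = (
    "c1|Alice|2014-01-10T09:00:00+00:00\n"
    "A\tdocs/spec.txt\n"
    "M\tsrc/main.c\n"
    "\n"
    "c2|Bob|2014-01-11T15:30:00+00:00\n"
    "M\tdocs/spec.txt\n"
)
events = parse_git_log(SAMPLE_LOG)
```

Each commit yields one event per touched file, which matches the distinction between commits and events used later in the slides.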
Challenges
Timing (how long did an activity really take, compared with what we see in the log?)
Aggregation (how can we aggregate events into activities, and how can we view the project at a coarser granularity?)
Coverage (how efficiently was the time used?)
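
One possible reading of the aggregation challenge is sketched below: file events are rolled up to their ancestor directory at a chosen depth, and each group gets the time span from its first to its last event. This assumes that the directory tree mirrors the work breakdown; the `aggregate` function and the sample events are hypothetical and are not the authors' algorithm.

```python
from datetime import datetime
from pathlib import PurePosixPath


def aggregate(events, depth):
    """Group (path, time) events by their first `depth` path components.

    Returns {group: (first_event_time, last_event_time)}.
    """
    spans = {}
    for path, when in events:
        key = "/".join(PurePosixPath(path).parts[:depth])
        start, end = spans.get(key, (when, when))
        spans[key] = (min(start, when), max(end, when))
    return spans


# Made-up events: (file path, commit time).
sample_events = [
    ("docs/spec.txt", datetime(2014, 1, 10)),
    ("docs/plan/gantt.txt", datetime(2014, 1, 20)),
    ("src/main.c", datetime(2014, 2, 1)),
    ("src/util.c", datetime(2014, 2, 5)),
]
coarse = aggregate(sample_events, depth=1)  # top-level directories
fine = aggregate(sample_events, depth=2)    # one level deeper
```

Raising `depth` gives a finer-grained view, lowering it a coarser one.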
Assumptions
1. Meaningful tree structure
2. Members perform local changes
3. Systematic commits
Visualization of a project
Aggregation (data from the SHAPE project)
Time span Jan 2014 – Jan 2015
8 people
156 objects (files and directories)
226 commits, generating 453 events
Correction of activity starting times
Adjustment and coverage
Evaluation on open source projects
Log            Duration (days)  Idle periods  Files  Commits  ^tc (hours)  χ (%)
Our work                    24             0     89       63            9    100
Whitehall                 1279             6   6539    15566            2     95
Petitions                  834            17   1562      914           13     59
Study                      624            13   7501      736           11     58
The Guardian              1667            59  12889      621           30     44
Book                       414            15    154      592            5     32
Papers                    1859            55   1791      649           20     30
Requirements               771            22    505      231           17     21
Yelp                       206             6     24       54           20     20
Adobe                     1076            13    356      237           24     15
More real-world logs at https://github.com/showcases
Limitations and Future work
Limitations
Strong assumptions on the structure
The approach does not take into account the amount of change in documents
Checking rules
Future work
Use statistical methods to improve the quality of the discovered projects
Discover the type of work/project by using comments written by users
User assessment of the quality of the discovered GANTT charts
Conclusion
We help the auditor to analyze the project
Different levels of abstraction (aggregation)
Time and resource of events
Work effort measure (coverage)
We used project VCS logs
Output as GANTT chart
Source code: https://github.com/s41m1r/MiningCVS
Email me: saimir.bala@wu.ac.at
References
Kindler, E., Rubin, V. & Schäfer, W. (2006). Activity Mining for
Discovering Software Process Models. Software Engineering 79,
175–180.
Ogawa, M. & Ma, K.-L. (2010). Software evolution storylines. In
Proceedings of the 5th international symposium on Software
visualization (pp. 35–42).
Song, M. & van der Aalst, W. M. (2007). Supporting process mining
by showing events at a glance. In 7th Annual Workshop on
Information Technologies and Systems (pp. 139–145).
Baier, T., Mendling, J., & Weske, M. (2014). Bridging abstraction
layers in process mining. Information Systems, 46, 123–139.
Appendix
Expected active time between commits
The expected active time between commits, $\hat{t}_c$, is given as follows.

\[
\hat{t}_c = \frac{\sum_{a \in A_f} \bigl(\omega(a) - \alpha'(a)\bigr)}{\sum_{a \in A_f} \bigl(c(a) - 1\bigr)} \tag{1}
\]

with
ω(a): end time of activity a
α'(a): time of the first event of activity a
c(a): number of commits in activity a
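
Equation (1) can be tried out with a small sketch. The tuple layout `(first_event_time, end_time, n_commits)` and the function name `expected_active_time` are assumptions made for illustration; the sample activities are made up.

```python
from datetime import datetime, timedelta


def expected_active_time(activities):
    """Equation (1): summed activity durations divided by summed (commits - 1).

    `activities` is an iterable of (first_event_time, end_time, n_commits).
    """
    total = timedelta()
    gaps = 0
    for first, end, n_commits in activities:
        total += end - first   # omega(a) - alpha'(a)
        gaps += n_commits - 1  # c(a) - 1
    if gaps <= 0:
        raise ValueError("need at least one activity with two or more commits")
    return total / gaps


# Made-up data: 8 h with 3 commits, and 10 h with 2 commits.
sample_activities = [
    (datetime(2014, 1, 10, 9), datetime(2014, 1, 10, 17), 3),
    (datetime(2014, 1, 12, 10), datetime(2014, 1, 12, 20), 2),
]
t_c = expected_active_time(sample_activities)  # (8 h + 10 h) / (2 + 1)
```

The guard reflects that the denominator of Equation (1) is zero when every activity has a single commit.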
Coverage factor
Definition (Coverage)
The coverage χ of work packages by activities is a function χ : W → [0, 1]
and is defined as follows.

\[
\chi(w) = \frac{\sum_{a \in \beta^{-1}(w)} \bigl(\omega(a) - \alpha(a)\bigr)}{\tau(w)} \tag{2}
\]

where τ(w) is the duration of work package w.
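
A minimal sketch of Equation (2) follows. It assumes the activities of a work package are given as `(start, end)` pairs that do not overlap, so the result stays within [0, 1]; the function name `coverage` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def coverage(activity_spans, work_package_duration):
    """Equation (2): share of a work package's duration covered by its activities.

    `activity_spans` is an iterable of (start, end) pairs, assumed not to overlap.
    """
    if work_package_duration <= timedelta(0):
        raise ValueError("work package duration must be positive")
    active = sum((end - start for start, end in activity_spans), timedelta())
    return active / work_package_duration


# Made-up work package lasting 10 days with activities of 2 and 3 days.
spans = [
    (datetime(2014, 3, 1), datetime(2014, 3, 3)),
    (datetime(2014, 3, 6), datetime(2014, 3, 9)),
]
chi = coverage(spans, timedelta(days=10))
```

Multiplying the result by 100 gives a percentage like the χ column of the evaluation table.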
Average idle time
Let $n_c$ be the number of commits per work package. We compute the
average idle time as follows.

\[
t_{\mathrm{Idle}} = \frac{\tau - n_c \cdot \hat{t}_c}{n}, \quad n > 0 \tag{3}
\]

where n is the number of idle times in the work package, and τ is the
duration of the work package.
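
Equation (3) amounts to subtracting the estimated active time from the work package duration and spreading the remainder over the idle periods. The sketch below assumes durations are `timedelta` values; the function name `average_idle_time` and the numbers are made up for illustration.

```python
from datetime import timedelta


def average_idle_time(duration, n_commits, t_c, n_idle):
    """Equation (3): (tau - n_c * t_c) / n, defined only for n > 0."""
    if n_idle <= 0:
        raise ValueError("the work package has no idle periods")
    return (duration - n_commits * t_c) / n_idle


# Made-up values: 10-day work package, 20 commits, 6 h per commit, 4 idle periods.
t_idle = average_idle_time(
    duration=timedelta(days=10),
    n_commits=20,
    t_c=timedelta(hours=6),
    n_idle=4,
)
# (240 h - 20 * 6 h) / 4 = 30 h
```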
