BPI@BPM2011

Slides of my presentation at BPI workshop at BPM conference, 29 August 2011, Clermont-Ferrand, FR


  1. Merging Computer Log Files for Process Mining: An Artificial Immune System Technique
     Jan Claes and Geert Poels
     http://processmining.ugent.be
  2. Process Mining
     • Processes are supported by IT systems
     • IT systems record actual process data
     • Process data can be used to:
         • Discover a process model
         • Check conformance with existing process information
         • Improve or extend an existing process model
     • Attention:
         • Only As-Is
         • Only (correctly) recorded information
  3. Keynote BPI 2010, Michael Zur Muehlen
     • Process Controlling
     • Business Activity Monitoring
     • Process Intelligence
     • Event Detection & Correlation
     • Decision Making
     [Figure labels: "Main focus point of current BPI research"; "Deserves more focus in BPI research"]
     Source: BPI 2010, Keynote Michael Zur Muehlen, http://www.slideshare.net/mzurmuehlen/bu-5236080
  4. Process Mining steps
     • Preparation
         • Collect data: find event information
         • Merge data: from different sources
         • Structure data: group per instance
         • Convert data: to tool-specific format
     • Process mining
     • Make decisions, take action
     [Figure legend: M = manual task, analysts needed in most cases; A = automated task, less human involvement needed]
  5. Merging log files
     My research: merging log files
  6. Merging log files
     1. Find links
     2. Merge chronologically
     3. Add unlinked traces
     4. Put in new log file
  7. Find links
     Required properties of the solution:
     • Finds traces in both log files that belong to the same process execution
     • Requires no prior knowledge about the provided log files (as generic as possible)
     • Gives the (expert) user maximal possibilities to include their knowledge about the log files
  8. Find links
     Proposed solution:
     • Take the best possible guess based on assumptions
     • Include multiple indicator factors in the analysis
     • Calculate factor scores for each analysed solution
     • Combine factor scores into a global score per solution
     • The 'best guess' is the solution with the highest combined score because, based on the assumed indicators, most indicator value points to this solution
     • Provide user interaction possibilities
  9. Decisions to make
     • Which indicator factors?
     • How to calculate a score for each factor?
     • How to combine factor scores into a global score?
     • Which solutions to analyse? (analyse = calculate & compare scores)
     • Which user interactions to include (expert) user knowledge?
     See paper for more details
  10. Indicator factors
     Same trace identifier
     • Assumption: if both logs contain a trace with the same id, there is a very high chance they match
     • Not always though (e.g. customer id vs. order id)
     [Figure: example trace ids in two logs: 16, 17, 18, 19, 20, 21 and 10, 12, 14, 16, 18, 20]
  11. Indicator factors
     Equal attribute values
     • Assumption: the more attributes of a trace and its events are equal in both logs, the higher the chance they match
     [Figure: example traces from two logs, showing trace ids, attribute values (e.g. JAN, JC) and timestamps]
  12. Indicator factors
     Extra trace & Missing trace
     • Assumption: a trace from one log is more likely to match exactly one trace from the other log
     • Extra trace: negative score if a trace is linked with multiple traces in the other log
     • Missing trace: negative score if a trace is not linked
  13. Indicator factors
     Time difference
     • Assumption: for a given trace t in one log, the trace in the other log that starts soonest after t has a higher chance to match
     • More difficult when traces overlap
     [Figure: example traces from two logs, showing trace ids, attribute values (e.g. JAN, JC) and start times]
  14. User interaction
     • Step 1: let the user adapt parameters & weights
  15. User interaction (continued)
     • Step 2: give feedback on individual scores; the user can change weights and restart
     • Step 3: present the best solution per factor; let the user choose which factor dominates, based on factor score feedback
     • Step 4: provide other ways for the user to feed the algorithm with their insights
  16. Test results
     • Simulated data (300-400 ms on a standard laptop)
         • Benefit of controllable parameters, known solution
         • Correct number of linked traces in all tests
         • Perfect results for same trace id and up to 50% noise, worse results for higher overlap of traces
     • Real data (6-10 min on a standard laptop)
         • Correct number of linked traces in all tests
         • Almost perfect results for same trace id and up to 50% noise, worse results for higher overlap
  17. Further research plans
     • Refining the merging technique
         • The quest for optimal indicators and weights is a continuous effort (based on experiences from case studies)
         • Implementation optimisation (speed, memory usage, scalability) is a continuous effort
     • Validation (case studies)
  18. Questions
     • Do you agree that a combined set of logical assumptions can be a strong indicator (stronger than the individual assumptions)?
     • Any feedback on the factors used?
     • Any other factors that should be included?
     • Any concerns about performance and scalability?
  19. Contact information
     Jan Claes
     jan.claes@ugent.be
     http://processmining.ugent.be
     Twitter: @janclaesbelgium
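
The four merging steps and the indicator-factor scoring described in the slides can be illustrated with a minimal Python sketch. This is not the authors' implementation. In particular, it swaps the artificial immune system search for a simple greedy one-to-one matching, and it compares trace attributes only, not event attributes. The `Trace` class, the weights, the time-difference formula, the `min_score` threshold and the example data are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from itertools import product


@dataclass
class Trace:
    """Hypothetical trace: an id, (timestamp, activity) events, attributes."""
    trace_id: str
    events: list
    attributes: dict = field(default_factory=dict)

    @property
    def start(self):
        return min(ts for ts, _ in self.events)


# Assumed weights; in the slides these are user-adjustable parameters.
WEIGHTS = {"same_id": 3.0, "equal_attrs": 1.0, "time_diff": 1.0}


def pair_score(a, b, weights=WEIGHTS):
    """Combine factor scores for linking trace a (log 1) with trace b (log 2)."""
    same_id = 1.0 if a.trace_id == b.trace_id else 0.0
    shared = set(a.attributes) & set(b.attributes)
    equal_attrs = sum(a.attributes[k] == b.attributes[k] for k in shared)
    # Assumed formula: b scores higher the sooner it starts after a.
    gap = (b.start - a.start).total_seconds()
    time_diff = 1.0 / (1.0 + gap / 3600) if gap >= 0 else 0.0
    return (weights["same_id"] * same_id
            + weights["equal_attrs"] * equal_attrs
            + weights["time_diff"] * time_diff)


def find_links(log1, log2, min_score=1.0, weights=WEIGHTS):
    """Step 1: greedy matching, best-scoring pairs first.

    Each trace is linked at most once (no 'extra trace'); pairs scoring
    below min_score stay unlinked.
    """
    scored = sorted(
        ((pair_score(a, b, weights), i, j)
         for (i, a), (j, b) in product(enumerate(log1), enumerate(log2))),
        reverse=True,
    )
    used1, used2, links = set(), set(), []
    for score, i, j in scored:
        if score < min_score:
            break
        if i not in used1 and j not in used2:
            links.append((i, j))
            used1.add(i)
            used2.add(j)
    return links


def merge_logs(log1, log2, links):
    """Steps 2-4: merge linked traces chronologically, add unlinked ones."""
    merged = []
    for i, j in links:
        a, b = log1[i], log2[j]
        events = sorted(a.events + b.events, key=lambda e: e[0])
        merged.append(Trace(a.trace_id, events, {**b.attributes, **a.attributes}))
    linked1 = {i for i, _ in links}
    linked2 = {j for _, j in links}
    merged += [tr for i, tr in enumerate(log1) if i not in linked1]
    merged += [tr for j, tr in enumerate(log2) if j not in linked2]
    return merged


def at(hour, minute):
    return datetime(2011, 8, 29, hour, minute)


# Made-up example data.
log1 = [Trace("16", [(at(12, 0), "register")], {"customer": "JAN"}),
        Trace("17", [(at(12, 10), "register")], {"customer": "JC"})]
log2 = [Trace("16", [(at(12, 5), "ship")], {"customer": "JAN"}),
        Trace("99", [(at(14, 0), "ship")], {"customer": "X"})]

links = find_links(log1, log2)          # [(0, 0)]
merged = merge_logs(log1, log2, links)  # one merged trace plus two unlinked traces
```

In this sketch the "extra trace" factor is enforced as a hard one-to-one constraint and the "missing trace" factor is approximated by the threshold. A closer rendering of the slides would score whole candidate solutions (sets of links) and apply both as negative contributions to the combined score.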
