
PhD Day 2011

Slides of my presentation at our faculty's PhD Day, 24 May 2011, Gent, B

Published in: Business, Technology

  1. Merging Log Files for Process Mining
     Jan Claes
     Promotor: Prof. Dr. Geert Poels
     Copromotor: Prof. Dr. Ir. Birger Raa
  2. Roadmap
     1. Process Mining
     2. Merging log files
     3. Genetic algorithm
     4. Experiment results
     5. Discussion
     © http://maps.google.com
  3. 1. Process Mining
  4. Analyse the ‘black box’
     A plane crashed... What happened?
  5. Analyse the ‘black box’: look for historical data
     Process Mining: reconstruct and analyse processes from historical process data
     • Log files
     • Audit trails
     • Database history fields/tables
     A process failed... What happened?
  6. Process Mining
     Audit trail, database fields, CSV log file
     ProM Analyses
     ProM Log File
  7. 2. Merging log files
  8. Process Mining
     Merging log files
  9. Process Mining
     Log Merging
     Merging log files
  10. 3. Genetic algorithm
  11. Genetic algorithm
      cross-over
      survival of the fittest
      mutation
      1st generation
      2nd generation
      3rd generation
  12. Genetic algorithm
      Fitness function score
      14, 18, 18
      cross-over
      27, 29, 28
      survival of the fittest
      6, 5, 32
      mutation
      1st generation
      2nd generation
      3rd generation
  13. 4. Experiment results
  14. Experiment: proof of concept
      Simulated data
      • Given model
      • Generate a random set of logs and a single log (= solution)
      • Use merge algorithm to merge the set of logs
      • Check the resulting log against the solution log
  15. Experiment: proof of concept
      Advantages of using simulated data
      • Solution is known
      • Controllable parameters (e.g. noise, overlap, matching id)
      Disadvantages of using simulated data
      • Limited internal validity (are results realistic?)
      • No external validity (results not generalisable)
  16. Experiment results
      Results of version of 31/03/2011: GA-inspired algorithm
      Mean time: 4 min
  17. Experiment results
      Results of version of 31/03/2011: GA-inspired algorithm
      Mean time: 4 min
  18. Experiment results: NEW ALGORITHM
      Results of version of 15/05/2011: AIS algorithm
      Mean time: 350 msec
  19. 5. Discussion
  20. Future work
      Optimise genetic algorithm
      • Fewer incorrect links
      • Faster implementation
      • Fitness function factors
      Validation with real test cases
      • Ghent University DPO (Human Resources)
      • Century21 (Real Estate)
      • BNP Paribas Fortis (Loan approvals)
      • ...
  21. Contact information
      Jan Claes
      jan.claes@ugent.be
      http://processmining.ugent.be
      FEB08, Tweekerkenstraat 2
      9000 Gent, Belgium
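
Slides 11 and 12 name the three genetic-algorithm operators (cross-over, mutation, survival of the fittest) and score each candidate with a fitness function. The slides do not show the actual implementation, so the following is only a minimal sketch of a generic genetic algorithm. The function `evolve`, its parameters, the toy case ids, and the fitness function `link_fitness` (which simply counts links between traces with equal ids) are all assumptions made for illustration, not the presenter's algorithm.

```python
import random

def evolve(fitness, genome_length, gene_values, population_size=20,
           generations=50, mutation_rate=0.1, seed=0):
    """Toy genetic algorithm (hypothetical): returns the fittest individual found."""
    rng = random.Random(seed)
    # 1st generation: random individuals.
    population = [[rng.choice(gene_values) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Survival of the fittest: keep the better-scoring half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            parent_a, parent_b = rng.sample(parents, 2)
            # Cross-over: splice two parents at a random cut point.
            cut = rng.randint(1, genome_length - 1)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: occasionally replace a gene with a random value.
            child = [rng.choice(gene_values) if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        # Parents survive unchanged, so the best score never decreases.
        population = parents + children
    return max(population, key=fitness)

# Assumed toy problem: link each trace of log B to a trace of log A.
log_a_ids = ["c1", "c2", "c3", "c4"]
log_b_ids = ["c3", "c1", "c4", "c2"]

def link_fitness(links):
    """Hypothetical fitness: number of links joining traces with equal ids."""
    return sum(log_a_ids[a] == log_b_ids[b] for b, a in enumerate(links))

best = evolve(link_fitness, genome_length=len(log_b_ids),
              gene_values=range(len(log_a_ids)))
print(best, link_fitness(best))
```

Here `best[b]` is the index of the trace in log A that trace `b` of log B is linked to. A realistic fitness function would combine several factors, which the slides list as future work without naming them.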
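
The proof-of-concept procedure on slide 14 (generate logs from a given model, merge them, check the result against the known solution) can be sketched as below. Everything here is assumed for illustration: the fixed activity sequence standing in for the model, the way each trace is split into two logs, the placeholder `merge_by_case_id`, and the `score` metric. None of it comes from the slides.

```python
import random

def simulate(num_cases, seed=0):
    """Generate a solution log and the two partial logs it splits into."""
    rng = random.Random(seed)
    solution = {}
    for case in range(num_cases):
        # Hypothetical model: every case runs the fixed sequence A, B, C, D.
        start = rng.randint(0, 1000)
        solution[f"case{case}"] = [(activity, start + step)
                                   for step, activity in enumerate("ABCD")]
    # Split each trace: the first two events go to log 1, the rest to log 2.
    log1 = {cid: trace[:2] for cid, trace in solution.items()}
    log2 = {cid: trace[2:] for cid, trace in solution.items()}
    return solution, log1, log2

def merge_by_case_id(log1, log2):
    """Placeholder merge: link traces with equal case ids, order by timestamp."""
    return {cid: sorted(log1[cid] + log2.get(cid, []), key=lambda event: event[1])
            for cid in log1}

def score(merged, solution):
    """Fraction of solution traces reproduced exactly by the merge."""
    correct = sum(merged.get(cid) == trace for cid, trace in solution.items())
    return correct / len(solution)

solution, log1, log2 = simulate(num_cases=10)
merged = merge_by_case_id(log1, log2)
accuracy = score(merged, solution)
print(accuracy)
```

Because the solution log is known, the merge result can be scored directly, which is the advantage of simulated data named on slide 15. Parameters such as noise or overlap would be added inside `simulate`.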
