Advertisement
Advertisement

More Related Content

Advertisement

More from Wil van der Aalst(20)

Advertisement

Process Mining - Chapter 4 - Getting the Data

  1. Chapter 4 Getting the Data prof.dr.ir. Wil van der Aalst www.processmining.org
  2. Overview Chapter 1 Introduction Part I: Preliminaries Chapter 2 Chapter 3 Process Modeling and Data Mining Analysis Part II: From Event Logs to Process Models Chapter 4 Chapter 5 Chapter 6 Getting the Data Process Discovery: An Advanced Process Introduction Discovery Techniques Part III: Beyond Process Discovery Chapter 7 Chapter 8 Chapter 9 Conformance Mining Additional Operational Support Checking Perspectives Part IV: Putting Process Mining to Work Chapter 10 Chapter 11 Chapter 12 Tool Support Analyzing “Lasagna Analyzing “Spaghetti Processes” Processes” Part V: Reflection Chapter 13 Chapter 14 Cartography and Epilogue Navigation PAGE 1
  3. Goal of process mining • What really happened in the past? • Why did it happen? • What is likely to happen in the future? • When and why do organizations and people deviate? • How to control a process better? • How to redesign a process to improve its performance? PAGE 2
  4. Getting the data supports/ “world” business controls processes software people machines system components organizations records events, e.g., messages, specifies transactions, models configures etc. analyzes implements analyzes discovery (process) event conformance model logs enhancement PAGE 3
  5. From heterogeneous data sources to process mining results Extract, Transform, and Load (ELT) optional data source ELT data ELT warehouse data source ELT data coarse-grained source scoping data source extract XES, MXML, or data similar source unfiltered event logs process mining discovery conformance enhancement filter filtered event logs (process) models answers fine-grained scoping PAGE 4
  6. Example log • A process consists of cases. • A case consists of events such that each event relates to precisely one case. • Events within a case are ordered. • Events can have attributes. • Examples of typical attribute names are activity, time, costs, and resource. PAGE 5
  7. process cases events attributes activity= register request Another view 1 35654423 time = 30-12-2010:11.02 resource = Pete costs = 50 35654424 ... ... activity= reject request time = 07-01-2011:14.24 35654427 resource = Pete costs = 200 activity= register request time = 30-12-2010:11.32 35654483 resource = Mike 2 costs = 50 35654485 ... ... activity= pay compensation time = 08-01-2011:12.05 35654489 resource = Ellen costs = 200 activity= register request time = 30-12-2010:11.32 35654521 resource = Pete 3 costs = 50 35654522 ... ... activity= pay compensation time = 15-01-2011:10.45 35654533 resource = Ellen costs = 200 ... ... ...PAGE 6
  8. Standard transactional life-cycle model suspend schedule assign reassign start resume manualskip abort_activity complete autoskip withdraw abort_case successful unsuccessful termination termination PAGE 7
  9. Five activity instances schedule assign start complete start complete a: d: schedule assign reassign start suspend resume complete b: complete start suspend resume suspend abort_activity c: e: suspend schedule assign reassign start resume manualskip abort_activity complete autoskip withdraw abort_case successful unsuccessful PAGE 8 termination termination
  10. Overlapping activity instances start start complete complete start start complete complete a: a: 6 hours 2 hours 5 hours 9 hours suspend schedule assign reassign start resume manualskip abort_activity complete autoskip withdraw abort_case successful unsuccessful PAGE 9 termination termination
  11. Using attributes social network showing how work flows from one person to another Pete Sara Sue Mike performance indicators per activity Ellen Sean Activity b Frequency: 456 role Activity g Waiting time: 15.6 +/- 2.5 hours Frequency: 311 Service time: 1.2 +/- 0.5 hours E Waiting time: 12.4 +/- 2.1 hours Costs: 412 +/- 55 euros Service time: 0.5 +/- 0.2 hours b Costs: 198 +/- 35 euros A examine thoroughly A g A M c1 c3 pay c compensation a examine e A start register casually A decide c5 end request h c2 d c4 M reject Activity h check ticket request Frequency: 407 f reinitiate Waiting time: 7.4 +/- 1.8 hours request Service time: 1.1 +/- 0.3 hours control flow Costs: 209 +/- 38 euros PAGE 10
  12. XES (eXtensible Event Stream) • See www.xes-standard.org. • Adopted by the IEEE Task Force on Process Mining. • Predecessor: MXML and SA-MXML. • The format is supported by tools such as ProM (as of version 6), Nitro, XESame, and OpenXES. • ProMimport supports MXML. PAGE 11
  13. PAGE 12
  14. XES (compatible with MXML) Event log consists of: • traces (process instances) − events • Standard extensions: • concept (for naming) • lifecycle (for transactional properties) • org (for the organizational perspective) • time (for timestamps) • semantic (for ontology references) PAGE 13
  15. extensions loaded every trace has a name every event has a name and a transition start of trace (i.e. process instance) classifier = name + transition name of trace resource timestamp name of event transition (activity name) PAGE 14
  16. end of trace (i.e. process instance) start of trace name of trace resource timestamp name of event (activity name) data associated to event PAGE 15
  17. Challenges when extracting event logs • Correlation: Events in an event log are grouped per case. This simple requirement can be quite challenging as it requires event correlation, i.e., events need to be related to each other. • Timestamps: Events need to be ordered per case. Typical problems: only dates, different clocks, delayed logging. • Snapshots. Cases may have a lifetime extending beyond the recorded period, e.g., a case was started before the beginning of the event log. • Scoping. How to decide which tables to incorporate? • Granularity: the events in the event log are at a different level of granularity than the activities relevant for end users. PAGE 16
  18. Flattening reality into event logs Orderline Order OrderLineID : OrderLineID 1 1..* OrderID : OrderID OrderID : OrderID Customer : CustID Product : ProdID Amount : Euro NofItems : PosInt Created : DateTime TotalWeight : Weight Paid : DateTime Entered : DateTime Completed : DateTime BackOrdered : DateTime Secured : DateTime DelID : DelID 1..* 0..1 Attempt Delivery 0..* 1 DelID : DelID DelID : DelID When : DateTime DelAddress : Address Successful : Bool Contact : PhoneNo PAGE 17
  19. Tables Orderline Order OrderLineID : OrderLineID 1 1..* OrderID : OrderID OrderID : OrderID Customer : CustID Product : ProdID Amount : Euro NofItems : PosInt Created : DateTime TotalWeight : Weight Paid : DateTime Entered : DateTime Completed : DateTime BackOrdered : DateTime Secured : DateTime DelID : DelID 1..* 0..1 Attempt Delivery 0..* 1 DelID : DelID DelID : DelID When : DateTime DelAddress : Address Successful : Bool Contact : PhoneNo PAGE 18
  20. Order:91245 Order instance Case id: 91245 Case id: 91245 Case id: 91245 Activity: create order Activity: pay order Activity: complete order Timestamp: 28-11-2011:08.12 Timestamp: 02-12-2011:13.45 Timestamp: 05-12-2011:11.33 Customer: John Customer: John Customer: John Amount: 100 Amount: 100 Amount: 100 OrderLine:112345 Case id: 91245 Case id: 91245 Activity: enter order line Activity: secure order line Timestamp: 28-11-2011:08.13 Timestamp: 28-11-2011:08.55 Orderline OrderLineID: 112345 OrderLineID: 112345 Product: iPhone 4G Product: iPhone 4G Order OrderLineID : OrderLineID NofItems: 1 TotalWeight: 0.250 NofItems: 1 TotalWeight: 0.250 1 1..* DellID: 882345 DellID: 882345 OrderID : OrderID OrderID : OrderID Customer : CustID Product : ProdID OrderLine:112346 Amount : Euro NofItems : PosInt Case id: 91245 Case id: 91245 Case id: 91245 Activity: enter order line Activity: create backorder Activity: secure order line Created : DateTime TotalWeight : Weight Timestamp: 28-11-2011:08.14 Timestamp: 28-11-2011:08.55 Timestamp: 30-11-2011:09.06 OrderLineID: 112346 OrderLineID: 112346 OrderLineID: 112346 Product: iPod nano Product: iPod nano Product: iPod nano Paid : DateTime Entered : DateTime NofItems: 2 NofItems: 2 NofItems: 2 TotalWeight: 0.300 TotalWeight: 0.300 TotalWeight: 0.300 DellID: 882346 DellID: 882346 DellID: 882346 Completed : DateTime BackOrdered : DateTime Secured : DateTime OrderLine:112347 DelID : DelID Case id: 91245 Case id: 91245 1..* Activity: enter order line Timestamp: 28-11-2011:08.15 Activity: secure order line Timestamp: 29-11-2011:10.06 OrderLineID: 112347 OrderLineID: 112347 Product: iPod classic Product: iPod classic NofItems: 1 NofItems: 1 0..1 TotalWeight: 0.200 DellID: 882345 TotalWeight: 0.200 DellID: 882345 Attempt Delivery Delivery:882345 0..* 1 DelID : DelID DelID : DelID When : DateTime DelAddress : Address Successful : Bool Contact : PhoneNo Attempt:882345-1 Attempt:882345-2 Attempt:882345-3 Case id: 91245 Case id: 91245 Case id: 91245 Activity: delivery attempt Activity: delivery attempt Activity: delivery attempt Timestamp: 05-12-2011:08.55 Timestamp: 06-12-2011:09.12 Timestamp: 07-12-2011:08.56 DellID: 882345 DellID: 882345 DellID: 882345 Successful: false Successful: false Successful: true DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a Contact: 0497-2553660 Contact: 0497-2553660 Contact: 0497-2553660 Delivery:882346 Attempt:882346-1 Case id: 91245 Activity: delivery attempt Timestamp: 05-12-2011:08.43 DellID: 882346 Successful: true DelAddress: 5513XG-45 PAGE 19 Contact: 040-2298761
  21. Order:91245 Case id: 91245 Case id: 91245 Case id: 91245 Activity: create order Activity: pay order Activity: complete order Timestamp: 28-11-2011:08.12 Timestamp: 02-12-2011:13.45 Timestamp: 05-12-2011:11.33 Customer: John Customer: John Customer: John Amount: 100 Amount: 100 Amount: 100 OrderLine:112345 Case id: 91245 Case id: 91245 Activity: enter order line Activity: secure order line Timestamp: 28-11-2011:08.13 Timestamp: 28-11-2011:08.55 OrderLineID: 112345 OrderLineID: 112345 Product: iPhone 4G Product: iPhone 4G NofItems: 1 NofItems: 1 TotalWeight: 0.250 TotalWeight: 0.250 DellID: 882345 DellID: 882345 OrderLine:112346 Case id: 91245 Case id: 91245 Case id: 91245 Activity: enter order line Activity: create backorder Activity: secure order line Timestamp: 28-11-2011:08.14 Timestamp: 28-11-2011:08.55 Timestamp: 30-11-2011:09.06 OrderLineID: 112346 OrderLineID: 112346 OrderLineID: 112346 Product: iPod nano Product: iPod nano Product: iPod nano NofItems: 2 NofItems: 2 NofItems: 2 TotalWeight: 0.300 TotalWeight: 0.300 TotalWeight: 0.300 DellID: 882346 DellID: 882346 DellID: 882346 OrderLine:112347 Case id: 91245 Case id: 91245 Activity: enter order line Activity: secure order line Timestamp: 28-11-2011:08.15 Timestamp: 29-11-2011:10.06 OrderLineID: 112347 OrderLineID: 112347 Product: iPod classic Product: iPod classic NofItems: 1 NofItems: 1 TotalWeight: 0.200 TotalWeight: 0.200 DellID: 882345 DellID: 882345 Delivery:882345 Attempt:882345-1 Attempt:882345-2 Attempt:882345-3 Case id: 91245 Case id: 91245 Case id: 91245 Activity: delivery attempt Activity: delivery attempt Activity: delivery attempt Timestamp: 05-12-2011:08.55 Timestamp: 06-12-2011:09.12 Timestamp: 07-12-2011:08.56 DellID: 882345 DellID: 882345 DellID: 882345 Successful: false Successful: false Successful: true DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a Contact: 0497-2553660 Contact: 0497-2553660 Contact: 0497-2553660 Delivery:882346 Attempt:882346-1 Case id: 91245 Activity: delivery attempt Timestamp: 05-12-2011:08.43 DellID: 882346 Successful: true DelAddress: 5513XG-45 Contact: 040-2298761 PAGE 20
  22. Orderline Order OrderLineID : OrderLineID 1 1..* OrderID : OrderID OrderID : OrderID Orderline instance Customer : CustID Amount : Euro Product : ProdID NofItems : PosInt Created : DateTime TotalWeight : Weight Paid : DateTime Entered : DateTime Completed : DateTime BackOrdered : DateTime Secured : DateTime DelID : DelID 1..* 0..1 OrderLine:112345 Attempt Delivery 0..* 1 Case id: 112345 Case id: 112345 DelID : DelID DelID : DelID Activity: enter order line Activity: secure order line Timestamp: 28-11-2011:08.13 Timestamp: 28-11-2011:08.55 When : DateTime DelAddress : Address OrderLineID: 112345 OrderLineID: 112345 Product: iPhone 4G Product: iPhone 4G Successful : Bool Contact : PhoneNo NofItems: 1 NofItems: 1 TotalWeight: 0.250 TotalWeight: 0.250 DellID: 882345 DellID: 882345 Order:91245 Case id: 112345 Case id: 112345 Case id: 112345 Activity: create order Activity: pay order Activity: complete order Timestamp: 28-11-2011:08.12 Timestamp: 02-12-2011:13.45 Timestamp: 05-12-2011:11.33 Customer: John Customer: John Customer: John Amount: 100 Amount: 100 Amount: 100 Delivery:882345 Attempt:882345-1 Attempt:882345-2 Attempt:882345-3 Case id: 112345 Case id: 112345 Case id: 112345 Activity: delivery attempt Activity: delivery attempt Activity: delivery attempt Timestamp: 05-12-2011:08.55 Timestamp: 06-12-2011:09.12 Timestamp: 07-12-2011:08.56 DellID: 882345 DellID: 882345 DellID: 882345 Successful: false Successful: false Successful: true DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a PAGE 21 Contact: 0497-2553660 Contact: 0497-2553660 Contact: 0497-2553660
  23. Other examples • The life cycles of reviewers, authors, papers, reviews, PC chairs, etc. • The life cycles of job applications and vacancies. • X-ray machine logs: machine, machine day, patient, treatment, routine, etc.? • Therefore, the selection and scoping of instances is needed. • Like making deciding on the elements to be put on map; there may be many maps covering partially overlapping areas. PAGE 22
  24. Extracting event logs • Not just a syntactical issue. • Different views are possible. • Important: − Selecting the right instance notion. − Ordering of events. − Selection of events. • Proclets: the true fabric of real-life processes. PAGE 23
Advertisement