Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Process Mining - Chapter 4 - Getting the Data

1,406 views

Published on

Slides supporting the book "Process Mining: Discovery, Conformance, and Enhancement of Business Processes" by Wil van der Aalst. See also http://springer.com/978-3-642-19344-6 (ISBN 978-3-642-19344-6) and the website http://www.processmining.org/book/start providing sample logs.

Published in: Business, Technology
  • Be the first to comment

  • Be the first to like this

Process Mining - Chapter 4 - Getting the Data

  1. 1. Chapter 4Getting the Dataprof.dr.ir. Wil van der Aalstwww.processmining.org
  2. 2. OverviewChapter 1IntroductionPart I: PreliminariesChapter 2 Chapter 3Process Modeling and Data MiningAnalysisPart II: From Event Logs to Process ModelsChapter 4 Chapter 5 Chapter 6Getting the Data Process Discovery: An Advanced Process Introduction Discovery TechniquesPart III: Beyond Process DiscoveryChapter 7 Chapter 8 Chapter 9Conformance Mining Additional Operational SupportChecking PerspectivesPart IV: Putting Process Mining to WorkChapter 10 Chapter 11 Chapter 12Tool Support Analyzing “Lasagna Analyzing “Spaghetti Processes” Processes”Part V: ReflectionChapter 13 Chapter 14Cartography and EpilogueNavigation PAGE 1
  3. 3. Goal of process mining• What really happened in the past?• Why did it happen?• What is likely to happen in the future?• When and why do organizations and people deviate?• How to control a process better?• How to redesign a process to improve its performance? PAGE 2
  4. 4. Getting the data supports/ “world” business controls processes software people machines system components organizations records events, e.g., messages, specifies transactions, models configures etc. analyzes implements analyzes discovery (process) event conformance model logs enhancement PAGE 3
  5. 5. From heterogeneous data sources toprocess mining results Extract, Transform, and Load (ELT) optional data source ELT data ELT warehouse data source ELT data coarse-grained source scoping data source extract XES, MXML, or data similar source unfiltered event logs process mining discovery conformance enhancement filter filtered event logs (process) models answers fine-grained scoping PAGE 4
  6. 6. Example log• A process consists ofcases.• A case consists ofevents such that eachevent relates to preciselyone case.• Events within a case areordered.• Events can haveattributes.• Examples of typicalattribute names areactivity, time, costs, andresource. PAGE 5
  7. 7. process cases events attributes activity= register requestAnother view 1 35654423 time = 30-12-2010:11.02 resource = Pete costs = 50 35654424 ... ... activity= reject request time = 07-01-2011:14.24 35654427 resource = Pete costs = 200 activity= register request time = 30-12-2010:11.32 35654483 resource = Mike 2 costs = 50 35654485 ... ... activity= pay compensation time = 08-01-2011:12.05 35654489 resource = Ellen costs = 200 activity= register request time = 30-12-2010:11.32 35654521 resource = Pete 3 costs = 50 35654522 ... ... activity= pay compensation time = 15-01-2011:10.45 35654533 resource = Ellen costs = 200 ... ... ...PAGE 6
  8. 8. Standard transactional life-cycle model suspend schedule assign reassign start resume manualskip abort_activity complete autoskip withdraw abort_case successful unsuccessful termination termination PAGE 7
  9. 9. Five activity instances schedule assign start complete start completea: d: schedule assign reassign start suspend resume completeb: complete start suspend resume suspend abort_activityc: e: suspend schedule assign reassign start resume manualskip abort_activity complete autoskip withdraw abort_case successful unsuccessful PAGE 8 termination termination
  10. 10. Overlapping activity instances start start complete complete start start complete completea: a: 6 hours 2 hours 5 hours 9 hours suspend schedule assign reassign start resume manualskip abort_activity complete autoskip withdraw abort_case successful unsuccessful PAGE 9 termination termination
  11. 11. Using attributes social network showing how work flows from one person to another Pete Sara Sue Mikeperformance indicators per activity Ellen Sean Activity b Frequency: 456 role Activity g Waiting time: 15.6 +/- 2.5 hours Frequency: 311 Service time: 1.2 +/- 0.5 hours E Waiting time: 12.4 +/- 2.1 hours Costs: 412 +/- 55 euros Service time: 0.5 +/- 0.2 hours b Costs: 198 +/- 35 euros A examine thoroughly A g A M c1 c3 pay c compensation a examine e A start register casually A decide c5 end request h c2 d c4 M reject Activity h check ticket request Frequency: 407 f reinitiate Waiting time: 7.4 +/- 1.8 hours request Service time: 1.1 +/- 0.3 hours control flow Costs: 209 +/- 38 euros PAGE 10
  12. 12. XES (eXtensible Event Stream)• See www.xes-standard.org.• Adopted by the IEEE Task Force on Process Mining.• Predecessor: MXML and SA-MXML.• The format is supported by tools such as ProM (as of version 6), Nitro, XESame, and OpenXES.• ProMimport supports MXML. PAGE 11
  13. 13. PAGE 12
  14. 14. XES (compatible with MXML)Event log consists of: • traces (process instances) − events• Standard extensions: • concept (for naming) • lifecycle (for transactional properties) • org (for the organizational perspective) • time (for timestamps) • semantic (for ontology references) PAGE 13
  15. 15. extensions loaded every trace has a name every event has a name and a transitionstart of trace (i.e.process instance) classifier = name + transition name of trace resource timestamp name of eventtransition (activity name) PAGE 14
  16. 16. end of trace (i.e.process instance)start of trace name of trace resource timestamp name of event (activity name) data associated to event PAGE 15
  17. 17. Challenges when extracting event logs• Correlation: Events in an event log are grouped per case. This simple requirement can be quite challenging as it requires event correlation, i.e., events need to be related to each other.• Timestamps: Events need to be ordered per case. Typical problems: only dates, different clocks, delayed logging.• Snapshots. Cases may have a lifetime extending beyond the recorded period, e.g., a case was started before the beginning of the event log.• Scoping. How to decide which tables to incorporate?• Granularity: the events in the event log are at a different level of granularity than the activities relevant for end users. PAGE 16
  18. 18. Flattening reality into event logs Orderline Order OrderLineID : OrderLineID 1 1..* OrderID : OrderID OrderID : OrderID Customer : CustID Product : ProdID Amount : Euro NofItems : PosInt Created : DateTime TotalWeight : Weight Paid : DateTime Entered : DateTime Completed : DateTime BackOrdered : DateTime Secured : DateTime DelID : DelID 1..* 0..1 Attempt Delivery 0..* 1 DelID : DelID DelID : DelID When : DateTime DelAddress : Address Successful : Bool Contact : PhoneNo PAGE 17
  19. 19. Tables Orderline Order OrderLineID : OrderLineID 1 1..* OrderID : OrderID OrderID : OrderID Customer : CustID Product : ProdID Amount : Euro NofItems : PosInt Created : DateTime TotalWeight : Weight Paid : DateTime Entered : DateTimeCompleted : DateTime BackOrdered : DateTime Secured : DateTime DelID : DelID 1..* 0..1 Attempt Delivery 0..* 1 DelID : DelID DelID : DelID When : DateTime DelAddress : Address Successful : Bool Contact : PhoneNo PAGE 18
  20. 20. Order:91245 Order instance Case id: 91245 Case id: 91245 Case id: 91245 Activity: create order Activity: pay order Activity: complete order Timestamp: 28-11-2011:08.12 Timestamp: 02-12-2011:13.45 Timestamp: 05-12-2011:11.33 Customer: John Customer: John Customer: John Amount: 100 Amount: 100 Amount: 100 OrderLine:112345 Case id: 91245 Case id: 91245 Activity: enter order line Activity: secure order line Timestamp: 28-11-2011:08.13 Timestamp: 28-11-2011:08.55 Orderline OrderLineID: 112345 OrderLineID: 112345 Product: iPhone 4G Product: iPhone 4G Order OrderLineID : OrderLineID NofItems: 1 TotalWeight: 0.250 NofItems: 1 TotalWeight: 0.250 1 1..* DellID: 882345 DellID: 882345 OrderID : OrderID OrderID : OrderID Customer : CustID Product : ProdID OrderLine:112346 Amount : Euro NofItems : PosInt Case id: 91245 Case id: 91245 Case id: 91245 Activity: enter order line Activity: create backorder Activity: secure order line Created : DateTime TotalWeight : Weight Timestamp: 28-11-2011:08.14 Timestamp: 28-11-2011:08.55 Timestamp: 30-11-2011:09.06 OrderLineID: 112346 OrderLineID: 112346 OrderLineID: 112346 Product: iPod nano Product: iPod nano Product: iPod nano Paid : DateTime Entered : DateTime NofItems: 2 NofItems: 2 NofItems: 2 TotalWeight: 0.300 TotalWeight: 0.300 TotalWeight: 0.300 DellID: 882346 DellID: 882346 DellID: 882346Completed : DateTime BackOrdered : DateTime Secured : DateTime OrderLine:112347 DelID : DelID Case id: 91245 Case id: 91245 1..* Activity: enter order line Timestamp: 28-11-2011:08.15 Activity: secure order line Timestamp: 29-11-2011:10.06 OrderLineID: 112347 OrderLineID: 112347 Product: iPod classic Product: iPod classic NofItems: 1 NofItems: 1 0..1 TotalWeight: 0.200 DellID: 882345 TotalWeight: 0.200 DellID: 882345 Attempt Delivery Delivery:882345 0..* 1 DelID : DelID DelID : DelID When : DateTime DelAddress : Address Successful : Bool Contact : PhoneNo Attempt:882345-1 Attempt:882345-2 Attempt:882345-3 Case id: 91245 Case id: 91245 Case id: 91245 Activity: delivery attempt Activity: delivery attempt Activity: delivery attempt Timestamp: 05-12-2011:08.55 Timestamp: 06-12-2011:09.12 Timestamp: 07-12-2011:08.56 DellID: 882345 DellID: 882345 DellID: 882345 Successful: false Successful: false Successful: true DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a Contact: 0497-2553660 Contact: 0497-2553660 Contact: 0497-2553660 Delivery:882346 Attempt:882346-1 Case id: 91245 Activity: delivery attempt Timestamp: 05-12-2011:08.43 DellID: 882346 Successful: true DelAddress: 5513XG-45 PAGE 19 Contact: 040-2298761
  21. 21. Order:91245 Case id: 91245 Case id: 91245 Case id: 91245 Activity: create order Activity: pay order Activity: complete order Timestamp: 28-11-2011:08.12 Timestamp: 02-12-2011:13.45 Timestamp: 05-12-2011:11.33 Customer: John Customer: John Customer: John Amount: 100 Amount: 100 Amount: 100OrderLine:112345 Case id: 91245 Case id: 91245 Activity: enter order line Activity: secure order line Timestamp: 28-11-2011:08.13 Timestamp: 28-11-2011:08.55 OrderLineID: 112345 OrderLineID: 112345 Product: iPhone 4G Product: iPhone 4G NofItems: 1 NofItems: 1 TotalWeight: 0.250 TotalWeight: 0.250 DellID: 882345 DellID: 882345OrderLine:112346 Case id: 91245 Case id: 91245 Case id: 91245 Activity: enter order line Activity: create backorder Activity: secure order line Timestamp: 28-11-2011:08.14 Timestamp: 28-11-2011:08.55 Timestamp: 30-11-2011:09.06 OrderLineID: 112346 OrderLineID: 112346 OrderLineID: 112346 Product: iPod nano Product: iPod nano Product: iPod nano NofItems: 2 NofItems: 2 NofItems: 2 TotalWeight: 0.300 TotalWeight: 0.300 TotalWeight: 0.300 DellID: 882346 DellID: 882346 DellID: 882346OrderLine:112347 Case id: 91245 Case id: 91245 Activity: enter order line Activity: secure order line Timestamp: 28-11-2011:08.15 Timestamp: 29-11-2011:10.06 OrderLineID: 112347 OrderLineID: 112347 Product: iPod classic Product: iPod classic NofItems: 1 NofItems: 1 TotalWeight: 0.200 TotalWeight: 0.200 DellID: 882345 DellID: 882345Delivery:882345Attempt:882345-1 Attempt:882345-2 Attempt:882345-3 Case id: 91245 Case id: 91245 Case id: 91245 Activity: delivery attempt Activity: delivery attempt Activity: delivery attempt Timestamp: 05-12-2011:08.55 Timestamp: 06-12-2011:09.12 Timestamp: 07-12-2011:08.56 DellID: 882345 DellID: 882345 DellID: 882345 Successful: false Successful: false Successful: true DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a Contact: 0497-2553660 Contact: 0497-2553660 Contact: 0497-2553660Delivery:882346Attempt:882346-1 Case id: 91245 Activity: delivery attempt Timestamp: 05-12-2011:08.43 DellID: 882346 Successful: true DelAddress: 5513XG-45 Contact: 040-2298761 PAGE 20
  22. 22. Orderline Order OrderLineID : OrderLineID 1 1..* OrderID : OrderID OrderID : OrderIDOrderline instance Customer : CustID Amount : Euro Product : ProdID NofItems : PosInt Created : DateTime TotalWeight : Weight Paid : DateTime Entered : DateTime Completed : DateTime BackOrdered : DateTime Secured : DateTime DelID : DelID 1..* 0..1OrderLine:112345 Attempt Delivery 0..* 1 Case id: 112345 Case id: 112345 DelID : DelID DelID : DelID Activity: enter order line Activity: secure order line Timestamp: 28-11-2011:08.13 Timestamp: 28-11-2011:08.55 When : DateTime DelAddress : Address OrderLineID: 112345 OrderLineID: 112345 Product: iPhone 4G Product: iPhone 4G Successful : Bool Contact : PhoneNo NofItems: 1 NofItems: 1 TotalWeight: 0.250 TotalWeight: 0.250 DellID: 882345 DellID: 882345 Order:91245 Case id: 112345 Case id: 112345 Case id: 112345 Activity: create order Activity: pay order Activity: complete order Timestamp: 28-11-2011:08.12 Timestamp: 02-12-2011:13.45 Timestamp: 05-12-2011:11.33 Customer: John Customer: John Customer: John Amount: 100 Amount: 100 Amount: 100Delivery:882345Attempt:882345-1 Attempt:882345-2 Attempt:882345-3 Case id: 112345 Case id: 112345 Case id: 112345 Activity: delivery attempt Activity: delivery attempt Activity: delivery attempt Timestamp: 05-12-2011:08.55 Timestamp: 06-12-2011:09.12 Timestamp: 07-12-2011:08.56 DellID: 882345 DellID: 882345 DellID: 882345 Successful: false Successful: false Successful: true DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a DelAddress: 5513VJ-22a PAGE 21 Contact: 0497-2553660 Contact: 0497-2553660 Contact: 0497-2553660
  23. 23. Other examples• The life cycles of reviewers, authors, papers, reviews, PC chairs, etc.• The life cycles of job applications and vacancies.• X-ray machine logs: machine, machine day, patient, treatment, routine, etc.?• Therefore, the selection and scoping of instances is needed.• Like making deciding on the elements to be put on map; there may be many maps covering partially overlapping areas. PAGE 22
  24. 24. Extracting event logs• Not just a syntactical issue.• Different views are possible.• Important: − Selecting the right instance notion. − Ordering of events. − Selection of events.• Proclets: the true fabric of real-life processes. PAGE 23

×