OBDA for Log Extraction in Declarative Process Mining

PhD Course on "Ontology-Based Data Access (OBDA) for Log Extraction in Declarative Process Mining" at the 13th Reasoning Web Summer School (RW 2017).

  1. 1. OBDA for Log Extraction in Process Mining. Diego Calvanese, Marco Montali, Tahir Emre Kalayci, Ario Santoso. KRDB Research Centre for Knowledge and Data, Faculty of Computer Science, Free University of Bozen-Bolzano. montali@inf.unibz.it
  2. 2. Managing Organisations… [Image: mobile by Calder]
  3. 3. Managing Organisations… models (managers/analysts) [Image: mobile by Calder]
  4. 4. Managing Organisations… data ((knowledge) workers), models (managers/analysts) [Image: mobile by Calder]
  5. 5. Managing Organisations… models, data, ? [Image: mobile by Calder]
  6. 6. Marrying processes and data is extremely difficult… but is a must if we want to really understand how complex dynamic systems operate.
  7. 7. Our Approach: Business Process Management, Data Management, Conceptual Modeling, Formal Methods, Artificial Intelligence
  8. 8. Our Research: Theory, Practice
  9. 9. Our Research: Theory, Practice
  10. 10. Agenda
 1. Intro to process mining
 2. The problem of data preparation in process mining
 3. The onprom framework: OBDA for data preparation in process mining
 4. Process mining demo
  11. 11. Process Mining Process Management Based on Facts Extensive credits: Wil van der Aalst (TU/e), Chiara Ghidini (FBK)
  12. 12. Disclaimer • We will simplify to make the issues more apparent • Criticism has to be seen as a positive force towards improvement
  13. 13. The two realities Reality 1: Managers and Analysts Reality studied, analyzed, planned through using different types of models. Decision making to improve the overall organization.
  14. 14. The two realities Reality 2: Daily workers Reality experienced directly. Decision making to determine how to best handle the current situation.
  15. 15. Critical Dichotomy: Management of the organisation vs. Daily work within the organisation
  16. 16. Our Goal: Management of the organisation, IT, Daily work within the organisation
  17. 17. The Traditional Model-Driven Approach
  18. 18. Model (Def.) A simplifying mapping of reality to serve a specific purpose (Stachowiak: Allgemeine Modelltheorie, 1973) • The model corresponds to the modelled object in the sense that it faithfully reproduces some fundamental aspects of such an object
  19. 19. Conceptual Modeling The activity of formally describing some aspects of the physical and social world around us for the purposes of understanding and communication. (John Mylopoulos, 1992)
  20. 20. Conceptual Models in Organisations A model is an abstraction of reality according to a certain conceptualization. Once represented as a concrete artifact, a model can support communication, learning and analysis about relevant aspects of the underlying domain. [. . . ] a represented model (a dusty diagram) created by an unknown predecessor is a medium to preserve and communicate a certain view of the world, and can serve as a vehicle for reasoning and problem solving, and for acquiring new knowledge (maybe having striking new ideas!) about this view of the world. (Guizzardi, 2005)
  21. 21. Models as Human Mediators
  22. 22. Models as IT Mediators Operational process Information System Model
  23. 23. Easy…right?
  24. 24. …right?Conceptual Modeling Languag Clarity: how easy the language can be stakeholders). • Graphical • The langu foundatio • The more di cult is • Less expr combinat • Abstraction: remove unnecessary
  25. 25. Business Process
 • A set of logically related tasks performed to achieve a defined business outcome for a particular customer or market. (Davenport, 1992)
 • A collection of activities that take one or more kinds of input and create an output that is of value to the customer. (Hammer & Champy, 1993)
 • A set of activities performed in coordination in an organizational and technical environment. These activities jointly realize a business goal. (Weske, 2011)
  26. 26. Business Process Management: A collection of concepts, methods, and techniques to support humans in modeling, administration, configuration, execution, analysis, and continuous improvement of business processes
  27. 27. Short History • Smith (~1750): division of labour • Taylor (~1911): scientific method applied to organisations • Hammer and Champy (~1990): processes as the basis for reengineering • 2000s: business process lifecycle, process-orientation
  28. 28. Value Chains, Business Functions, Tasks ss Functions and Refinement into Activities y of business s follows the ion abstraction. iness functions activities.
  29. 29. From tasks… AnalyseOrder SimpleCheck AdvancedCheck … to their coordination OrderManagement GetOrder CheckOrder AnalyseOrder SimpleCheck AdvancedCheck
  30. 30. End-To-End, Reactive Behaviour: Order-to-cash, procure-to-pay, issue-to-resolution, … [Figure: order process from Input to Output. Labels: Receive order, Check availability, Article available? (yes/no), Ship article, Financial settlement, Procurement, Payment received, Inform customer, Late delivery, Undeliverable, Customer informed, Inform customer, Article removed, Remove article from catalogue]
  31. 31. Process Modelling Languages: BPMN [Figure: BPMN diagram with pools Customer, Travel Agency, Airline. Labels: Flight needed, Check travel agency web site, Check flight offer, Reject offer, Book and pay flight, Make flight offer, Prepare ticket, offer received, request received, Ticket received, Flight paid, Offer rejected, Booking and payment received, Offer rejection received, Flight organised, Offer cancelled, Flight offer, Flight offer [rejected], Booking and payment, Ticket. Legend: Pool, Start event, Exclusive gateway, Message event, End event, Task, Event-based gateway, Data object]
  32. 32. Process Modelling Languages: EPC/ARIS [Figure: EPC diagram. Labels: Flight ticket needed, Check travel agency website (Customer), Flight offer requested, Make flight offer (Travel Agency), Flight offer sent to client, Check flight offer (Customer), XOR, Reject Offer (Customer), Book and pay flight (Customer), Offer rejected, Cancel Offer (Travel Agency), Offer canceled, Offer accepted, Website, Flight offer, Flight offer [rejected], Flight offer [paid], Flight offer [cancelled], Prepare ticket (Travel Agency), Ticket, Airline issues ticket, Ticket prepared, Send ticket to customer (Travel Agency), Flight organised. Goal: Ensure comfortable flight. Legend: Event, Function, Organization unit, Owner, Supporting system, Process path, Logical operation, Data, Goal]
  33. 33. Process Modelling Languages: UML Activity Diagrams [Figure: activity diagram with partitions Customer and Travel Agency. Labels: Check travel agency website, Make flight offer, Flight Offer, Check flight offer, Reject Offer, Book and pay flight, Flight Offer [paid], Prepare ticket, Ticket, Flight Offer [rejected], Cancel Offer, Unsatisfied, Satisfied. Legend: Activity partition, Action node, Initial node, Activity final node, Decision node, Merge node, Object node, Guard condition]
  34. 34. A Process Example create paper author submit paper author assign reviewer chair review paper reviewer submit review reviewer take decision chair accept? accept paper chair reject paper chair upload camera ready author Y N Fig. 2: The process for managing papers in a simplified conference submission system; gray tasks are external to the conference information system and cannot be logged.
  35. 35. But There is More! System: Processes, Data, Resources
  36. 36. But There is More! create paper author submit paper author assign reviewer chair review paper reviewer submit review reviewer take decision chair accept? accept paper chair reject paper chair upload camera ready author Y N Fig. 2: The process for managing papers in a simplified conference submission system; gray tasks are external to the conference information system and cannot be logged. data logic transactional lifecycle resources decision logic case notion
  37. 37. Case Object The main subject of the process • May be a concrete or abstract object • An order, a claim, a paper, a request, … • Contemporary process notations: capture well only processes with a single notion of case • The case object is 1-1 with the start event 
 (paper submission -> paper, order request -> order) • But in reality, multiple case objects typically coexist! • Flow of papers vs flow of reviews, flow of customer orders vs flow of packages containing order parts, …
  38. 38. Task Instances • A process model represents abstract tasks • The concrete execution of a task on a case object results in a task instance • The evolution of a task instance goes through multiple events and transitions (durative tasks) • This is regulated by a task transactional lifecycle
  39. 39. Resources: Humans/devices responsible for the execution of task instances • Usually structured in an organisational model defining roles, duties, capabilities, security levels, … [Figure: ARIS organisational structure]
  40. 40. Data Logic Management of the master data of the company, including case data and data produced/consumed by processes • Master data are persisted inside information systems • Processes perform CRUD operations over such data • Processes acquire data from the external environment
  41. 41. Structural Models • Represent the structure of the domain of interest • Capture the relevant concepts, attributes, and relationships • Lead to the logical schema of information systems. Examples: Conceptual Data Models, UML Class Diagram, ORM Schema
  42. 42. Decision Models: Encapsulate the decision logic that leads to infer certain conclusions given input data • This in turn determines how to route a case object in the process • May be implicitly embedded in the process, or represented explicitly. Example: DMN Decision table
  43. 43. What are Models Used For? • Understanding and communication • Documentation and audits • Verification and simulation • Basis for unambiguous contracts between a company and its customers • Basis of IT systems supporting the daily work within the organisation How to best combine models and support all these tasks is a very active area of research!
  44. 44. Traditional Process Enactment: From Handmade Models to Execution [Figure: lifecycle with steps (re)design, configure/deploy, enact/monitor. Labels: models, data, IT support, reality, (knowledge) workers, managers/analysts, 50% / 50%]
  45. 45. Limits of the traditional approach
  46. 46. Problem #1: Lack of Interaction. How to involve all actors in the creation of shared models? How to share strategic goals? [Figure: lifecycle with steps (re)design, configure/deploy, enact/monitor. Labels: models, data, IT support, reality, (knowledge) workers, managers/analysts, 50% / 50%, ?]
  47. 47. Problem #1: Lack of Interaction. How to improve such models using data? [Figure: lifecycle with steps (re)design, configure/deploy, enact/monitor. Labels: models, data, IT support, reality, (knowledge) workers, managers/analysts, 50% / 50%, ?]
  48. 48. Impasse! • (Knowledge) workers: experience the real organisation, but locally and subjectively • Management: have a global view of the expected organisation, not aligned with reality • Key, open questions: • How to reconcile these two worlds? • How to connect models with reality? How to take strategic decisions based on such connection? • How to ensure that the organisation as a whole is going in the right direction?
  49. 49. Problem #2: Flexibility BPM!
  50. 50. Problem #2: Flexibility BPM?
  51. 51. The Issue of Flexibility
  52. 52. A Clinical Guideline
  53. 53. A Real Clinical Process [Figure 4: spaghetti model obtained from the application server logs using the heuristics miner [4]. Nodes are exception types with occurrence counts, for example Exception (187), EstabelecimentoNotFoundException (187), GREJBPersistencyException (179), NullPointerException (104), SQLException (9); arcs are labelled with pairs of numbers such as 0,991 152.]
  54. 54. A Real Clinical Process [Figure 4, repeated: spaghetti model obtained from the application server logs using the heuristics miner [4].]
  55. 55. The Effect • Processes are only partially encoded into IT systems • IT systems need to support a backdoor to circumvent the encoded processes • Otherwise, people will act “outside”, using the system just to “record” • No hope to improve: knowledge-intensive processes cannot be automated in the classical sense!
  56. 56. A Hazardous Attempt: Go Without Models
  57. 57. Process Model vs Instance
 Business Process: • Tasks • Data schema. Example: • Car assembly process • Task: mount doors • Frame#, buyer ID, car color
 Process Instance: • Task instances • Data values. Example: • Assembly of car 123 • Task instance #54: mount doors on car 123 • Frame#: 123, buyer: Diego, car color: white
  58. 58. Processes Leave Digital Footprints Within organisations: event data related to process executions are continuously stored for • Internal management • Calculation of process metrics/KPIs • Legal reasons (compliance, external audits) In addition: internally and externally, more data are stored somewhere • We live in a digital society! • Social networks, sensors, cyberphysical systems, mobile devices are data loggers
  59. 59. Situation 1: Explicit Event Logs Organisation equipped with process-aware information systems • Supporting humans in the execution of processes (task assignments, todo lists) • Explicitly logging events, with info about: 
 - timestamp
 - event type (start, end, reassign, …)
 - reference task
 - reference case
 - task instance id
 - responsible resource
 - additional attributes
  60. 60. Explicit Event Logs
 [Fig. 2: The process for managing papers in a simplified conference submission system; gray tasks are external to the conference information system and cannot be logged. Labels: create paper (author), submit paper (author), assign reviewer (chair), review paper (reviewer), submit review (reviewer), take decision (chair), accept? (Y/N), accept paper (chair), reject paper (chair), upload camera ready (author)]
 Example 1. As a running example, we consider a simplified conference submission system, which we call CONFSYS. The main purpose of CONFSYS is to coordinate authors, reviewers, and conference chairs in the submission of papers to conferences, the consequent review process, and the final decision about paper acceptance or rejection. Figure 2 shows the process control flow considering papers as case objects. Under this perspective, the management of a single paper evolves through the following execution steps. First, the paper is created by one of its authors, and submitted to a conference available in the system. Once the paper is submitted, the review phase for that paper starts. This phase of the process consists of a so-called multi-instance section, i.e., a section of the process where the same set of activities is instantiated multiple times on the same paper, and then executed in parallel. In the case of CONFSYS, this section is instantiated for each reviewer selected by the conference chair for the paper, and consists of the following three activities: (i) a reviewer is assigned to the paper; (ii) the reviewer produces the review; (iii) the reviewer submits the review to CONFSYS. The multi-instance section is considered completed only when all its parallel instantiations
 Event Data
 Case ID | ID | Timestamp | Activity | User | . . .
 1 | 35654423 | 30-12-2010:11.02 | create paper | Pete | . . .
 1 | 35654424 | 31-12-2010:10.06 | submit paper | Pete | . . .
 1 | 35654425 | 05-01-2011:15.12 | assign review | Mike | . . .
 1 | 35654426 | 06-01-2011:11.18 | submit review | Sara | . . .
 1 | 35654428 | 07-01-2011:14.24 | accept paper | Mike | . . .
 1 | 35654429 | 06-01-2011:11.18 | upload CR | Pete | . . .
 2 | 35654483 | 30-12-2010:11.32 | create paper | George | . . .
 2 | 35654485 | 30-12-2010:12.12 | submit paper | John | . . .
 2 | 35654487 | 30-12-2010:14.16 | assign review | Mike | . . .
 2 | 35654489 | 16-01-2011:10.30 | submit review | Ellen | . . .
 2 | 35654490 | 18-01-2011:12.05 | reject paper | Mike | . . .
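 As a minimal sketch of how such an explicit log is read "by case", the Python snippet below groups a subset of the rows from the Event Data table into traces, ordering the events of each case by timestamp. The function names `parse_timestamp` and `build_traces` are illustrative assumptions, not part of any tool mentioned in the slides, and the timestamp format is inferred from the table.

```python
from collections import defaultdict
from datetime import datetime

# A subset of the rows in the Event Data table, deliberately interleaved:
# (case id, event id, timestamp, activity, user).
ROWS = [
    (2, 35654483, "30-12-2010:11.32", "create paper", "George"),
    (1, 35654423, "30-12-2010:11.02", "create paper", "Pete"),
    (2, 35654485, "30-12-2010:12.12", "submit paper", "John"),
    (1, 35654424, "31-12-2010:10.06", "submit paper", "Pete"),
    (2, 35654487, "30-12-2010:14.16", "assign review", "Mike"),
    (1, 35654425, "05-01-2011:15.12", "assign review", "Mike"),
    (1, 35654426, "06-01-2011:11.18", "submit review", "Sara"),
    (2, 35654489, "16-01-2011:10.30", "submit review", "Ellen"),
    (1, 35654428, "07-01-2011:14.24", "accept paper", "Mike"),
    (2, 35654490, "18-01-2011:12.05", "reject paper", "Mike"),
]


def parse_timestamp(text):
    """Parse the day-month-year:hour.minute format used in the table (assumed)."""
    return datetime.strptime(text, "%d-%m-%Y:%H.%M")


def build_traces(rows):
    """Group events by case and return, per case, the activities in time order."""
    by_case = defaultdict(list)
    for case_id, _event_id, ts, activity, _user in rows:
        by_case[case_id].append((parse_timestamp(ts), activity))
    return {case: [a for _, a in sorted(events)] for case, events in by_case.items()}


traces = build_traces(ROWS)
for case, activities in sorted(traces.items()):
    print(case, activities)
```

 Each resulting list is one trace, i.e. the recorded history of one paper; this is the shape of input that the discovery and conformance techniques discussed later work on.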
  61. 61. Situation 2: Implicit Event Logs Organisation equipped with generic enterprise information systems • CRM, ERP systems to handle customers and tasks • Legacy information systems • Domain-specific systems Data are stored with different formats and according to different domain-specific schemas • No explicit events • Data scattered in several data sources
  62. 62. Implicit Event Logs
  63. 63. Implicit Event Logs
 Fig. 11: DB schema for the information system of the conference submission system. Primary keys are underlined and foreign keys are shown in italic.
 ACCEPTANCE(ID, uploadtime, user, paper)
 CONFERENCE(ID, name, organizer, time)
 DECISION(ID, decisiontime, chair, outcome)
 LOGIN(ID, user, CT)
 SUBMISSION(ID, uploadtime, user, paper)
 PAPER(ID, title, CT, user, conf, type, status)
 REVIEW(ID, RRid, submissiontime)
 REVIEWREQUEST(ID, invitationtime, reviewer, paper)
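 To illustrate what "no explicit events" means in practice, here is a hedged sketch that derives events from three of the tables in the schema (SUBMISSION, REVIEWREQUEST, REVIEW) with a hand-written SQL query over an in-memory SQLite database. The activity names, the choice of the paper as case object, the column types, and the sample rows are all assumptions made for illustration; this is not the onprom extraction mechanism, only the kind of manual query that such a framework aims to replace.

```python
import sqlite3

# Hypothetical column types and sample rows; only table and column names
# come from the schema shown on the slide.
SCHEMA_AND_DATA = """
CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, uploadtime TEXT, "user" TEXT, paper INTEGER);
CREATE TABLE REVIEWREQUEST (ID INTEGER PRIMARY KEY, invitationtime TEXT, reviewer TEXT, paper INTEGER);
CREATE TABLE REVIEW (ID INTEGER PRIMARY KEY, RRid INTEGER, submissiontime TEXT);
INSERT INTO SUBMISSION VALUES (1, '2017-01-10T09:00', 'pete', 7);
INSERT INTO REVIEWREQUEST VALUES (1, '2017-01-12T14:00', 'sara', 7);
INSERT INTO REVIEW VALUES (1, 1, '2017-01-20T18:30');
"""

# Each branch of the UNION turns rows of one table into events of one activity.
EVENT_QUERY = """
SELECT s.paper AS case_id, 'submit paper' AS activity, s.uploadtime AS ts, s."user" AS resource
FROM SUBMISSION s
UNION ALL
SELECT rr.paper, 'assign reviewer', rr.invitationtime, rr.reviewer
FROM REVIEWREQUEST rr
UNION ALL
SELECT rr.paper, 'submit review', r.submissiontime, rr.reviewer
FROM REVIEW r JOIN REVIEWREQUEST rr ON r.RRid = rr.ID
ORDER BY case_id, ts
"""


def extract_events(conn):
    """Return (case id, activity, timestamp, resource) tuples ordered by case and time."""
    return conn.execute(EVENT_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
events = extract_events(conn)
for event in events:
    print(event)
```

 Changing the case object (for example, following reviews instead of papers) would require rewriting every branch of the query, which hints at why manual extraction from implicit logs is laborious.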
  64. 64. Data never sleeps
  65. 65. The New Trend: No Models! [Figure: lifecycle with steps enact/monitor and adjust. Labels: data, IT support++, reality, (knowledge) workers, managers/analysts, 50% / 50%]
  66. 66. Some Famous Examples
  67. 67. Do we Need Models at All?
  68. 68. Calm down, and think…
  69. 69. Alcohol and Fat It’s a relief to know the truth after all those conflicting medical studies.
 The Japanese eat very little fat and suffer fewer heart attacks than the British or Americans.
 The French eat a lot of fat and also suffer fewer heart attacks than the British or Americans.
 The Japanese drink very little red wine and suffer fewer heart attacks than the British or Americans.
 The Italians drink excessive amounts of red wine and also suffer fewer heart attacks than the British or Americans.
 The Germans drink a lot of beer and eat lots of sausages and fats and suffer fewer heart attacks than the British or Americans. Conclusion: Eat and drink what you like. Speaking English is apparently what kills you.
  70. 70. Spurious Correlations
  71. 71. Spurious Correlations
  72. 72. Result: Crompton (2008): domain experts lose (too much) time finding the data they need to operate and take strategic decisions • Engineers in the oil/gas industry: 30-70% of working time spent on data crawling and data quality
  73. 73. Models Enable Decision Making Humans understand reality through models • Data alone are meaningless • Machine learning/deep learning techniques are unable to expose their models: no human in the decision making loop! • Models not only for decision making, but also to explain and guide
  74. 74. Process Management Based on Facts
  75. 75. [Figure: lifecycle with steps (re)design, configure/deploy, enact/monitor, adjust, diagnose/get reqs. Labels: data, models, IT support, reality, (knowledge) workers, managers/analysts, 50% / 50%]
  76. 76. [Figure: lifecycle with steps (re)design, configure/deploy, enact/monitor, adjust, diagnose/get reqs. Labels: data, models, IT support, reality, (knowledge) workers, managers/analysts, 50% / 50%]
  77. 77. Process Mining: Data Science in Action [See process mining manifesto] Fig. 1.4: Positioning of the three main types of process mining: discovery, conformance, and enhancement
  78. 78. The Three Pillars of Process Mining
 • Play-In: from an event log to a process model
 • Play-Out: from a process model to an event log
 • Replay: event log and process model together, yielding • extended model showing times, frequencies, etc. • diagnostics • predictions • recommendations
  79. 79. Play In [Figure: process model with nodes start, register travel request (a), get support from local manager (b), get detailed motivation letter (c), check budget by finance (d), decide (e), reinitiate request (f), accept request (g), reject request (h), end]
 Case | Activity | Timestamp | Resource
 432 | register travel request (a) | 18-3-2014:9.15 | John
 432 | get support from local manager (b) | 18-3-2014:9.25 | Mary
 432 | check budget by finance (d) | 19-3-2014:8.55 | John
 432 | decide (e) | 19-3-2014:9.36 | Sue
 432 | accept request (g) | 19-3-2014:9.48 | Mary
  80. 80. Play-in: Discovery Event logs implicitly contain the real process! Making it explicit gives: •knowledge and understanding •ground for discussion •possibility to act by: •correcting issues •compare with the designed models (“should be” vs “as is”) •evolve the models •re-engineer the organisation credits to W.M.P. van der Aalst
  81. 81. Discovery: Crash Course • The main idea: look at the data from a “process-oriented” perspective. Case ID = the process instance; Event; Start time; End time
  82. 82. From Data Mining… Credits: Anne Rozinat
  83. 83. …to Process Mining Credits: Anne Rozinat
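  84. 84. Discovery: Idea. Event log as (Case, Activity) pairs in recorded order: (1, A), (2, A), (1, B), (1, C), (3, A), (2, C), (3, B), (2, B), (1, D), (2, D), (2, E), (3, C), (3, D), (1, E), (3, D), (3, E)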
  85. 85. Discovery: Idea (same event log). Case 1: A B C D E
  86. 86. Discovery: Idea (same event log). Case 1: A B C D E; Case 2: A C B D E
  87. 87. Discovery: Idea (same event log). Case 1: A B C D E; Case 2: A C B D E; Case 3: A B C D D E
  88. 88. Discovery: Idea (same event log and traces). [Figure: process model over activities A, B, C, D, E]
  89. 89. Discovery: Idea (same event log and traces). [Figure: process model over activities A, B, C, D, E]
  90. 90. Discovery: Idea (same event log and traces). [Figure: process model over activities A, B, C, D, E]
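 The step from the interleaved log to per-case traces, and from traces to the raw material of a model, can be sketched in a few lines of Python. The snippet computes the directly-follows relation, a common first step in discovery algorithms; it is offered as an illustration under that assumption and is not claimed to be the algorithm used by any specific tool. The log is the one from the slides, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# The (case, activity) pairs from the slide, in recorded order.
LOG = [
    (1, "A"), (2, "A"), (1, "B"), (1, "C"), (3, "A"), (2, "C"), (3, "B"), (2, "B"),
    (1, "D"), (2, "D"), (2, "E"), (3, "C"), (3, "D"), (1, "E"), (3, "D"), (3, "E"),
]


def traces_of(log):
    """Split the interleaved log into one activity sequence per case."""
    traces = defaultdict(list)
    for case, activity in log:
        traces[case].append(activity)
    return dict(traces)


def directly_follows(traces):
    """Count how often activity b immediately follows activity a within a case."""
    counts = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


traces = traces_of(LOG)
dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

 In the result, B and C follow each other in both orders, and D is followed by D once; these are the observations a discovery algorithm would turn into concurrency and a loop, respectively.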
  91. 91. Discovery in a Tool: DISCO Demo Later Event Log Discovered Process
  92. 92. Play Out In a Nutshell [Figure: process model with nodes start, register travel request (a), get support from local manager (b), get detailed motivation letter (c), check budget by finance (d), decide (e), reinitiate request (f), accept request (g), reject request (h), end]
 Case | Activity | Timestamp | Resource
 432 | register travel request (a) | 18-3-2014:9.15 | John
 432 | get support from local manager (b) | 18-3-2014:9.25 | Mary
 432 | check budget by finance (d) | 19-3-2014:8.55 | John
 432 | decide (e) | 19-3-2014:9.36 | Sue
 432 | accept request (g) | 19-3-2014:9.48 | Mary
  93. 93. Play-out: Simulation
  94. 94. Replay in a Nutshell (credits to W.M.P. van der Aalst) [Figure: process model with nodes start, register travel request (a), get support from local manager (b), get detailed motivation letter (c), check budget by finance (d), decide (e), reinitiate request (f), accept request (g), reject request (h), end]
 Case | Activity | Timestamp | Resource
 432 | register travel request (a) | 18-3-2014:9.15 | John
 432 | get support from local manager (b) | 18-3-2014:9.25 | Mary
 432 | check budget by finance (d) | 19-3-2014:8.55 | John
 432 | decide (e) | 19-3-2014:9.36 | Sue
 432 | accept request (g) | 19-3-2014:9.48 | Mary
  95. 95. Replay: Animation
  96. 96. Replay: Enhancement
  97. 97. Replay: Enhancement
  98. 98. Replay: Conformance Checking
  99. 99. Replay: Conformance Checking
  100. 100. Conformance Checking Goal: understand and quantify the degree of alignment between models and reality credits to W.M.P. van der Aalst
  101. 101. Conformance Checking: Idea. Case 1: A B C D E. Model: A B C D E
  102. 102. Conformance Checking: How It Works. Case 1: A B C D E. Model: A B C D E. Replayed so far: A
  103. 103. Conformance Checking: How It Works. Case 1: A B C D E. Model: A B C D E. Replayed so far: A B
  104. 104. Conformance Checking: How It Works. Case 1: A B C D E. Model: A B C D E. Replayed so far: A B C
  105. 105. Conformance Checking: How It Works. Case 1: A B C D E. Model: A B C D E. Replayed so far: A B C D
  106. 106. Conformance Checking: How It Works. Case 1: A B C D E. Model: A B C D E. Replayed: A B C D E ✓
  107. 107. Conformance Checking: How It Works. Case 2: A D C D E. Model: A B C D E
  108. 108. Conformance Checking: How It Works. Case 2: A D C D E. Model: A B C D E. Replayed: A, then D ? ✗
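 The replay idea can be sketched as follows, under the simplifying assumption that the model is a strictly sequential A, B, C, D, E (the slides only show the model as a picture, so its exact control flow is an assumption here). The model is encoded as a map from each state to the activities allowed next; `replay` is a hypothetical helper that reports the position of the first deviation. Real conformance checkers are considerably more refined (for example, they compute alignments rather than stopping at the first mismatch).

```python
# Assumed sequential model: each key lists the activities allowed right after it.
MODEL = {
    "start": {"A"},
    "A": {"B"},
    "B": {"C"},
    "C": {"D"},
    "D": {"E"},
    "E": {"end"},
}


def replay(trace, model=MODEL):
    """Replay a trace on the model.

    Returns (True, None) if the whole trace fits and ends in a final state,
    otherwise (False, index) where index is the position of the first deviation.
    """
    state = "start"
    for index, activity in enumerate(trace):
        if activity not in model.get(state, set()):
            return False, index
        state = activity
    if "end" not in model.get(state, set()):
        return False, len(trace)  # trace stopped before the model could finish
    return True, None


print(replay(["A", "B", "C", "D", "E"]))  # Case 1
print(replay(["A", "D", "C", "D", "E"]))  # Case 2
```

 Case 1 replays completely, while Case 2 deviates at its second event (D where the assumed model expects B), matching the ✓ and ✗ outcomes on the slides.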
  109. 109. Process Repair: Beyond Conformance Checking Deviations are incorporated into the process model
  110. 110. Repair: Idea. Model: A B C D E. Case 1: A D C D E; Case 2: A D C D E; Case 3: A D C D E; Case 4: A D C D E; …; Case k: A D C D E. Replay: A, then D ? ✗ (C D E remain)
  111. 111. Repair: Idea. Model: A B C D E. Case 1: A D C D E; Case 2: A D C D E; Case 3: A D C D E; Case 4: A D C D E; …; Case k: A D C D E. Replay: A, then D ? ✗ (C D E remain). A common deviation: maybe the model is wrong/outdated!
  112. 112. Repair: Idea. Case 1: A D C D E; Case 2: A D C D E; Case 3: A D C D E; Case 4: A D C D E; …; Case k: A D C D E. [Figure: repaired model incorporating the deviation] ✓
  113. 113. Practice
  114. 114. Practice: Camunda, ERP, Signavio, Document-driven, EPCs, GSM, BPMN, CMMN, Case Management, Legacy Systems, CRM, E-R, Bizagi, Aris, UML, Artifact-Centric, SAP, Object-Centric, Proprietary Systems, Bonita
  115. 115. IEEE XES Standard [www.xes-standard.org]: IEEE Standard for the representation of event logs • Based on XML • Minimal mandatory structure: a log consists of traces, each representing the history of a case; a trace consists of a list of atomic events • Extensions to “decorate” log, trace, event with informative attributes: timestamps, task names, transactional lifecycle, resources, additional event data • Supports “meta-level” declarations useful for log processors
  116. 116. <log xes.version="1.0" xes.features="nested-attributes" openxes.version="1.0RC7">
   <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>
   <classifier name="Event Name" keys="concept:name"/>
   <string key="concept:name" value="XES Event Log"/>
   ...
   <trace>
     <string key="concept:name" value="1"/>
     <event>
       <string key="User" value="Pete"/>
       <string key="concept:name" value="create paper"/>
       <int key="Event ID" value="35654424"/>
       ...
     </event>
     <event>
       ...
       <string key="concept:name" value="submit paper"/>
       ...
     </event>
     ...
   </trace>
   <trace>
     ...
   </trace>
   …
 </log>
  117. 117. Full XES Schema [Figure: UML class diagram of the full XES schema. Classes: Log (logFeatures: String, logVersion: String), Trace, Event, Attribute (attKey: String, attType: String) with disjoint subclasses ElementaryAttribute (attValue: String) and CompositeAttribute, Extension (extName: String, extPrefix: String, extUri: String), GlobalAttribute with disjoint subclasses GlobalEventAttribute and GlobalTraceAttribute, Classifier (name: String) with disjoint subclasses EventClassifier and TraceClassifier. Associations: l-contains-t, l-contains-e, t-contains-e, l-has-a, t-has-a, e-has-a, ca-contains-a, a-contains-a, e-usedBy-a, e-usedBy-l, l-has-gea, l-has-gta, l-contains-ec, l-contains-tc, ec-definedBy-gea, tc-definedBy-gea.] Excerpt: the role names e-has-a, t-has-a, and t-contains-e are used to capture the binary relations among the concepts Trace, Event, and Attribute.
  118. 118. Core XES Schema [Fig. 13: Core event schema. UML class diagram with classes Trace, Event, and Attribute (attKey: String, attType: String, attValue: String), and associations t-contains-e, t-has-a, and e-has-a.] We show now how such a simple schema can be suitably encoded in DL-LiteA. To encode the core event schema of Figure 13, the three concept names Trace, Event, and Attribute are used.
  119. 119. Quality of Logs (Level: Characterization. Examples.)
 • ★★★★★: Highest level: the event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies. Events and their attributes point to this ontology. Examples: semantically annotated logs of BPM systems.
 • ★★★★: Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner. Examples: event logs of traditional BPM/workflow systems.
 • ★★★: Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system. Although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists and vice versa). Examples: tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.
 • ★★: Events are recorded automatically, i.e., as a by-product of some […] Examples: event logs of document and […]
  120. 120. From Event Logs to XES • Level 4-5: straightforward syntactic manipulation • Level 3: much more difficult, due to • Multiple data sources • Interpretation of data • Lack of explicit information about cases and events
  121. 121. Traditional Extraction from Legacy Data [Figure: traditional methodology. Create data model → Choose perspective → Extract relevant tables → Design views with relevant attributes → Design composite views → Design log view → Export to XES/CSV → Do process mining → Other perspective? (Y/N)]
 • Manual construction of views and ETL procedures to fetch the data
 • Done by IT experts, not by knowledge workers (domain experts)
 • Crucial issues:
   • Correctness: who knows? Process mining is dangerous if applied to wrong data
   • Maintenance, evolution, and change of perspective are hard, but process mining should be highly interactive
  122. 122. The onprom Approach [http://onprom.inf.unibz.it] [Fig. 12: The onprom methodology and its four phases. Flowchart steps: high-level IS? (Y/N); Create conceptual data schema; Create mappings; Bootstrap model + mappings; Enrich model + mappings; Choose perspective; Create event-data annotations; Get XES/CSV; Do process mining; Other perspective? (Y/N)]
 Excerpt: […] the same time generating (identity) mappings to link the two specifications. The result of bootstrapping can then be manually refined. Once the first phase is completed, process analysts and the other involved stakeholders do not need anymore to consider the structure of the legacy information system, […]
 Intelligent data management and conceptual modelling to:
 1. Understand the data
 2. Access the data using the domain vocabulary
 3. Express the perspective for process mining using the domain vocabulary
 4. Automatise the extraction of XES event logs
  123. 123. Step 1 Understand the data
  124. 124. An Enterprise System
  125. 125. Information Structure
 Fig. 11: DB schema for the information system of the conference submission system. Primary keys are underlined and foreign keys are shown in italic.
 • ACCEPTANCE (ID, uploadtime, user, paper)
 • CONFERENCE (ID, name, organizer, time)
 • DECISION (ID, decisiontime, chair, outcome)
 • LOGIN (ID, user, CT)
 • SUBMISSION (ID, uploadtime, user, paper)
 • PAPER (ID, title, CT, user, conf, type, status)
 • REVIEW (ID, RRid, submissiontime)
 • REVIEWREQUEST (ID, invitationtime, reviewer, paper)
 Intuitively, mapping assertions involving such atoms are used to map source relations (and the tuples they store) to concepts, roles, and features of the ontology (and the objects and the values that constitute their instances), respectively. Note that for a feature atom, the type of values retrieved from the source database is not specified, and needs to be determined based on the data type of the variable v2 in the source query.
  126. 126. Actual Data
  127. 127. Actual Data: Meaning?
  128. 128. Ontology-Based Data Access
  129. 129. Conference Example: Conceptual Data Schema
 Fig. 9: Data model of our CONFSYS running example (UML class diagram).
 • Classes and attributes: Paper (title: String, type: String); Person (pName: String, regTime: ts); Assignment (invTime: ts); Submission (uploadTime: ts); CRUpload; Creation; DecidedPaper (decTime: ts, accepted: boolean); Review (subTime: ts); Conference (cName: String, crTime: ts)
 • Associations: notifiedBy, leadsTo, submittedTo, chairs
 N.B.: in onprom we use DL-LiteA (supports a controlled form of functionality)
  130. 130. DL-LiteA encoding of the conceptual data schema (Fig. 9: Data model of our CONFSYS running example)
 δ(title) ≡ Paper, ρ(title) ⊑ string, (funct title)
 δ(type) ≡ Paper, ρ(type) ⊑ string, (funct type)
 δ(decTime) ≡ DecidedPaper, ρ(decTime) ⊑ ts, (funct decTime)
 δ(accepted) ≡ DecidedPaper, ρ(accepted) ⊑ boolean, (funct accepted)
 δ(pName) ≡ Person, ρ(pName) ⊑ string, (funct pName)
 δ(regTime) ≡ Person, ρ(regTime) ⊑ ts, (funct regTime)
 δ(cName) ≡ Conference, ρ(cName) ⊑ string, (funct cName)
 δ(crTime) ≡ Conference, ρ(crTime) ⊑ ts, (funct crTime)
 δ(uploadTime) ≡ Submission, ρ(uploadTime) ⊑ ts, (funct uploadTime)
 δ(invTime) ≡ Assignment, ρ(invTime) ⊑ ts, (funct invTime)
 δ(subTime) ≡ Review, ρ(subTime) ⊑ ts, (funct subTime)
 DecidedPaper ⊑ Paper, Creation ⊑ Submission, CRUpload ⊑ Submission
 ∃Submission1 ≡ Submission, ∃Submission1⁻ ≡ Paper, (funct Submission1)
 ∃Submission2 ≡ Submission, ∃Submission2⁻ ⊑ Person, (funct Submission2)
 ∃Assignment1 ≡ Assignment, ∃Assignment1⁻ ⊑ Paper, (funct Assignment1)
 ∃Assignment2 ≡ Assignment, ∃Assignment2⁻ ⊑ Person, (funct Assignment2)
 ∃leadsTo ⊑ Assignment, ∃leadsTo⁻ ≡ Review, (funct leadsTo), (funct leadsTo⁻)
 ∃submittedTo ≡ Paper, ∃submittedTo⁻ ⊑ Conference, (funct submittedTo)
 ∃notifiedBy ≡ DecidedPaper, ∃notifiedBy⁻ ⊑ Person, (funct notifiedBy)
 ∃chairs ⊑ Person, ∃chairs⁻ ≡ Conference, (funct chairs⁻)
 Correctness of the Encoding. The encoding we have provided is faithful, in the sense that it fully preserves in the DL-LiteA ontology the semantics of the UML class diagram. Obviously, since, due to reification, the ontology alphabet may contain additional symbols with respect to those used in the UML class diagram, the two specifications cannot have the same logical models. However, it is possible to show that the logical models of a UML class diagram and those of the DL-LiteA ontology derived from it correspond to each other, and hence that satisfiability of a class or association in the UML diagram corresponds to satisfiability of the corresponding concept or role [29,7].
 Example 9. We illustrate the encoding of UML class diagrams in DL-LiteA on the
  131. 131. Mapping Example [Figure residue: the CONFSYS data model of Fig. 9 and the DB schema of the conference submission system]
 Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure […]. We give some examples of mapping assertions:
 – The following mapping assertion explicitly populates the concept Creation. The term :submission/{oid} in the target part represents a URI template with one placeholder, {oid}, which gets replaced with the values for oid […] through the source query. This mapping expresses that each value in SUBMISSION identified by oid and such that its upload time equals the corresponding […] creation time, is mapped to an object :submission/oid, which becomes an instance of concept Creation in T.
   SELECT DISTINCT SUBMISSION.ID AS oid
   FROM SUBMISSION, PAPER
   WHERE SUBMISSION.PAPER = PAPER.ID AND SUBMISSION.UPLOADTIME = PAPER.CT
   :submission/{oid} rdf:type :Creation .
 – The following mapping assertion retrieves from the PAPER table instances of the concept Paper, and instantiates also their features title and type with values of type String.
   SELECT ID, title, type […]
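To make the first mapping assertion of Example 10 concrete, the sketch below runs its source query over a tiny in-memory SQLite database and instantiates the URI template for every answer. The table contents, the reduced set of columns, and the helper name `apply_mapping` are assumptions for illustration only. Note also that this sketch materialises the resulting triples, whereas OBDA systems typically keep the mapping virtual and use it to rewrite queries.

```python
import sqlite3

# Hypothetical, heavily reduced instance of the PAPER and SUBMISSION tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PAPER (ID INTEGER PRIMARY KEY, title TEXT, CT TEXT);
CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, uploadtime TEXT, paper INTEGER);
INSERT INTO PAPER VALUES (1, 'OBDA for logs', '2017-01-01');
INSERT INTO SUBMISSION VALUES (10, '2017-01-01', 1);  -- upload time equals paper creation time
INSERT INTO SUBMISSION VALUES (11, '2017-02-15', 1);  -- later upload, not a creation
""")

# Source part of the mapping assertion, as given in Example 10.
SOURCE_QUERY = """
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
  AND SUBMISSION.UPLOADTIME = PAPER.CT
"""

# Target part: a URI template with one placeholder, {oid}.
TARGET_TEMPLATE = ":submission/{oid} rdf:type :Creation ."


def apply_mapping(connection, source_query, target_template):
    """Instantiate the target template once per answer of the source query."""
    cursor = connection.execute(source_query)
    columns = [description[0] for description in cursor.description]
    return [target_template.format(**dict(zip(columns, row))) for row in cursor]


triples = apply_mapping(conn, SOURCE_QUERY, TARGET_TEMPLATE)
print(triples)
```

Only submission 10 becomes an instance of Creation here, because it is the only upload whose time matches the creation time of its paper.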
  132. 132. Step 2 Find the event data
  133. 133. Annotating the Conceptual Data Schema
 Fix perspective: declare the case
 • Find the class whose instances are considered as case objects
 • Express additional filters
 Find the events (looking for timestamps)
 • Find the classes whose instances refer to events
 • Declare how they are connected to corresponding case objects → navigation in the UML class diagram
 • Declare how they are (in)directly related to event attributes (timestamp, task name, optionally event type and resource) → navigation in the UML class diagram
  134. 134. [Figure: the CONFSYS data model of Fig. 9, annotated with case and event information]
 • Case: Paper
 • Event Submission: Timestamp: uploadTime; Case: Submission1
 • Event Review: Timestamp: subTime; Case: leadsTo → Assignment1
 • Event Creation: Timestamp: uploadTime; Case: Submission → Submission1
 • Event Decision: Timestamp: decTime; Case: Paper
  135. 135. [Figure: annotated data model of the CONFSYS running example, repeated from the previous slide]
  136. 136. [Figure: annotated data model of the CONFSYS running example, repeated from the previous slide]
  137. 137. [Figure: annotated data model of the CONFSYS running example, repeated from the previous slide]
  138. 138. [Figure: annotated data model of the CONFSYS running example, repeated from the previous slide]
  139. 139. Switching Perspective Simply amounts to redefining the annotations • Flow of accepted papers • Flow of full papers • Flow of reviews • Flow of authors • Flow of reviewers • ….
  140. 140. Step 3 Get your log, automatically!
  141. 141. Formalizing Annotations Annotations are nothing but SPARQL queries over the conceptual data schema! • Case annotation: query retrieving case objects • Event annotation: query retrieving event objects • Case-attribute annotation: query retrieving pairs <attribute, case> • Event-attribute annotation: query retrieving pairs <attribute, event>
  142. 142. [Figure: annotated data model of our CONFSYS running example, showing the Creation event annotation (Timestamp: uploadTime; Case: Submission → Submission1)]
 […] ?case rdf:type :Paper . }
 which retrieves all instances of the Paper class.
 Event annotations are also tackled using SPARQL SELECT queries with a single answer variable, this time matching with actual event identifiers, i.e., objects denoting occurrences of events.
 Example 14. Consider the event annotation for creation, as shown in Figure 16. The actual events for this annotation are retrieved using the following query:
   PREFIX : <http://www.example.com/>
   SELECT DISTINCT ?creationEvent
   WHERE { ?creationEvent rdf:type :Creation . }
 which in fact returns all instances of the Creation class.
 Attribute annotations are formalised using SPARQL SELECT queries with two answer variables, establishing a relation between events and their corresponding attribute values. In this light, for timestamp and activity attribute annotations, the second answer variable will be substituted by corresponding values for timestamps/activity names. For case attribute annotations, instead, the second answer variable will be substituted by case objects, thus establishing a relationship between events and the case(s) they belong to.
 Example 15. Consider again the annotation for creation events, as shown in Figure 16. The relationship between creation events and their corresponding timestamps is established by the following query:
   PREFIX : <http://www.example.com/>
   SELECT DISTINCT ?creationEvent ?creationTime
   WHERE { ?creationEvent rdf:type :Creation . ?creationEvent :Submission1 ?Paper . ?creationEvent :uploadTime ?creationTime . }
 which indeed retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.
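The shape of these annotation queries can be mimicked over a handful of in-memory triples. The sketch below is plain Python and does not use a SPARQL engine; the triples are hypothetical sample data, the helper names are made up for illustration, and no ontology reasoning is performed (for instance, the fact that Creation is a subclass of Submission plays no role here).

```python
TYPE = "rdf:type"

# Hypothetical sample data using the vocabulary of the CONFSYS example.
triples = {
    (":paper/1", TYPE, ":Paper"),
    (":submission/10", TYPE, ":Creation"),
    (":submission/10", ":Submission1", ":paper/1"),
    (":submission/10", ":uploadTime", "2017-01-01T10:00:00"),
    (":submission/11", TYPE, ":CRUpload"),
    (":submission/11", ":Submission1", ":paper/1"),
    (":submission/11", ":uploadTime", "2017-02-15T08:00:00"),
}


def subjects_of_type(graph, cls):
    """Single-variable pattern: ?x rdf:type cls."""
    return sorted(s for (s, p, o) in graph if p == TYPE and o == cls)


def objects(graph, subject, predicate):
    """All values ?o such that (subject, predicate, ?o) is in the graph."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)


def creation_timestamps(graph):
    """Two-variable pattern in the spirit of Example 15: Creation events
    that are linked to a paper, paired with their uploadTime values."""
    answers = []
    for event in subjects_of_type(graph, ":Creation"):
        if not objects(graph, event, ":Submission1"):
            continue  # the pattern requires a link via :Submission1
        for timestamp in objects(graph, event, ":uploadTime"):
            answers.append((event, timestamp))
    return answers


cases = subjects_of_type(triples, ":Paper")       # case annotation
events = subjects_of_type(triples, ":Creation")   # event annotation
timestamps = creation_timestamps(triples)         # timestamp attribute annotation
print(cases, events, timestamps)
```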
  143. 143. Annotations and XES Elements Annotations can be easily “mapped” onto XES elements:
 case annotation query —> traces
 event annotation query —> events
 attribute annotation query —> trace/event attributes with given key
 [Figure: core event schema with classes Trace, Event, and Attribute (attKey: String, attType: String, attValue: String), and associations t-contains-e, t-has-a, and e-has-a]
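As a rough illustration of this correspondence, the sketch below takes the answers of the annotation queries (as plain Python lists) and groups them into traces with time-ordered events. The function name `assemble_log`, the input shapes, and the sample values are assumptions. The sketch also makes two simplifying choices that are not taken from the slides: events without a timestamp, an activity name, or a known case are dropped, and timestamps are compared as ISO 8601 strings in one common format.

```python
def assemble_log(case_ids, event_case, event_time, event_activity):
    """Group query answers into traces.

    case_ids:       answers of the case annotation query
    event_case:     (event, case) pairs from the case attribute annotation
    event_time:     (event, timestamp) pairs from the timestamp annotation
    event_activity: (event, activity name) pairs from the activity annotation
    """
    time_of = dict(event_time)
    name_of = dict(event_activity)
    traces = {case: [] for case in case_ids}
    for event, case in event_case:
        # Simplifying assumption: skip events that are incomplete or
        # that refer to a case outside the chosen perspective.
        if case in traces and event in time_of and event in name_of:
            traces[case].append({
                "id": event,
                "concept:name": name_of[event],
                "time:timestamp": time_of[event],
            })
    for events in traces.values():
        events.sort(key=lambda e: e["time:timestamp"])
    return traces


# Hypothetical query answers.
cases = [":paper/1"]
event_case = [(":review/5", ":paper/1"), (":submission/10", ":paper/1"), (":review/9", ":paper/2")]
event_time = [(":submission/10", "2017-01-01T10:00:00"), (":review/5", "2017-03-01T12:00:00"),
              (":review/9", "2017-03-02T12:00:00")]
event_activity = [(":submission/10", "Creation"), (":review/5", "Review"), (":review/9", "Review")]

log = assemble_log(cases, event_case, event_time, event_activity)
print(log)
```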
  144. 144. [Slide repeats the excerpt with Examples 14 and 15 and overlays how the query answers map onto XES elements]
 Event annotation query (Example 14):
   PREFIX : <http://www.example.com/>
   SELECT DISTINCT ?creationEvent
   WHERE { ?creationEvent rdf:type :Creation . }
 XES events:
 - id: ?creationEvent
 Timestamp annotation query (Example 15):
   PREFIX : <http://www.example.com/>
   SELECT DISTINCT ?creationEvent ?creationTime
   WHERE { ?creationEvent rdf:type :Creation . ?creationEvent :Submission1 ?Paper . ?creationEvent :uploadTime ?creationTime . }
 XES attribute:
 - key: timestamp extension
 - type: milliseconds
 - value: ?creationTime
 - parent event: ?creationEvent
  145. 145. Rewriting Annotations Annotations are nothing but SPARQL queries over the conceptual data schema. They can be automatically reformulated as SQL queries over the legacy data. We automatically get a standard OBDA mapping from the legacy data to the XES concepts.
  146. 146. [Figure excerpt: event annotation Creation on the CONFSYS running example, with Timestamp: uploadTime and Case: Submission → Submission1. The surrounding text on timestamp, activity and optional attribute annotations (XES standard extensions such as transactional lifecycle, resource name and/or role) is clipped.]

  Example 14. Consider the event annotation for creation, as shown in Figure 16. The actual events for this annotation are retrieved using the following query:

      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX : <http://www.example.com/>
      SELECT DISTINCT ?creationEvent
      WHERE { ?creationEvent rdf:type :Creation . }

  which in fact returns all instances of the Creation class.

  Attribute annotations are formalised using SPARQL SELECT queries with two answer variables, establishing a relation between events and their corresponding attribute values. In this light, for timestamp and activity attribute annotations, the second answer variable will be substituted by corresponding values for timestamps/activity names. For case attribute annotations, instead, the second answer variable will be substituted by case objects, thus establishing a relationship between events and the case(s) they belong to.

  Example 15. Consider again the annotation for creation events, as shown in Figure 16. The relationship between creation events and their corresponding timestamps is established by the following query:

      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX : <http://www.example.com/>
      SELECT DISTINCT ?creationEvent ?creationTime
      WHERE {
        ?creationEvent rdf:type :Creation .
        ?creationEvent :Submission1 ?Paper .
        ?creationEvent :uploadTime ?creationTime .
      }

  which indeed retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.

  XES events:
   - id: ?creationEvent

  […] as a XES event log, and also to actually materialise such an event log. Technically, onprom takes as input an onprom model $P = \langle I, T, M, L \rangle$ and an event schema $E$, and produces a new OBDA system $\langle I, M^E_P, E \rangle$, where the annotations in $L$ are automatically reformulated as OBDA mappings $M^E_P$ that directly […]. Such mappings are synthesised using the three-step approach described next.

  In the first step, the SPARQL queries formalising the annotations in $L$ are reformulated into corresponding SQL queries posed directly over $I$. This is done by relying on standard query rewriting and unfolding, where each SPARQL query $q \in L_q$ is rewritten considering the contribution of the conceptual data schema $T$, and then unfolded using the mappings in $M$. The resulting query $q_{sql}$ can then be posed directly over $I$ so as to retrieve the data associated to the corresponding annotation. In the following, we denote the set of all so-obtained SQL queries as $L_{sql}$.

  Example 16. Consider the SPARQL query in Example 13, formalising the event annotation that accounts for the creation of papers. A possible reformulation of the rewriting and unfolding of such a query, respectively using the conceptual data schema in Figure 9 and the mappings from Example 10, is the following SQL query:

      SELECT DISTINCT
        CONCAT('http://www.example.com/submission/', Submission."ID") AS "creationEvent"
      FROM Submission, Paper
      WHERE Submission."Paper" = Paper."ID"
        AND Submission."UploadTime" = Paper."CT"
        AND Submission."ID" IS NOT NULL

  This query is generated by the ontop OBDA system, which applies various optimisations so as to obtain a final SQL query that is not only correct, but also possibly compact and fast to process by a standard DBMS. One such optimisation is the application of […]

  For each SQL query $q(c) \in L_{sql}$ obtained from a case annotation, we insert into $M^E_P$ the following OBDA mapping:

      q(c) ⇝ :trace/{c} rdf:type :Trace .

  Intuitively, such a mapping populates the concept Trace in $E$ with the case objects that are created from the answers returned by query q(c).

  For each SQL query $q(e) \in L_{sql}$ that is obtained from an event annotation, we insert into $M^E_P$ the following OBDA mapping:

      q(e) ⇝ :event/{e} rdf:type :Event .

  Intuitively, such a mapping populates the concept Event in $E$ with the event objects that are created from the answers returned by query q(e).
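  The mapping synthesis for case and event annotations described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the onprom implementation: the names `LogMapping`, `synthesise_log_mappings` and `instantiate` are hypothetical, and each SQL query is assumed to expose its single answer column as `c` (cases) or `e` (events).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogMapping:
    """One mapping assertion: an SQL source query plus an RDF target template."""
    source_sql: str
    target: str


def synthesise_log_mappings(case_queries, event_queries):
    """Build one mapping per annotation query (hypothetical helper).

    Assumption of this sketch: every case query names its answer column "c"
    and every event query names its answer column "e".
    """
    mappings = []
    for sql in case_queries:
        # Each answer c of a case query becomes an instance of :Trace.
        mappings.append(LogMapping(sql, ":trace/{c} rdf:type :Trace ."))
    for sql in event_queries:
        # Each answer e of an event query becomes an instance of :Event.
        mappings.append(LogMapping(sql, ":event/{e} rdf:type :Event ."))
    return mappings


def instantiate(target, row):
    """Fill the {placeholders} of a target template with one answer row."""
    return target.format(**row)
```

  Calling `instantiate` on the event template with a row such as `{"e": "42"}` produces one RDF triple for that event object, which mirrors how the answers of q(e) populate the concept Event.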
  147. 147. Recap [Diagram of the onprom setting] • D (database), which conforms to R (db schema) • M (mapping specification) • T (conceptual data schema) • L (event-data annotations) • P (onprom model) • E (conceptual event schema) • $M^E_P$ (log mapping specification) • I (information system) • B (OBDA model) • Edge labels: conforms to, annotates, points to
  148. 148. Querying the “Virtual Log” • SPARQL queries over the event schema are answered using legacy data • Example: get empty and nonempty traces; for nonempty traces, also fetch all their events • Answers can be serialised into a fully compliant XES log!

  […] name. The following query is instead meant to retrieve (elementary) attributes, considering in particular their key, type, and value.

      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX : <http://www.example.org/>
      SELECT DISTINCT ?att ?attType ?attKey ?attValue
      WHERE {
        ?att rdf:type :Attribute;
             :attType ?attType;
             :attKey ?attKey;
             :attVal ?attValue.
      }

  The following query handles the retrieval of empty and nonempty traces, simultaneously obtaining, for nonempty traces, their constitutive events:

      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      PREFIX : <http://www.example.org/>
      SELECT DISTINCT ?trace ?event
      WHERE {
        ?trace a :Trace .
        OPTIONAL {
          ?trace :t-contain-e ?event .
          ?event :e-contain-a ?timestamp .
          ?timestamp :attKey "time:timestamp"^^xsd:string .
          ?event :e-contain-a ?name .
          ?name :attKey "concept:name"^^xsd:string .
        }
      }

  4.6 The onprom Toolchain. onprom comes with a toolchain that supports the various phases of the methodology […]
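  To illustrate how answers to the trace/event query could be turned into a log, here is a minimal Python sketch. It is an assumption-laden illustration rather than the OpenXES-based serialisation used by onprom: `rows_to_xes` is a hypothetical helper, an unbound `?event` (from the OPTIONAL block) is represented as `None`, and the IRIs are reused as `concept:name` values purely for brevity.

```python
import xml.etree.ElementTree as ET


def rows_to_xes(rows):
    """Group (trace, event) answer pairs into a minimal XES-like document.

    rows: iterable of (trace_iri, event_iri_or_None). None stands for an
    unbound ?event, i.e. a trace without events.
    """
    traces = {}
    for trace, event in rows:
        # Register the trace even when it has no events (empty trace).
        events = traces.setdefault(trace, [])
        if event is not None and event not in events:
            events.append(event)

    log = ET.Element("log")
    for trace, events in traces.items():
        t = ET.SubElement(log, "trace")
        # Simplification of this sketch: the IRI doubles as the name.
        ET.SubElement(t, "string", key="concept:name", value=trace)
        for event in events:
            e = ET.SubElement(t, "event")
            ET.SubElement(e, "string", key="concept:name", value=event)
    return log
```

  The resulting element tree can be written out with `ET.tostring(log)`. A fully compliant XES log would additionally carry timestamps, extension declarations and the remaining attributes retrieved by the attribute query.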
  149. 149. The onprom Toolchain Implementation of all the described steps using • Java (GUIs, algorithms) • OWL 2 QL plus functionality (conceptual schemas) • ontop (OBDA system) • OpenXES (XES serialisation and manipulation) • ProM process mining framework (environment)
  150. 150. onprom UML Editor Fig. 17: The onprom UML Editor, showing the conceptual data schema used in our
  151. 151. onprom Annotation Editor Fig. 18: The Annotation Editor showing annotations for the CONFSYS use case
  152. 152. onprom Log Extractor Fig. 20: Screenshot of the Log Extractor plug-in in ProM 6.6.
  153. 153. Experiments • Very encouraging initial experiments • Carried out using synthetic data • We are looking for real case studies!
  154. 154. Data Generation with CPN Tools
  155. 155. Results (Postgres) [Plots: running time (in milliseconds) against # extracted components in the XES log, and running time (in milliseconds) against # tuple(s) in the whole database] ~11 mins to extract ~9M XES components from ~3.5M tuples
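  As a back-of-the-envelope reading of the headline figure, the short Python sketch below converts the approximate numbers on this slide into average throughput. It only uses the rounded values stated above (about 11 minutes, about 9M components, about 3.5M tuples), so the results are rough estimates; `throughput` is a hypothetical helper name.

```python
def throughput(items, minutes):
    """Average number of items processed per second over the whole run."""
    return items / (minutes * 60)


# Rounded figures taken from the slide, so the results are approximate.
components_per_s = throughput(9_000_000, 11)
tuples_per_s = throughput(3_500_000, 11)
```

  With these inputs the extraction averages roughly 13,600 XES components per second, or about 5,300 source tuples per second.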
  157. 157. So What?
  158. 158. Demo with Disco [fluxicon.com]
  159. 159. Other tools: ProM
 [http://www.promtools.org] • The most famous academic initiative in process mining • Cutting-edge process mining algorithms are there • Pluggable architecture • Dozens of plug-ins
  160. 160. Other Tools: Celonis [http://www.celonis.com] Native Process Mining on top of SAP
  161. 161. Conclusions • Process mining as a way to reconcile model-driven management and real behaviours • Data preparation is an issue in the presence of legacy data • Ontology-Based Data Access: solid theoretical basis with optimised implementations • onprom as an effective toolchain for extracting event logs from legacy databases
  162. 162. Future Work • Conceptual Modeling • How to improve the discovery of events? • How to semi-automatically propose events to the user? • How to integrate methodologies and results from formal ontology? • Engineering • How to handle different types of data? • How to deal with different event schemas that go beyond XES? • How to generalise the approach to handle rich ontology-to-ontology mappings?
