Marcello Leida. Extendible data model for real-time business process analysis. IEEE-IEEM, 10-13 December 2012, Hong Kong

These slides present a promising data representation model for real-time monitoring of business processes. The main benefit of this representation is that it is transparent to the data creation and analysis processes and that it is extensible in real time.
The model is based on a shared vocabulary defined using the standard RDF representation, allowing independence between applications.
This model is a novel approach to real-time process data representation and paves the way for a completely new breed of applications for business process analysis.

  1. Marcello Leida. Extendible data model for real-time business process analysis. IEEE-IEEM, 10-13 December 2012, Hong Kong
  2. Our target: real-time process monitoring
  3. Limits of current approaches
     • Experience collected by deploying Business Process Mining tools in enterprise environments (BT, Etisalat) highlighted the need for a more flexible data layer and real-time capabilities.
     • Offline analytical tools;
     • Rigid data model;
     • BPML and BPEL require a big effort from the enterprise;
     • Tight connection between the applications that capture the information and the ones that analyse it;
     • Need to deal with increasingly complex systems;
     • Monitoring tools have to be flexible and robust enough to also process information that is not present or is unknown at the time the data model is defined.
  4. Research challenges for a new generation of BPM tools
     • Flexibility: represent the process model using a formalism that allows an increased degree of flexibility, able to also address situations where the process definition and its data are not known a priori.
     • Handling very large datasets: deal with large amounts of information while maintaining a high performance level.
     • Real-time performance: be able to respond to a continuous flow of events. Keep the representation simple so that it can be queried efficiently.
  5. Novel representation of the process model
     • The need for a less constraining and rigid model has arisen;
     • Represent the model with a formalism that allows the degree of flexibility required;
     • Keep the representation simple: this allows the level of flexibility we need, but on the other hand it increases complexity.
  6. RDF
     • We use RDF to define the process model;
     • RDF is a standard for vocabulary definition at the basis of the Semantic Web vision. It is composed of three elements: concepts, relations between concepts, and attributes of concepts;
     • RDF is a data representation that is extremely extendible, flexible, and publicly available.
  7. RDF
     Concepts, relations and attributes are modelled as a labelled directed graph, defined by a set of triples <s,p,o>, where s is called the subject, p the predicate, and o the object.
     Formally, a graph G can be defined as:
         G ⊆ (U ∪ B ∪ L) × U × (U ∪ B ∪ L)
     where:
     • U is an infinite set of constant values (called URI references); these have well-defined semantics, provided for example by the RDF and RDFS vocabularies;
     • B is an infinite set of identifiers (called blank nodes), which identify instantiations of concepts. Elements in this set do not have defined semantics;
     • L is an infinite set of values (called literals). Elements in this set do not have defined semantics.
     The elements of a triple <s,p,o> are respectively s ∈ (U ∪ B ∪ L), p ∈ U, and o ∈ (U ∪ B ∪ L).
     The elements in U define the schema or vocabulary, while the elements in B and L are used to define instances.
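
The formal definition on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `URI`, `Blank` and `Literal` and the helpers `is_valid_triple` and `add` are hypothetical names chosen here, not part of the presentation or of any RDF library.

```python
from dataclasses import dataclass

# Three disjoint kinds of terms, standing in for the sets U, B and L.
@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class Blank:
    label: str

@dataclass(frozen=True)
class Literal:
    value: str

NODE_KINDS = (URI, Blank, Literal)

def is_valid_triple(s, p, o) -> bool:
    """Check s in U ∪ B ∪ L, p in U, and o in U ∪ B ∪ L, as on the slide."""
    return isinstance(s, NODE_KINDS) and isinstance(p, URI) and isinstance(o, NODE_KINDS)

# A graph is simply a set of valid triples.
graph = set()

def add(s, p, o):
    """Add a triple to the graph, rejecting triples outside the definition."""
    if not is_valid_triple(s, p, o):
        raise ValueError("predicate must be a URI; subject and object must be a URI, blank node or literal")
    graph.add((s, p, o))

# One schema-level triple and two instance-level triples.
add(URI("ebtic-bpm:hasTask"), URI("rdfs:domain"), URI("ebtic-bpm:Process"))
add(Blank("step1"), URI("rdf:type"), URI("ebtic-bpm:Task"))
add(Blank("step1"), URI("ebtic-bpm:startTime"), Literal("10:39 12/2/2010"))
```

Because the graph is a plain set, adding the same triple twice has no effect, which matches the view of a graph as a set of statements.
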
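  8. An Extensible Business Process Data Model
     Basic process model representation (schema level: concepts, relations, attributes)
     [Diagram: concept Process with attributes startTime and endTime and relation hasTask to concept Task; Task has attributes startTime and endTime and relations precededBy, followedBy and hasSubtask to Task.]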
  9. RDF graph of the basic process model: EBTIC-BPM vocabulary
     The elements in the conceptual model of the previous slide are defined as a vocabulary EBTIC-BPM that extends the set U, which already contains the vocabularies RDF and RDFS.
         S                        P              O
         ebtic-bpm:hasTask        rdfs:range     ebtic-bpm:Task
         ebtic-bpm:hasTask        rdfs:domain    ebtic-bpm:Process
         ebtic-bpm:precededBy     rdfs:range     ebtic-bpm:Task
         ebtic-bpm:precededBy     rdfs:domain    ebtic-bpm:Task
         ebtic-bpm:followedBy     rdfs:range     ebtic-bpm:Task
         ebtic-bpm:followedBy     rdfs:domain    ebtic-bpm:Task
         ebtic-bpm:hasSubtask     rdfs:range     ebtic-bpm:Task
         ebtic-bpm:hasSubtask     rdfs:domain    ebtic-bpm:Task
         ebtic-bpm:startTime      rdfs:domain    ebtic-bpm:Process
         ebtic-bpm:startTime      rdfs:domain    ebtic-bpm:Task
         ebtic-bpm:startTime      rdfs:range     xs:dateTime
         ebtic-bpm:endTime        rdfs:domain    ebtic-bpm:Process
         ebtic-bpm:endTime        rdfs:domain    ebtic-bpm:Task
         ebtic-bpm:endTime        rdfs:range     xs:dateTime
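  10. Extending the EBTIC-BPM vocabulary
      Domain-specific process model representation (schema level: concepts, relations, attributes)
      [Diagram: Process (startTime, endTime) with relation hasTask to Task (startTime, endTime, followedBy). CreateProductA is a subConceptOf Process and adds the attribute department. Assemble is a subConceptOf Task, with relations useComponent and createComponent to Component. Test is a subConceptOf Task, with relation testedComponent to Component and relation executedBy to Employee. Employee has the attribute EIN; Component has the attribute serialNumber.]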
  11. RDF graph of the domain-specific extension vocabulary
      The elements in the conceptual model of the previous slide are defined as a vocabulary PA that extends the set U, which already contains the vocabularies RDF, RDFS and EBTIC-BPM.
          S                     P                  O
          …                     …                  …
          pa:CreateProductA     rdfs:subClassOf    ebtic-bpm:Process
          pa:Assemble           rdfs:subClassOf    ebtic-bpm:Task
          pa:Test               rdfs:subClassOf    ebtic-bpm:Task
          pa:useComponent       rdfs:domain        pa:Assemble
          pa:useComponent       rdfs:range         pa:Component
          pa:createComponent    rdfs:domain        pa:Assemble
          pa:createComponent    rdfs:range         pa:Component
          pa:testedComponent    rdfs:domain        pa:Test
          pa:testedComponent    rdfs:range         pa:Component
          pa:executedBy         rdfs:domain        pa:Test
          pa:executedBy         rdfs:range         pa:Employee
          pa:department         rdfs:domain        pa:CreateProductA
          pa:department         rdfs:range         xs:string
          pa:serialNumber       rdfs:domain        pa:Component
          pa:serialNumber       rdfs:range         xs:integer
          pa:EIN                rdfs:domain        pa:Employee
          pa:EIN                rdfs:range         xs:integer
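
To illustrate how a client could exploit such an extension, the following Python sketch collects the properties declared for a class, including those inherited through rdfs:subClassOf. It is a simplification under assumptions: triples are plain string tuples, only a subset of the vocabulary is included, rdfs:domain is read as "this property applies to this class", and the helper names `superclasses` and `properties_of` are hypothetical.

```python
# A subset of the EBTIC-BPM and PA schema triples, written as (s, p, o).
SCHEMA = {
    ("pa:Assemble", "rdfs:subClassOf", "ebtic-bpm:Task"),
    ("pa:Test", "rdfs:subClassOf", "ebtic-bpm:Task"),
    ("ebtic-bpm:startTime", "rdfs:domain", "ebtic-bpm:Task"),
    ("ebtic-bpm:endTime", "rdfs:domain", "ebtic-bpm:Task"),
    ("pa:useComponent", "rdfs:domain", "pa:Assemble"),
    ("pa:createComponent", "rdfs:domain", "pa:Assemble"),
    ("pa:executedBy", "rdfs:domain", "pa:Test"),
}

def superclasses(cls, schema):
    """Return cls together with every class reachable through rdfs:subClassOf."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        todo.extend(o for s, p, o in schema
                    if s == current and p == "rdfs:subClassOf")
    return seen

def properties_of(cls, schema):
    """Properties whose rdfs:domain is cls or one of its superclasses."""
    classes = superclasses(cls, schema)
    return sorted(s for s, p, o in schema
                  if p == "rdfs:domain" and o in classes)

assemble_properties = properties_of("pa:Assemble", SCHEMA)
# assemble_properties now holds the two inherited Task attributes plus
# pa:createComponent and pa:useComponent, but not pa:executedBy.
```
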
  12. Instances – process execution data: 1 process instance
      [Diagram: process instance Process1 (startTime 10:39 12/2/10, endTime 11:02 12/2/10, department DBX) with four tasks linked by hasTask: Step1 (10:39 to 10:42), Step2 (10:43 to 10:48), Step3 (10:50 to 10:54) and Step4 (10:55 to 11:02). The tasks are chained by followedBy and linked to the components Comp.2 (serialNumber 003345), Comp.699 (serialNumber 00445), Comp.35 (serialNumber 00800) and Comp.1 (serialNumber 003234) through useComponent, createComponent and testedComponent; executedBy links to employee Empl.32 (Name Mario Rossi, EIN 566568).]
  13. Instances – process execution: many instances
      [Diagram: four process instances (Process1, Process3, Process4, Process5), all with department DBX. Each has the same shape as the single instance of the previous slide: four tasks linked by hasTask and chained by followedBy, components with serial numbers linked by useComponent and createComponent, and an employee linked by executedBy (Mario Rossi, Mark Redi, Mario Fettuccini, John Smith).]
  14. Linking Schema and Instances
      [Diagram: the schema information (Process, CreateProductA, Task, Assemble, Test, Employee, Component and their attributes and relations) shown next to the instance information (Process1 with Step1 to Step4, its components and employee Empl.32). TYPE edges link every instance node to its schema concept.]
  15. RDF graph of a process instance
      The elements in the process instance of the previous slide are defined as a set of RDF triples that extends the RDF graph defined previously by the vocabularies RDF, RDFS, EBTIC-BPM and PA.
          S           P                      O
          …           …                      …
          Process1    rdf:type               pa:CreateProductA
          Step1       rdf:type               pa:Assemble
          Step2       rdf:type               pa:Assemble
          Step3       rdf:type               pa:Assemble
          Step4       rdf:type               pa:Test
          Empl32      rdf:type               pa:Employee
          Comp2       pa:serialNumber        "003345"^^xs:integer
          Comp699     pa:serialNumber        "00445"^^xs:integer
          Comp35      pa:serialNumber        "00800"^^xs:integer
          Comp1       pa:serialNumber        "003234"^^xs:integer
          Process1    pa:department          "DBX"^^xs:string
          Step1       ebtic-bpm:startTime    "10:39 12/2/2010"^^xs:dateTime
          Step1       ebtic-bpm:endTime      "10:42 12/2/2010"^^xs:dateTime
          Step2       ebtic-bpm:startTime    "10:43 12/2/2010"^^xs:dateTime
          Step2       ebtic-bpm:endTime      "10:48 12/2/2010"^^xs:dateTime
          …           …                      …
  16. What can I do with the RDF data model?
      • An important aspect of RDF is the possibility to continuously add information to the graph. This is enabled by the fact that every triple is a valid piece of RDF information that identifies nodes and connections in the RDF graph. Another important feature of RDF is that both schema-level and instance-level information are stored in the same graph.
      • I can query the RDF graph!
      • SPARQL is the standard query language for RDF.
      • A SPARQL query can return any point in the graph as a result.
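
The first point, continuous addition with schema and instances in one graph, can be sketched as follows. This is a toy in-memory model with triples as string tuples; `add_triples` is a hypothetical helper and `pa:operatorNote` is an invented attribute used only to show that a property unknown at design time can be stored without changing anything else.

```python
# Schema-level and instance-level statements live in the same set.
graph = set()

def add_triples(triples):
    """Add triples to the graph and return how many were actually new."""
    before = len(graph)
    graph.update(triples)
    return len(graph) - before

# Initial load: one schema triple and one instance triple.
add_triples([
    ("pa:Assemble", "rdfs:subClassOf", "ebtic-bpm:Task"),
    ("Step1", "rdf:type", "pa:Assemble"),
])

# Later, at run time, a statement with a previously unseen predicate arrives.
# No table or class definition has to be altered to accept it.
added = add_triples([("Step1", "pa:operatorNote", "rework needed")])
```
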
  17. Querying the RDF graph
      Queries over the triples graph are conjunctive queries. As an example, to obtain all the start times of the assembling tasks, I need to identify the values ST (start time) that satisfy the following path in the graph:
          T TYPE Assemble ∧ T startTime ST
      Step1 TYPE Assemble → T = Step1 → Step1 startTime 10:39 12/2/2010
      Step2 TYPE Assemble → T = Step2 → Step2 startTime 10:42 12/2/2010
      Step3 TYPE Assemble → T = Step3 → Step3 startTime 10:48 12/2/2010
      Step4 TYPE Test → no path matching
      ST = { 10:39 12/2/2010, 10:42 12/2/2010, 10:48 12/2/2010 }
      [Diagram: graph of the tasks Step1 to Step4 with their TYPE and startTime edges.]
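
The conjunctive query on this slide can be reproduced with a small pattern matcher in Python. This is a naive nested-loop join written for illustration, not how a triple store evaluates SPARQL. The function names `match` and `query` are hypothetical, terms starting with `?` are treated as variables, and the start time given to Step4 is taken from the instance on slide 12 because this slide does not show one.

```python
GRAPH = {
    ("Step1", "TYPE", "Assemble"), ("Step1", "startTime", "10:39 12/2/2010"),
    ("Step2", "TYPE", "Assemble"), ("Step2", "startTime", "10:42 12/2/2010"),
    ("Step3", "TYPE", "Assemble"), ("Step3", "startTime", "10:48 12/2/2010"),
    ("Step4", "TYPE", "Test"),     ("Step4", "startTime", "10:55 12/2/2010"),
}

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; None if impossible."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def query(patterns, graph):
    """Evaluate a conjunction of triple patterns by joining them left to right."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings

# T TYPE Assemble ∧ T startTime ST
results = query([("?T", "TYPE", "Assemble"), ("?T", "startTime", "?ST")], GRAPH)
start_times = sorted(b["?ST"] for b in results)
```

Step4 is dropped by the first pattern because its type is Test, so `start_times` contains only the three values listed on the slide.
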
  18. What can I do with the EBTIC-BPM vocabulary?
      • The use of the EBTIC-BPM vocabulary allows independence between the applications that generate business process data and the applications that consume it.
      • Assuming that the EBTIC-BPM vocabulary is present in the RDF graph, process discovery and analysis of domain-specific extensions, which may also be created at run time by third-party applications, are possible using only SPARQL queries.
  19. Sample deployment: an application is used to capture process execution data. A listener stores the triples in a triple store and provides a SPARQL query interface through which a client application can analyse the process information.
  20. A sample client application is a Process Visualizer that is able to display domain-specific process information just by constructing SPARQL queries, with knowledge of the EBTIC-BPM vocabulary only. The numbers in the boxes correspond to the queries defined in the next slides.
  21. Client SPARQL Query 1
          SELECT ?process
          WHERE { ?process rdfs:subClassOf ebtic-bpm:Process. }
      This query returns all the concepts extending the basic ebtic-bpm:Process class. With the data from the previous examples, the variable ?process will contain the value pa:CreateProductA.
  22. Client SPARQL Query 2
          SELECT ?processID ?startTime ?endTime
          WHERE { ?processID rdf:type pa:CreateProductA.
                  ?processID ebtic-bpm:startTime ?startTime.
                  OPTIONAL { ?processID ebtic-bpm:endTime ?endTime. } }
      This query returns information (?processID, ?startTime, ?endTime) about the instances of the process pa:CreateProductA.
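
The OPTIONAL clause in Query 2 matters for real-time monitoring: a process that is still running has a start time but no end time yet, and it should still appear in the results. The Python sketch below mimics that behaviour on a toy graph of string tuples. The helper names are hypothetical, and `pa:02` is an invented running instance added only to show the unbound case.

```python
GRAPH = {
    ("pa:01", "rdf:type", "pa:CreateProductA"),
    ("pa:01", "ebtic-bpm:startTime", "10:39 12/2/2010"),
    ("pa:01", "ebtic-bpm:endTime", "11:02 12/2/2010"),
    # Hypothetical instance that has started but not finished yet.
    ("pa:02", "rdf:type", "pa:CreateProductA"),
    ("pa:02", "ebtic-bpm:startTime", "10:39 13/2/2010"),
}

def values(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def process_times(graph, process_class):
    """Rows (processID, startTime, endTime); endTime is None when absent."""
    rows = []
    for s, p, o in sorted(graph):
        if p == "rdf:type" and o == process_class:
            # Required pattern: instances without a start time yield no row.
            for start in values(graph, s, "ebtic-bpm:startTime"):
                # Optional pattern: keep the row even when nothing matches.
                ends = values(graph, s, "ebtic-bpm:endTime") or [None]
                rows.extend((s, start, end) for end in ends)
    return rows

rows = process_times(GRAPH, "pa:CreateProductA")
```
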
  23. Client SPARQL Query 3
          SELECT ?attribute ?value
          WHERE { pa:01 ?attribute ?value.
                  FILTER (?attribute != rdf:type) }
      This query will return all the tasks, process attributes and their values associated with a specific process instance (pa:01 in this case).
  24. Client SPARQL query for flexible visualization
      Display the process instance workflow (create a GraphML document from the query results).
          SELECT ?PID ?startTime ?endTime ?taskID ?taskType ?follBy ?precBy
          WHERE { ?PID ebtic-bpm:hasTask ?taskID.
                  ?taskID rdf:type ?taskType.
                  ?PID ebtic-bpm:startTime ?startTime.
                  OPTIONAL { ?PID ebtic-bpm:endTime ?endTime. }.
                  OPTIONAL { ?taskID ebtic-bpm:followedBy ?follBy. }.
                  OPTIONAL { ?taskID ebtic-bpm:precededBy ?precBy. }.
                  FILTER (?PID = pa:01) }
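
How the query results might be turned into a GraphML document can be sketched with Python's standard `xml.etree.ElementTree` module. This is an assumption about the client, not the presenter's implementation: the function `to_graphml` is hypothetical, the rows keep only the ?taskID, ?taskType and ?follBy columns, and the followedBy chain Step1 to Step4 is assumed for the example.

```python
import xml.etree.ElementTree as ET

# Simplified result rows: (taskID, taskType, follBy); follBy may be unbound.
rows = [
    ("Step1", "pa:Assemble", "Step2"),
    ("Step2", "pa:Assemble", "Step3"),
    ("Step3", "pa:Assemble", "Step4"),
    ("Step4", "pa:Test", None),
]

def to_graphml(rows):
    """Build a GraphML string with one node per task and one edge per followedBy."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    # Declare a node attribute that carries the task type.
    ET.SubElement(root, "key", {"id": "type", "for": "node",
                                "attr.name": "type", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", id="process", edgedefault="directed")
    seen = set()
    for task, task_type, _ in rows:
        if task not in seen:  # a task can appear in several result rows
            seen.add(task)
            node = ET.SubElement(graph, "node", id=task)
            ET.SubElement(node, "data", key="type").text = task_type
    for task, _, follower in rows:
        if follower is not None:
            ET.SubElement(graph, "edge", source=task, target=follower)
    return ET.tostring(root, encoding="unicode")

document = to_graphml(rows)
```

Because the task type travels as node data, a viewer can choose a different shape or colour per type without knowing the domain vocabulary in advance.
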
  25. Client SPARQL query for flexible visualization
      Choose an alternative representation whenever a task attribute exists (pa:createComponent in this case).
          SELECT ?PID ?startTime ?endTime ?taskID ?taskType ?followedBy ?precededBy ?alternativeName
          WHERE { ?PID ebtic-bpm:hasTask ?taskID.
                  ?taskID rdf:type ?taskType.
                  ?PID ebtic-bpm:startTime ?startTime.
                  OPTIONAL { ?PID ebtic-bpm:endTime ?endTime. }.
                  OPTIONAL { ?taskID ebtic-bpm:followedBy ?followedBy. }.
                  OPTIONAL { ?taskID ebtic-bpm:precededBy ?precededBy. }.
                  OPTIONAL { ?taskID pa:createComponent ?alternativeName. }.
                  FILTER (?PID = pa:01) }
  26. Client SPARQL query for flexible visualization
      Choose a task-specific alternative representation (use of UNION).
          SELECT ?PID ?startTime ?endTime ?taskID ?taskType ?follBy ?precBy ?altName
          WHERE {
            { ?PID ebtic-bpm:hasTask ?taskID.
              ?taskID rdf:type ?taskType.
              ?PID ebtic-bpm:startTime ?startTime.
              OPTIONAL { ?PID ebtic-bpm:endTime ?endTime. }.
              OPTIONAL { ?taskID ebtic-bpm:followedBy ?follBy. }.
              OPTIONAL { ?taskID ebtic-bpm:precededBy ?precBy. }.
              OPTIONAL { ?taskID pa:createComponent ?altName. }.
              FILTER (?PID = pa:01 && ?taskType = pa:Assemble) }
            UNION
            { ?PID ebtic-bpm:hasTask ?taskID.
              ?taskID rdf:type ?taskType.
              ?PID ebtic-bpm:startTime ?startTime.
              OPTIONAL { ?PID ebtic-bpm:endTime ?endTime. }.
              OPTIONAL { ?taskID ebtic-bpm:followedBy ?follBy. }.
              OPTIONAL { ?taskID ebtic-bpm:precededBy ?precBy. }.
              OPTIONAL { ?taskID pa:executedBy ?altName. }.
              FILTER (?PID = pa:01 && ?taskType = pa:Test) }
            UNION
            { ?PID ebtic-bpm:hasTask ?taskID.
              ?taskID rdf:type ?taskType.
              ?PID ebtic-bpm:startTime ?startTime.
              OPTIONAL { ?PID ebtic-bpm:endTime ?endTime. }.
              OPTIONAL { ?taskID ebtic-bpm:followedBy ?follBy. }.
              OPTIONAL { ?taskID ebtic-bpm:precededBy ?precBy. }.
              FILTER (?PID = pa:01 && ?taskType != pa:Assemble && ?taskType != pa:Test) }
          }
  27. Flexible Visualization
  28. Real-time aspects
      • The queries shown so far can be registered in the triple store as continuous queries, and the application will be notified of every new result.
      • Assuming that the process monitor continuously intercepts process execution data and translates it into triples, the visualisation application is able to monitor the processes in real time.
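
The idea of registering a query and being notified of new results can be sketched with a toy in-memory store in Python. This is a strong simplification under assumptions: the `TripleStore` class and its `register` and `add` methods are hypothetical, a registered query is a single triple pattern with `None` as a wildcard, and a real triple store supporting continuous SPARQL queries would evaluate full graph patterns.

```python
class TripleStore:
    """Toy store that notifies subscribers whenever a new matching triple arrives."""

    def __init__(self):
        self.triples = set()
        self.subscriptions = []  # list of (pattern, callback) pairs

    def register(self, pattern, callback):
        """Register a continuous query; None in the pattern matches anything."""
        self.subscriptions.append((pattern, callback))

    def add(self, triple):
        """Store a triple and push it to every subscription it matches."""
        if triple in self.triples:
            return  # already known, so it is not a new result
        self.triples.add(triple)
        for pattern, callback in self.subscriptions:
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                callback(triple)

store = TripleStore()
notifications = []

# The visualiser asks to be told about every task or process that starts.
store.register((None, "ebtic-bpm:startTime", None), notifications.append)

# The process monitor translates intercepted events into triples.
store.add(("Step1", "rdf:type", "pa:Assemble"))
store.add(("Step1", "ebtic-bpm:startTime", "10:39 12/2/2010"))
store.add(("Step1", "ebtic-bpm:startTime", "10:39 12/2/2010"))  # duplicate event
```

Only the second `add` call triggers the callback: the first triple does not match the pattern and the third is a duplicate.
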
  29. Real-Time Flexible Visualization
  30. Concluding remarks
      • Presented an extremely extendible and flexible data representation model, based on RDF and oriented towards real-time business process monitoring and discovery.
      • Demonstrated that this approach allows process discovery and analysis of domain-specific extensions, which may also be created at run time by third-party applications, using only SPARQL queries.
      • Future work in this direction will be to develop a set of non-invasive monitoring and analytical applications that will allow us to deploy and test this approach within any enterprise-scale environment.