Erik Nooijen, Boudewijn v. Dongen, Dirk Fahland


Process Mining for ERP Systems
Process Discovery


event log → process discovery algorithm → process model



 c1: A B C D E
 c2: A C B D E
 c3: A F D E
 …

assumptions
 • case = sequence of events of this case
 • cases are isolated: event A in c1 happens only in c1 (and not in c2)
 • all cases are of the same process
 • one unique case id
 • each event is associated with exactly one case id



                                                              PAGE 1
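The classical setting can be made concrete in a few lines. The Python sketch below is an illustration, not the discovery algorithm used in this work: it stores the example log c1 to c3 as a mapping from case id to trace (a representation that only works under the assumptions above) and counts the directly-follows relation that many discovery algorithms take as input. The function name `directly_follows` is hypothetical.

```python
from collections import Counter

# Example log: one unique case id per trace, each event in exactly one case.
log = {
    "c1": ["A", "B", "C", "D", "E"],
    "c2": ["A", "C", "B", "D", "E"],
    "c3": ["A", "F", "D", "E"],
}


def directly_follows(log: dict[str, list[str]]) -> Counter:
    """Count how often activity x is immediately followed by activity y within one case."""
    dfg: Counter = Counter()
    for trace in log.values():
        # Pairs are formed only inside a trace, because cases are isolated.
        for x, y in zip(trace, trace[1:]):
            dfg[(x, y)] += 1
    return dfg


dfg = directly_follows(log)
for (x, y), n in sorted(dfg.items()):
    print(f"{x} -> {y}: {n}")
```

Because all three cases end with D, E, the pair (D, E) is counted three times, while each of the three different successors of A is counted once.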
Typical Process in an ERP System

[Figure: build to order. Alice orders product X and Bob orders product Y from a manufacturer; the manufacturer orders the required materials (A, B, C) from the suppliers ACME Inc. and Mega Corp.]

                                                               PAGE 2
n-to-m relations

database → process discovery algorithm → process model

ProductOrder (id attribute: poID; relation: cust.; time-stamp attributes: created, processed, built, shipped)
poID  cust.  …  created      processed    built        shipped
po1   Alice     30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
po2   Bob       30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

Customer (data attributes)
cust.  address  …
Alice  …        …
Bob    …        …

OrderedMaterial (relations)
poID  moID  type  added
po1   mo3   B     30-08 13:13
po1   mo4   A     30-08 13:14
po2   mo3   B     30-08 13:15
po2   mo4   C     30-08 13:16

MaterialOrder (id attribute: moID; relation: suppl.)
moID  suppl.  …  completed    sent         received
mo3   ACME       30-08 13:15  30-08 14:15  01-09 9:05
mo4   MEGA       30-08 13:17  30-08 16:12  01-09 10:13
                                                                                                  PAGE 3
Process Discovery for ERP Systems

database schema → process discovery algorithm → process model

ProductOrder (poID, cust, created, processed, built, shipped)
Customer (cust, …)
OrderedMat. (poID, moID, type, added)
MaterialOrder (moID, supplier, completed, sent, received)

Customer 1 --- 0..* ProductOrder
ProductOrder 1 --- 1..* OrderedMat.
OrderedMat. 1..* --- 1 MaterialOrder

reality: data in a relational DB
 • events stored as time-stamped attributes in tables
 • multiple primary keys → multiple notions of case
 • tables are related → one event related to multiple cases

                                                                                              PAGE 4
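The example tables can be reproduced with Python's built-in `sqlite3` module to show the last bullet in action. This is a minimal sketch: the column types, the composite primary key of OrderedMaterial and the column name `suppl` are assumptions, and time stamps are kept as text because the example gives no year. The final query shows one time-stamped attribute (the `received` value of mo3) belonging to two product-order cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (cust TEXT PRIMARY KEY, address TEXT);
    CREATE TABLE ProductOrder (
        poID TEXT PRIMARY KEY,
        cust TEXT REFERENCES Customer(cust),
        created TEXT, processed TEXT, built TEXT, shipped TEXT
    );
    CREATE TABLE MaterialOrder (
        moID TEXT PRIMARY KEY,
        suppl TEXT,
        completed TEXT, sent TEXT, received TEXT
    );
    -- Link table realizing the n-to-m relation between product and material orders.
    CREATE TABLE OrderedMaterial (
        poID TEXT REFERENCES ProductOrder(poID),
        moID TEXT REFERENCES MaterialOrder(moID),
        type TEXT, added TEXT,
        PRIMARY KEY (poID, moID)
    );
    """
)
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [("Alice", None), ("Bob", None)])
conn.executemany(
    "INSERT INTO ProductOrder VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("po1", "Alice", "30-08 9:22", "30-08 13:12", "01-09 15:12", "03-09 10:15"),
        ("po2", "Bob", "30-08 10:15", "30-08 13:14", "01-09 16:13", "03-09 17:18"),
    ],
)
conn.executemany(
    "INSERT INTO MaterialOrder VALUES (?, ?, ?, ?, ?)",
    [
        ("mo3", "ACME", "30-08 13:15", "30-08 14:15", "01-09 9:05"),
        ("mo4", "MEGA", "30-08 13:17", "30-08 16:12", "01-09 10:13"),
    ],
)
conn.executemany(
    "INSERT INTO OrderedMaterial VALUES (?, ?, ?, ?)",
    [
        ("po1", "mo3", "B", "30-08 13:13"),
        ("po1", "mo4", "A", "30-08 13:14"),
        ("po2", "mo3", "B", "30-08 13:15"),
        ("po2", "mo4", "C", "30-08 13:16"),
    ],
)

# The single 'received' time stamp of mo3 relates to two product-order cases.
shared = conn.execute(
    """
    SELECT om.poID, mo.received
    FROM MaterialOrder AS mo JOIN OrderedMaterial AS om ON om.moID = mo.moID
    WHERE mo.moID = 'mo3'
    ORDER BY om.poID
    """
).fetchall()
print(shared)  # [('po1', '01-09 9:05'), ('po2', '01-09 9:05')]
```

Choosing poID as the case id would copy the mo3 "received" event into two traces; choosing moID would instead make po1 and po2 share one trace. Both are the multiple notions of case mentioned above.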
Outline

 database → decompose by primary keys → log f. PO, log f. MO
 log f. PO → discovery → model f. PO
 log f. MO → discovery → model f. MO
 model f. PO, model f. MO → process model (related by primary-key/foreign-key relations)
                                                                        PAGE 6
Find Artifact Schemas

 database → decompose by primary keys → log f. PO, log f. MO
 log f. PO → discovery → model f. PO
 log f. MO → discovery → model f. MO
 model f. PO, model f. MO → process model (related by primary-key/foreign-key relations)
                                                                        PAGE 7
Step 0: discover database schema

 • documented schema vs. actual schema → identify
   − column types (esp. time-stamped columns)
   − primary keys
   − foreign keys
 • various (non-trivial) techniques available
 • key discovery is NP-complete in the size of the table(s)
 • result: [figure: discovered database schema]

                                                PAGE 8
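The three identification tasks can be illustrated on the OrderedMaterial example table. The Python sketch below is a naive illustration, not one of the techniques the slide refers to: it detects time-stamped columns with a regular expression fitted to the example format (an assumption), finds minimal unique column combinations by exhaustive search, and tests single-column foreign-key candidates as inclusion dependencies. All function names are hypothetical.

```python
import re
from itertools import combinations

# Time-stamp format of the example data ("30-08 9:22"); an assumption, real systems vary.
TS_PATTERN = re.compile(r"\d{2}-\d{2} \d{1,2}:\d{2}")


def is_timestamp_column(values: list[str]) -> bool:
    """Heuristic column-type check: every non-empty value matches TS_PATTERN."""
    present = [v for v in values if v]
    return bool(present) and all(TS_PATTERN.fullmatch(v) for v in present)


def candidate_keys(rows: list[dict], columns: list[str]) -> list[tuple[str, ...]]:
    """Minimal unique column combinations by brute force.

    Subsets are enumerated by increasing size, so the cost grows
    exponentially with the number of columns.
    """
    keys: list[tuple[str, ...]] = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a subset is already a key, so combo is not minimal
            projected = [tuple(row[c] for c in combo) for row in rows]
            if len(set(projected)) == len(projected):
                keys.append(combo)
    return keys


def is_foreign_key_candidate(child: list[dict], col: str, parent: list[dict], key: str) -> bool:
    """Single-column inclusion dependency: every child value occurs as a parent key."""
    return {r[col] for r in child} <= {r[key] for r in parent}


ordered_material = [
    {"poID": "po1", "moID": "mo3", "type": "B", "added": "30-08 13:13"},
    {"poID": "po1", "moID": "mo4", "type": "A", "added": "30-08 13:14"},
    {"poID": "po2", "moID": "mo3", "type": "B", "added": "30-08 13:15"},
    {"poID": "po2", "moID": "mo4", "type": "C", "added": "30-08 13:16"},
]
material_order = [{"moID": "mo3"}, {"moID": "mo4"}]

cols = ["poID", "moID", "type", "added"]
print({c: is_timestamp_column([r[c] for r in ordered_material]) for c in cols})
print(candidate_keys(ordered_material, cols))
print(is_foreign_key_candidate(ordered_material, "moID", material_order, "moID"))
```

On only four rows, the search reports `('added',)` and `('poID', 'type')` as keys next to the intended `('poID', 'moID')`. Such accidental keys are why discovered keys need to be checked against the documented schema and domain knowledge.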
Step 1: decompose schema into processes

= schema summarization; find:
 1. sets of corresponding tables
 2. links between those sets

[Figure: schema summarized into ProductOrder and MaterialOrder]

                                                 PAGE 9
Automatic Schema Summarization

= group similar tables through clustering
 • define a distance between any 2 tables
   − by relations
   − by information content
 • tables that are close to each other → same cluster
 • # of clusters: user input

                                    PAGE 10
Automatic Schema Summarization

1. structural distance between tables
   fanout ~ avg. # of child records related to the same parent record

   parent table A: records 1, 2

   child table (A B)      fanout
   1 X, 2 Y               1
   1 X, 1 Y, 2 Z, 2 U     2
   1 X, 1 Y               1 = (2+0)/2

                                                           PAGE 11
Automatic Schema Summarization

1. structural distance between tables
   fanout ~ avg. # of child records related to the same parent record
   matched fraction (m.fr) ~ 1 / (fraction of records in parent with matching child record)

   parent table A: records 1, 2

   child table (A B)      fanout   m.fr
   1 X, 2 Y               1        1
   1 X, 1 Y, 2 Z, 2 U     2        1
   1 X, 1 Y               1        2 = 1/(1/2)

                                                                PAGE 12
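Both measures can be checked against the three examples. A minimal Python sketch, assuming each child record is represented only by the value of its reference column A; the function names are hypothetical, and how the two measures are combined into one structural distance is not shown on the slide and not attempted here.

```python
from collections import Counter


def fanout(parent_keys: list, child_refs: list) -> float:
    """Average number of child records per parent record (parents without children count as 0)."""
    per_parent = Counter(child_refs)
    return sum(per_parent[k] for k in parent_keys) / len(parent_keys)


def matched_fraction(parent_keys: list, child_refs: list) -> float:
    """1 / (fraction of parent records with at least one matching child record)."""
    matched = set(parent_keys) & set(child_refs)
    if not matched:
        return float("inf")  # no parent is matched; this case is not covered by the slide
    # 1 / (len(matched) / len(parent_keys)), written without the nested division
    return len(parent_keys) / len(matched)


parent = [1, 2]  # parent table A with records 1 and 2
examples = {
    "1 X, 2 Y": [1, 2],
    "1 X, 1 Y, 2 Z, 2 U": [1, 1, 2, 2],
    "1 X, 1 Y": [1, 1],
}
for label, refs in examples.items():
    print(label, fanout(parent, refs), matched_fraction(parent, refs))
```

The third example shows why both measures are needed: its fanout equals that of the first example, but only half of the parent records have a child, which the matched fraction exposes.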
Grouping by Clustering

1. structural distance
2. information distance
   • importance of each table = entropy (maximal if all records are different)
   • distance: 2 tables with high entropies → large distance
3. weighted distance by structure + information
4. k-means clustering: k clusters based on weighted distance
   • most important table of cluster = table with least distance to all other tables of the cluster
     → its primary key is the key attribute of the cluster
                                                            PAGE 13
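The grouping steps can be sketched in Python under stated assumptions; this is not the prototype's implementation. Entropy is taken as the Shannon entropy of a table's records, the linear weight `w` is hypothetical, and because only pairwise table distances are available, the sketch uses a k-medoids-style loop in place of the k-means step named on the slide, with each cluster's most important table as its center. All function names and the distances in the usage example are made up.

```python
import math
from collections import Counter


def table_entropy(rows: list[tuple]) -> float:
    """Shannon entropy (bits) of the record distribution; maximal (log2 n) if all n records differ."""
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in Counter(rows).values())


def weighted_distance(d_struct: float, d_info: float, w: float = 0.5) -> float:
    """Hypothetical linear combination; the weighting actually used is not given on the slide."""
    return w * d_struct + (1 - w) * d_info


def most_important_table(cluster: list[str], dist: dict) -> str:
    """Table with the least total distance to all other tables of the cluster."""
    return min(cluster, key=lambda t: sum(dist[t][u] for u in cluster if u != t))


def cluster_tables(tables: list[str], dist: dict, k: int, max_iter: int = 20) -> dict:
    """k-medoids-style loop on a precomputed distance matrix (zero diagonal assumed,
    so every center is assigned to its own cluster and no cluster is empty)."""
    centers = tables[:k]  # naive initialisation, an assumption
    clusters: dict = {}
    for _ in range(max_iter):
        clusters = {c: [] for c in centers}
        for t in tables:
            clusters[min(centers, key=lambda c: dist[t][c])].append(t)
        new_centers = [most_important_table(members, dist) for members in clusters.values()]
        if set(new_centers) == set(centers):
            break  # centers are stable
        centers = new_centers
    return clusters


# Made-up weighted distances between the example tables (symmetric, zero diagonal).
D = {
    "ProductOrder": {"ProductOrder": 0, "Customer": 1, "OrderedMaterial": 4, "MaterialOrder": 5},
    "Customer": {"ProductOrder": 1, "Customer": 0, "OrderedMaterial": 5, "MaterialOrder": 6},
    "OrderedMaterial": {"ProductOrder": 4, "Customer": 5, "OrderedMaterial": 0, "MaterialOrder": 1},
    "MaterialOrder": {"ProductOrder": 5, "Customer": 6, "OrderedMaterial": 1, "MaterialOrder": 0},
}
tables = ["ProductOrder", "MaterialOrder", "Customer", "OrderedMaterial"]
print(cluster_tables(tables, D, k=2))
print(table_entropy([(1, "X"), (2, "Y"), (1, "Y"), (2, "Z")]))
```

With these made-up numbers and k = 2, ProductOrder and Customer form one cluster and MaterialOrder and OrderedMaterial the other. Ties in `most_important_table` are broken by list order, which matters for two-table clusters.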
Artifact Schema → Artifact Log

 database → decompose by primary keys → log f. PO, log f. MO
 log f. PO → discovery → model f. PO
 log f. MO → discovery → model f. MO
 model f. PO, model f. MO → process model (related by primary-key/foreign-key relations)
                                                                        PAGE 14
Log Extraction

 cluster = set of related tables + primary key of most important table → case id

 log f. PO

 poID  cust.  …  created      processed    built        shipped
 po1   Alice     30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
 po2   Bob       30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

 poID  moID  type  added
 po1   mo3   B     30-08 13:13
 po1   mo4   A     30-08 13:14
 po2   mo3   B     30-08 13:15
 po2   mo4   C     30-08 13:16

 po1:
 po2:
                                                                             PAGE 15
Log Extraction

 cluster = set of related tables + primary key of most important table → case id
 time-stamped attribute → event
 related attributes → event attributes

 log f. PO

 poID  cust.  …  created      processed    built        shipped
 po1   Alice     30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
 po2   Bob       30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

 poID  moID  type  added
 po1   mo3   B     30-08 13:13
 po1   mo4   A     30-08 13:14
 po2   mo3   B     30-08 13:15
 po2   mo4   C     30-08 13:16

 po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …)
      (processed, poID=po1, time=30-08 13:12, …)
      (added, poID=po1, time=30-08 13:13, moID=mo3, …)   moID refers to artifact “MaterialOrder”
                                                                                        PAGE 19
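The extraction rules translate into a short Python sketch over the example rows. It is a simplified illustration: the tables are in-memory lists of dicts rather than database queries, the cluster (ProductOrder plus OrderedMaterial with case id `poID`, as in log f. PO) is given by hand, and `parse_ts` with its placeholder year is an assumption needed because the example time stamps have no year. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

product_order = [
    {"poID": "po1", "cust": "Alice", "created": "30-08 9:22", "processed": "30-08 13:12",
     "built": "01-09 15:12", "shipped": "03-09 10:15"},
    {"poID": "po2", "cust": "Bob", "created": "30-08 10:15", "processed": "30-08 13:14",
     "built": "01-09 16:13", "shipped": "03-09 17:18"},
]
ordered_material = [
    {"poID": "po1", "moID": "mo3", "type": "B", "added": "30-08 13:13"},
    {"poID": "po1", "moID": "mo4", "type": "A", "added": "30-08 13:14"},
    {"poID": "po2", "moID": "mo3", "type": "B", "added": "30-08 13:15"},
    {"poID": "po2", "moID": "mo4", "type": "C", "added": "30-08 13:16"},
]


def parse_ts(text: str) -> datetime:
    """Parse 'dd-mm h:mm'; the example has no year, so a placeholder year is used."""
    day_month, clock = text.split()
    day, month = map(int, day_month.split("-"))
    hour, minute = map(int, clock.split(":"))
    return datetime(2000, month, day, hour, minute)  # year 2000 is a placeholder


def extract_log(tables, case_key: str) -> dict[str, list[dict]]:
    """tables: (rows, time-stamped columns) pairs belonging to one cluster.

    Each non-empty time-stamped attribute becomes one event of case row[case_key];
    the row's remaining attributes become event attributes.
    """
    traces: dict[str, list[dict]] = defaultdict(list)
    for rows, ts_columns in tables:
        for row in rows:
            attrs = {k: v for k, v in row.items() if k not in ts_columns}
            for col in ts_columns:
                if row.get(col):
                    event = {"activity": col, "time": parse_ts(row[col]), **attrs}
                    traces[row[case_key]].append(event)
    for events in traces.values():
        events.sort(key=lambda e: e["time"])  # order each trace by time stamp
    return dict(traces)


log = extract_log(
    [(product_order, ("created", "processed", "built", "shipped")),
     (ordered_material, ("added",))],
    case_key="poID",
)
for event in log["po1"]:
    print(event["activity"], event["time"].strftime("%d-%m %H:%M"), event.get("moID", ""))
```

In the trace of po1, the two `added` events carry `moID`, which is the reference to the MaterialOrder artifact shown on the slide.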
Outline

 database → decompose by primary keys → log f. order, log f. quote
 log f. order → discovery → model f. order
 log f. quote → discovery → model f. quote
 model f. order, model f. quote → compose by primary-key/foreign-key relations → process model
                                                                        PAGE 20
Resulting Model(s)

[Figure: two discovered models. Product Order: create, processed, added, built, shipped. Material Order: added, completed, sent, received. The added events link both models (1..* on both ends).]

 (added, poID=po1, …, moID=mo3)
                                                                   PAGE 21
Implementation & Evaluation

prototype tool
 • input: relational database (via JDBC), .csv tables
 • steps
   − discover database schema (types, keys, relations)
   − discover artifact schema
     − by k-means clustering
     − by user picking tables
   − extract logs → ProM

                                                     PAGE 22
Evaluation: SAP System of Sligro

> 300 tables, > 40 GiB of data

schema extraction
  time-stamp attributes:  15 hrs
  primary keys:           4 hrs
  foreign keys:           5 hrs (single column) / 6 days (double column)

clustering
  entropies:              17 hrs
  table distances:        5 hrs
  clustering:             a few seconds
  ~20 different artifacts found; largest: 47 tables, 869 columns

log extraction
  extract 1000 traces containing > 246,000 events
  query database:         1 hr
  write log file:         32 hrs

                                                             PAGE 23
Sligro: Artikel (article) lifecycle model

[Figure: discovered lifecycle model of the Artikel artifact]

                                  PAGE 24
Open issues

performance
 • key discovery: NP-complete in R (# of records)
 • foreign key discovery: NP-complete in R²
 • problem is in the “hard part” of NP
 • → sampling of data, domain knowledge, semi-automatic approach
requires good database structure
 • proper relations, proper keys
 • otherwise wrong clusters are formed
 • events don't get the right attributes
 • → semi-automatic approach
events shared by multiple cases… working on it…
                                                    PAGE 25
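One way to read "sampling of data" for key discovery, sketched in Python under that assumption: a sample can only refute a candidate key, never confirm it, because two rows that collide in the sample also collide in the full table. A cheap pass over a small sample therefore discards candidates, and only the survivors need a full check. This prunes work but does not change the worst-case complexity. The function names are hypothetical.

```python
import random
from itertools import combinations


def has_duplicate(rows: list[dict], combo: tuple[str, ...]) -> bool:
    """True if two rows agree on all columns in combo."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in combo)
        if value in seen:
            return True
        seen.add(value)
    return False


def unique_combinations_sampled(rows, columns, sample_size, max_size=2, seed=0):
    """Unique column combinations (up to max_size columns) with a sampling pre-filter."""
    sample = random.Random(seed).sample(rows, min(sample_size, len(rows)))
    # Pass 1 (cheap): a duplicate in the sample rules a candidate out for good.
    candidates = [
        combo
        for size in range(1, max_size + 1)
        for combo in combinations(columns, size)
        if not has_duplicate(sample, combo)
    ]
    # Pass 2 (full): survivors must still be verified on all rows.
    return [combo for combo in candidates if not has_duplicate(rows, combo)]


ordered_material = [
    {"poID": "po1", "moID": "mo3", "type": "B", "added": "30-08 13:13"},
    {"poID": "po1", "moID": "mo4", "type": "A", "added": "30-08 13:14"},
    {"poID": "po2", "moID": "mo3", "type": "B", "added": "30-08 13:15"},
    {"poID": "po2", "moID": "mo4", "type": "C", "added": "30-08 13:16"},
]
cols = ["poID", "moID", "type", "added"]
print(unique_combinations_sampled(ordered_material, cols, sample_size=2))
```

Because the sample never removes a true key, the result is the same for every sample size; only the amount of work in the full pass changes.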
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 

Process Mining for ERP Systems

  • 1. Erik Nooijen, Boudewijn v. Dongen, Dirk Fahland: Process Mining for ERP Systems
  • 2. Process Discovery. An event log goes into a process discovery algorithm, which produces a process model. Example log: c1: A B C D E; c2: A C B D E; c3: A F D E; … Assumptions: a case is the sequence of events of this case; cases are isolated (event A in c1 happens only in c1, not in c2); all cases belong to the same process; there is one unique case id, and each event is associated with exactly one case id.
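The slides do not name a particular discovery algorithm. As a minimal sketch, assuming Python and a hypothetical function name `directly_follows`, the code below computes the directly-follows relation of the example log. Many discovery algorithms (the Alpha miner, for instance) start from this relation. The sketch depends on exactly the assumptions listed above: every event sits in exactly one trace, keyed by one case id.

```python
from collections import Counter

# The example log from the slide: each case id maps to exactly one
# sequence of events (the classical single-case-id assumption).
LOG = {
    "c1": ["A", "B", "C", "D", "E"],
    "c2": ["A", "C", "B", "D", "E"],
    "c3": ["A", "F", "D", "E"],
}

def directly_follows(log):
    """Count how often activity a is immediately followed by b within one case."""
    dfg = Counter()
    for trace in log.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

if __name__ == "__main__":
    for (a, b), n in sorted(directly_follows(LOG).items()):
        print(f"{a} -> {b}: {n}")
```

B and C follow each other in both orders (c1 vs. c2). A discovery algorithm would typically read that as B and C being concurrent.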
  • 3. Typical Process in an ERP System (build to order). Alice orders product X, which needs materials A and B; Bob orders product Y, which needs materials B and C. The manufacturer orders the materials from its suppliers: ACME Inc. receives an order for B and B, and Mega Corp. an order for A and C.
  • 4. n-to-m relations  database process process discovery model algorithm id attributes time-stamp attributes ProductOrder Customer poID cust. … created processed built shipped cust. address … po1 Alice 30-08 9:22 30-08 13:12 01-09 15:12 03-09 10:15 Alice … … po2 Bob 30-08 10:15 30-08 13:14 01-09 16:13 03-09 17:18 Bob … … relations data attributes OrderedMaterial id attributes MaterialOrder poID moID type added moID suppl. … completed sent received po1 mo3 B 30-08 13:13 mo3 ACME 30-08 13:15 30-08 14:15 01-09 9:05 po1 mo4 A 30-08 13:14 mo4 MEGA 30-08 13:17 30-08 16:12 01-09 10:13 po2 mo3 B 30-08 13:15 po2 mo4 C 30-08 13:16 relations PAGE 3
  • 5. Process Discovery for ERP Systems. Reality: the data sits in a relational DB with the tables Customer (cust, …), ProductOrder (poID, cust, created, processed, built, shipped), OrderedMat. (poID, moID, type, added) and MaterialOrder (moID, supplier, completed, sent, received). A customer has 0..* product orders, a product order has 1..* ordered materials, and a material order covers 1..* ordered materials. Consequences: events are stored as time-stamped attributes in tables; multiple primary keys → multiple notions of case; tables are related → one event is related to multiple cases.
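  • 7. Outline. Decompose the database by primary keys into artifacts (PO, MO); extract a log for each artifact (log for PO, log for MO); apply discovery to each log to obtain one model per artifact (model for PO, model for MO); the models are related by primary/foreign-key relations into one process model.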
  • 8. Find Artifact Schemas. (Outline: decompose by primary keys into the artifacts PO and MO → log for each → discovery → model for each, related by primary/foreign-key relations into one process model.)
  • 9. Step 0: discover the database schema. The documented schema and the actual schema can differ, so identify from the data: column types (esp. time-stamped columns), primary keys, and foreign keys. Various (non-trivial) techniques are available; key discovery is NP-complete in the size of the table(s). Result: the discovered schema.
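Below is a naive Python sketch of Step 0, run on two table excerpts from slide 4. It is not one of the techniques the slides refer to. The time-stamp regular expression, the brute-force enumeration of column combinations, and all function names are assumptions. The enumeration grows combinatorially with the number of columns, which hints at why key discovery is expensive. On excerpts this small, spurious keys appear: every ProductOrder column is unique, and ("poID", "type") passes as a key of OrderedMaterial alongside the intended ("poID", "moID").

```python
import re
from itertools import combinations

# Assumed time-stamp format, as printed on the slides: "DD-MM H:MM".
TIMESTAMP = re.compile(r"\d{2}-\d{2} \d{1,2}:\d{2}")

# Column-oriented excerpts of two tables from slide 4.
PRODUCT_ORDER = {
    "poID": ["po1", "po2"],
    "cust": ["Alice", "Bob"],
    "created": ["30-08 9:22", "30-08 10:15"],
    "processed": ["30-08 13:12", "30-08 13:14"],
    "built": ["01-09 15:12", "01-09 16:13"],
    "shipped": ["03-09 10:15", "03-09 17:18"],
}
ORDERED_MATERIAL = {
    "poID": ["po1", "po1", "po2", "po2"],
    "moID": ["mo3", "mo4", "mo3", "mo4"],
    "type": ["B", "A", "B", "C"],
    "added": ["30-08 13:13", "30-08 13:14", "30-08 13:15", "30-08 13:16"],
}

def timestamp_columns(table):
    """Columns whose every value matches the assumed time-stamp format."""
    return [c for c, vals in table.items()
            if vals and all(TIMESTAMP.fullmatch(v) for v in vals)]

def candidate_keys(table, max_size=2):
    """Minimal column combinations (up to max_size columns) with unique values."""
    n = len(next(iter(table.values())))
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(table, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a subset is already a key, so combo is not minimal
            rows = set(zip(*(table[c] for c in combo)))
            if len(rows) == n:
                keys.append(combo)
    return keys

def candidate_foreign_keys(child, parents):
    """(child column, parent table, parent key) triples where all child values
    are contained in a single-column key of the parent (inclusion dependency)."""
    fks = []
    for col, vals in child.items():
        for name, parent in parents.items():
            for (key,) in candidate_keys(parent, max_size=1):
                if set(vals) <= set(parent[key]):
                    fks.append((col, name, key))
    return fks

if __name__ == "__main__":
    print(timestamp_columns(PRODUCT_ORDER))
    print(candidate_keys(ORDERED_MATERIAL))
    print(candidate_foreign_keys(ORDERED_MATERIAL, {"ProductOrder": PRODUCT_ORDER}))
```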
  • 10. Step 1: decompose the schema into processes (= schema summarization). Find (1) sets of corresponding tables and (2) the links between them, e.g. ProductOrder and MaterialOrder.
  • 11. Automatic Schema Summarization = group similar tables through clustering. Define a distance between any two tables, based on their relations and on their information content; tables that are close to each other end up in the same cluster; the number of clusters is user input.
  • 12. Automatic Schema Summarization. 1. Structural distance between tables. Fanout ≈ average number of child records related to the same parent record. Example: a parent table with key column A = 1, 2, and three child tables with columns A, B:
      (1, X), (2, Y): fanout 1
      (1, X), (1, Y), (2, Z), (2, U): fanout 2
      (1, X), (1, Y): fanout 1 = (2+0)/2
  • 13. Automatic Schema Summarization. 1. Structural distance between tables, using two measures. Fanout ≈ average number of child records related to the same parent record; matched fraction (m.fr.) ≈ 1 / (fraction of records in the parent with a matching child record). For the same parent table (A = 1, 2) and child tables:
      (1, X), (2, Y): fanout 1, m.fr. 1
      (1, X), (1, Y), (2, Z), (2, U): fanout 2, m.fr. 1
      (1, X), (1, Y): fanout 1, m.fr. 2 = 1/(1/2)
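The two structural measures can be checked on the slide's example with a short Python sketch. The function names are hypothetical. The slides do not say how fanout and matched fraction are combined into one structural distance, so the sketch only computes the two quantities.

```python
from collections import Counter

PARENT = [1, 2]  # values of key column A in the parent table

# The three child tables (A, B) from the slide.
CHILDREN = {
    "first": [(1, "X"), (2, "Y")],
    "second": [(1, "X"), (1, "Y"), (2, "Z"), (2, "U")],
    "third": [(1, "X"), (1, "Y")],
}

def fanout(parent, child):
    """Average number of child records per parent record."""
    per_parent = Counter(a for a, _ in child)
    return sum(per_parent[p] for p in parent) / len(parent)

def matched_fraction(parent, child):
    """1 / (fraction of parent records with at least one matching child record)."""
    matched = {a for a, _ in child} & set(parent)
    if not matched:
        return float("inf")  # no parent record has a child
    return 1 / (len(matched) / len(parent))

if __name__ == "__main__":
    for name, child in CHILDREN.items():
        print(name, fanout(PARENT, child), matched_fraction(PARENT, child))
```

It prints fanouts 1.0, 2.0, 1.0 and matched fractions 1.0, 1.0, 2.0, the same values the slide gives.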
  • 14. Grouping by Clustering. 1. Structural distance. 2. Information distance: the importance of each table is its entropy (maximal if all records are different); two tables with high entropies → large distance. 3. Weighted distance combining structure and information. 4. k-means clustering into k clusters based on the weighted distance. The most important table of a cluster is the table with the least distance to all others; its key is the key attribute of the cluster.
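Here is a Python sketch of two ingredients named on this slide: table importance as the Shannon entropy of its records, and the most important table of a cluster as the one with the least total distance to the others. The slides give neither the information-distance formula nor the weights, so the distance values below are made up for illustration, and the function names are hypothetical. Picking an existing table as the cluster center is closer in spirit to k-medoids than to textbook k-means, since only pairwise distances between tables are available.

```python
import math
from collections import Counter

def table_entropy(rows):
    """Shannon entropy (bits) of a table's record distribution;
    maximal (log2 of the row count) when all records are different."""
    n = len(rows)
    counts = Counter(tuple(r) for r in rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def most_important(cluster, dist):
    """Table with the least total distance to the other tables of its cluster."""
    return min(cluster, key=lambda t: sum(dist[t][u] for u in cluster if u != t))

# Hypothetical weighted distances (made-up numbers, for illustration only).
DIST = {
    "ProductOrder": {"OrderedMaterial": 1.0, "Customer": 2.0},
    "OrderedMaterial": {"ProductOrder": 1.0, "Customer": 3.0},
    "Customer": {"ProductOrder": 2.0, "OrderedMaterial": 3.0},
}

if __name__ == "__main__":
    print(table_entropy([("po1", "Alice"), ("po2", "Bob")]))  # all records distinct
    print(table_entropy([("B",), ("A",), ("B",), ("C",)]))    # one duplicate value
    print(most_important(list(DIST), DIST))
```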
  • 15. Artifact Schema  Artifact Log process model related by primary foreign-key relations decompose by primary keys model f. log f. discovery PO log f. model f. MO PO MO discovery PAGE 14
  • 16. Log Extraction. A cluster is a set of related tables plus the primary key of its most important table, which becomes the case id. For the PO cluster (log for PO), the case id is poID:
      ProductOrder: poID | cust. | … | created | processed | built | shipped
        po1 | Alice | … | 30-08 9:22 | 30-08 13:12 | 01-09 15:12 | 03-09 10:15
        po2 | Bob | … | 30-08 10:15 | 30-08 13:14 | 01-09 16:13 | 03-09 17:18
      OrderedMaterial: poID | moID | type | added
        po1 | mo3 | B | 30-08 13:13
        po1 | mo4 | A | 30-08 13:14
        po2 | mo3 | B | 30-08 13:15
        po2 | mo4 | C | 30-08 13:16
      Traces to build: po1, po2.
  • 17. Log Extraction (same PO cluster and tables as slide 16; case id = poID). Each time-stamped attribute becomes an event: po1: (created, poID=po1, time=30-08 9:22, …)
  • 18. Log Extraction (same PO cluster and tables as slide 16; case id = poID). Time-stamped attribute → event; related attributes → event attributes: po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …)
  • 19. Log Extraction (same PO cluster and tables as slide 16; case id = poID). Time-stamped attribute → event; related attributes → event attributes: po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …), (processed, poID=po1, time=30-08 13:12, …)
  • 20. Log Extraction (same PO cluster and tables as slide 16; case id = poID). Time-stamped attribute → event; related attributes → event attributes: po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …), (processed, poID=po1, time=30-08 13:12, …), (added, poID=po1, time=30-08 13:13, moID=mo3, …). The added event refers to the artifact “MaterialOrder”.
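The extraction rules from slides 16 to 20 can be sketched in Python as follows. The set of time-stamped columns is taken as given, where Step 0 would detect it. All time stamps are assumed to fall in the same year, and `extract_log` and `time_key` are hypothetical names. Every time-stamped cell becomes one event, the remaining columns of its row become event attributes, and events are grouped by poID and ordered by time.

```python
# Excerpts of the PO cluster from the slides (columns used here only).
PRODUCT_ORDER = [
    {"poID": "po1", "cust": "Alice", "created": "30-08 9:22",
     "processed": "30-08 13:12", "built": "01-09 15:12", "shipped": "03-09 10:15"},
    {"poID": "po2", "cust": "Bob", "created": "30-08 10:15",
     "processed": "30-08 13:14", "built": "01-09 16:13", "shipped": "03-09 17:18"},
]
ORDERED_MATERIAL = [
    {"poID": "po1", "moID": "mo3", "type": "B", "added": "30-08 13:13"},
    {"poID": "po1", "moID": "mo4", "type": "A", "added": "30-08 13:14"},
    {"poID": "po2", "moID": "mo3", "type": "B", "added": "30-08 13:15"},
    {"poID": "po2", "moID": "mo4", "type": "C", "added": "30-08 13:16"},
]
TIME_COLUMNS = {"created", "processed", "built", "shipped", "added"}

def time_key(ts):
    """Sort key for 'DD-MM H:MM', assuming all stamps fall in the same year."""
    day_month, hour_min = ts.split()
    day, month = map(int, day_month.split("-"))
    hour, minute = map(int, hour_min.split(":"))
    return (month, day, hour, minute)

def extract_log(tables, case_id="poID"):
    """One event per time-stamped cell; the row's other columns become attributes."""
    log = {}
    for rows in tables:
        for row in rows:
            attrs = {k: v for k, v in row.items() if k not in TIME_COLUMNS}
            for col in row.keys() & TIME_COLUMNS:
                event = {"activity": col, "time": row[col], **attrs}
                log.setdefault(row[case_id], []).append(event)
    for trace in log.values():
        trace.sort(key=lambda e: time_key(e["time"]))
    return log

if __name__ == "__main__":
    for case, trace in extract_log([PRODUCT_ORDER, ORDERED_MATERIAL]).items():
        print(case, [(e["activity"], e["time"], e.get("moID")) for e in trace])
```

For po1 this yields created, processed, added (mo3), added (mo4), built, shipped. The two added events carry the moID that links them to the MaterialOrder artifact.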
  • 21. Outline. Decompose by primary keys into artifacts (order, quote); extract a log for each artifact; discover a model for each; compose the models by primary/foreign-key relations into one process model.
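  • 22. Resulting Model(s). A Product Order model with the activities create, processed, added (1..*), built, and shipped; a Material Order model with the activities added (1..*), completed, sent, and received. The shared added activity links the two models, e.g. via the event (added, poID=po1, …, moID=mo3).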
  • 23. Implementation & Evaluation  prototype tool • input: relational database (via JDBC), .csv tables • steps − discover database schema (types, keys, relations) − discover artifact schema − by k-means clustering − by user picking tables − extract logs  ProM PAGE 22
  • 24. Evaluation: SAP System of Sligro. > 300 tables, > 40 GiB of data.
      Schema extraction: time-stamp attributes 15 hrs; primary keys 4 hrs; foreign keys 5 hrs (single column) / 6 days (two columns).
      Clustering: entropies 17 hrs; table distances 5 hrs; clustering a few seconds. ~20 different artifacts found; the largest has 47 tables and 869 columns.
      Log extraction: 1000 traces with > 246,000 events extracted; querying the database 1 hr; writing the log file 32 hrs.
  • 25. Sligro: Artikel (article) lifecycle model.
  • 26. Open issues  performance • key discovery: NP-complete in R (# of records) • foreign key discovery: NP-complete in R2 • problem is in the “hard part” of NP •  sampling of data, domain knowledge, semi-automatic  requires good database structure • proper relations, proper keys • otherwise wrong clusters are formed • events don’t get right attributes •  semi-automatic approach  events shared by multiple cases… working on it… PAGE 25
  • 27. Erik Nooijen, Boudewijn v. Dongen, Dirk Fahland: Process Mining for ERP Systems