Applications of process mining to robotic process automation: Robotic Process Mining
Marlon Dumas
University of Tartu and Apromore
Process Mining Summer School, Aachen, 4-8 July 2022
Research Funded by the European Research Council (PIX project) and the Australian Research Council
With Volodymyr Leno, Marcello La Rosa, Artem Polyvyanyy, and Fabrizio Maggi
Scope of Process Mining
[Figure: data from an Enterprise System is extracted (ETL) into an event log, from which process mining produces a process model, comparative variant analysis reports, conformance reports, process performance measurements, data-driven simulations, and process predictions]
Business Process Event Log
Processes, Tasks, User Interactions
[Figure: a loan origination process instance consists of Task1: Check Application, Task2: Check Background, Task3: Assess Application, Task4: Underwrite, and Task5: Approve Offer, each tagged with the Loan ID. Each task is carried out through the worker's interactions with IT systems (Adobe Reader, Salesforce, Outlook, SAP), interleaved with private user actions. For example, Task3: Assess Application involves Adobe Reader, Salesforce, Outlook, and the Lending Backend. The interactions are tagged with the Worker ID.]
UI Log (click-level, worker ID not shown)
No Case ID(s): End of the world?
• To make UI logs useful, we need to:
  • Segment them: group events into task instances
  • Ideally also: link task instances to process instances (case IDs)
• Segmentation approaches:
  • Delimiter-based segmentation: if we know that every task instance (of a task type) starts when a user clicks the button “Create PO” and finishes with a click on a “Submit” button, we can segment at each occurrence of these events.
  • Resource-time windowing: if an event log tells us that Worker123 performed task instance t0023 on 2022-07-08 between 10:15 and 10:20, then all events of this worker during this time window are linked to t0023.
• In resource-time windowing, linking a UI event to a task instance also gives us a case ID (e.g. if an event is linked to t0023, then it is linked to the case ID of t0023).
• Otherwise, we can sometimes find a case ID by looking at the payload of the UI events (e.g. the Loan ID appears as a text field in some UI events).
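The two segmentation approaches above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the `UIEvent` record, the helper names, the field "Amount", and the case ID `L-001` are hypothetical, and the log is assumed to hold one worker's events in timestamp order.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UIEvent:
    worker: str
    timestamp: datetime
    action: str   # e.g. "click", "edit"
    target: str   # e.g. a button label or field name


def delimiter_segments(events, start, end):
    """Delimiter-based segmentation: a task instance runs from a `start`
    event to the next `end` event; both are (action, target) pairs."""
    segments, current = [], None
    for ev in events:
        key = (ev.action, ev.target)
        if key == start:
            current = [ev]                 # a new task instance begins
        elif current is not None:
            current.append(ev)
            if key == end:
                segments.append(current)   # task instance closed
                current = None
    return segments


def window_link(events, task_instances):
    """Resource-time windowing: link each UI event to the task instance that
    the same worker performed in a time window covering the event.
    `task_instances` maps task-instance ID -> (worker, start, end, case ID)."""
    links = {}
    for i, ev in enumerate(events):
        for tid, (worker, t_start, t_end, case_id) in task_instances.items():
            if ev.worker == worker and t_start <= ev.timestamp <= t_end:
                links[i] = (tid, case_id)  # task instance and case ID together
                break
    return links


def at(hour, minute):
    return datetime(2022, 7, 8, hour, minute)


log = [
    UIEvent("Worker123", at(10, 15), "click", "Create PO"),
    UIEvent("Worker123", at(10, 17), "edit", "Amount"),
    UIEvent("Worker123", at(10, 19), "click", "Submit"),
    UIEvent("Worker123", at(10, 30), "click", "Open mail"),
]
segments = delimiter_segments(log, ("click", "Create PO"), ("click", "Submit"))
links = window_link(log, {"t0023": ("Worker123", at(10, 15), at(10, 20), "L-001")})
```

Because `window_link` returns the case ID together with the task instance, it also shows why resource-time windowing yields case IDs at no extra cost; the event at 10:30 falls outside the window and stays unlinked.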
Analyzing Task UI Logs with Process Mining
• Task Mining probes are deployed on user workstations to gather user interaction (UI) logs on performed tasks, using either screenshots + image processing or native GUI libraries.
• Raw UI data is pushed to a Data Processing Server.
• The Task Mining configuration module allows analysts to provide input for data processing, e.g.:
  • defining task boundaries for segmentation
  • specifying the granularity (screen-level or task-level)
  • tagging sensitive information
  • …
• The Task Mining data processing engine pre-processes the raw UI data using the configuration.
Processed UI logs are fed into the process mining tool. From here, this data can be used to discover the underlying routines inside each task, analyze performance and compliance at the sub-task level, analyze worker performance, etc.
[Figure: User Workstations → raw UI data → Data Processing Server → processed UI data → Process Mining Tool]
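To make the configuration step concrete, here is a sketch of how a data processing engine might apply two of the settings listed above: tagging sensitive fields and choosing a granularity. The configuration keys, field names, and the "click"/"screen" options are assumptions for illustration; real task mining products define their own configuration formats.

```python
from itertools import groupby

# Hypothetical configuration: keys, field names and option values are illustrative.
CONFIG = {
    "sensitive_fields": {"Password", "IBAN"},  # values to mask
    "granularity": "screen",                   # "click" or "screen"
}


def preprocess(raw_events, config):
    """Mask tagged sensitive values, then optionally lift click-level events
    to screen level (one record per run of consecutive events on a screen)."""
    masked = []
    for ev in raw_events:
        ev = dict(ev)  # copy, so the raw data is left untouched
        if ev.get("field") in config["sensitive_fields"]:
            ev["value"] = "***"
        masked.append(ev)
    if config["granularity"] == "click":
        return masked
    return [
        {"screen": screen, "n_events": len(list(run))}
        for screen, run in groupby(masked, key=lambda e: e["screen"])
    ]


raw = [
    {"screen": "Login", "field": "User", "value": "anna"},
    {"screen": "Login", "field": "Password", "value": "s3cret"},
    {"screen": "Payment", "field": "IBAN", "value": "123456"},
    {"screen": "Payment", "field": "Amount", "value": "100"},
]
screen_level = preprocess(raw, CONFIG)
click_level = preprocess(raw, {**CONFIG, "granularity": "click"})
```

Masking is applied before any aggregation, so sensitive values never reach the processed UI log regardless of the chosen granularity.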
Task discovery from a click-level UI log
Task discovery from a screen-level UI log
Task mining vs Process mining
Scope
• Process Mining: Full end-to-end processes
• Task Mining: Individual tasks and how they are done
Objective
• Process Mining: Optimizing against process performance indicators
• Task Mining: Optimizing the performance of individual tasks
Source of data
• Process Mining: Event logs generated by enterprise systems, e.g. SAP, Salesforce, ServiceNow…
• Task Mining: User interaction logs obtained by recording worker activity via desktop or Web applications, e.g. MS Outlook or Adobe Reader
Correlation
• Process Mining: Single case ID or inter-related case IDs (e.g. Loan Application ID, PO ID, Invoice ID)
• Task Mining: No direct link to a “case”; instead, a reference to the recorded worker
Task Mining: Use Cases (according to Gartner)
Use cases: Task Automation (Robotic Process Mining), Workforce Optimization, Task Improvement
Common pain points:
• Unknown root causes of process workarounds and deviations
• Task workflow non-compliance
• No task visibility: most effective paths may be overlooked
• Tasks amenable to automation (e.g. via RPA) are hard to identify
• Slow automation development
• Inaccurate ROI assessment (not grounded on real data)
• Low workforce productivity
• Incorrectly set KPIs, no benchmark
• High variance in task executions
• Best practices not known
• Poor training & knowledge base
• Bad employee experience
Marc Kerremans and Tushar Srivastava. Discover the differences and use cases of process mining versus task mining. Research Note G00723821, Gartner, April 2020.
Robotic Process Automation (RPA): an emerging technology that allows organizations to automate repetitive clerical work by executing scripts (RPA bots) that encode sequences of fine-grained interactions with Web and desktop applications.
Attended automation vs. unattended automation
From: https://www.reliableplant.com/Read/31352/human-robot-collaboration
http://www.cirriusimpact.com/robotic-process-automation-rpa/
Automatable Task Example
Why Robotic Process Automation?
• Error rate reduction
• Cycle time reduction
• Flow standardization (consistency)
• Cost efficiency
(Image from Adobe Stock)
Classical RPA Analysis and Development
[Figure: users (employees) interact with information systems, which record an event log used by process mining (discovery, conformance, enhancement) to produce a process model. Routines are analyzed through interviews, workshops, and observation, and the RPA script (bot) is then developed by hand.]
Drawbacks:
− Time-consuming
− Error-prone
− Difficult to maintain
RPA with Robotic Process Mining
[Figure: users (employees) interact with information systems. Besides the event log analyzed by process mining (discovery, conformance, enhancement), user interactions are recorded into a UI log, from which a routine specification is discovered automatically and then compiled (synthesized) into an RPA script (bot).]
Benefits:
• Shortened time-frames
• Data-driven
• Objective
Robotic Process Mining
[Figure: users (employees) interact with information systems; Robotic Process Mining spans the steps from UI log generation through routine identification and routine specification discovery to compilation into an RPA script (bot).]
Robotic Process Mining: Research Questions
1. Given a user interaction log, how to identify routines that can potentially be automated via an RPA tool? How to reliably assess the “automatability” of a routine?
2. Given a set of (automatable) routines, how to prioritize these routines to maximize the benefits/ROI of RPA investments?
3. Given a routine, how to discover an executable specification that can be executed by an RPA bot?
4. Given a collection of operative RPA bots, how to monitor their performance and assess the realized benefits of an RPA initiative? How to adjust and evolve RPA bots for maximum benefit realization?
Robotic Process Mining Pipeline
[Figure: Recording → UI log → Segmentation → task traces → Simplification → simplified task traces → Candidate routines identification → candidate routines → Executable (sub)routines discovery → routine specifications → Aggregation → non-redundant routine specifications → Compilation → RPA script]
Identification of Candidate Routines
[Figure: UI log → Segmentation (preprocessing and normalization, control-flow graph construction, back edges detection, segments identification) → Routines identification (candidates discovery, candidate selection) → candidate routines]
Preprocessing and normalization
UI parameters fall into two groups:
• Data parameters (e.g. copied content, cell value, field value): unique value for each trace
• Context parameters (e.g. field name, button label, spreadsheet): same value for all traces
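The slide stops at the split of UI parameters. A plausible normalization step, which is an assumption here rather than something the slide states, drops the data parameters so that repeated executions of the same step map to the same normalized UI. The parameter names below are hypothetical spellings of the slide's examples.

```python
# Hypothetical parameter names, following the slide's examples.
DATA_PARAMS = {"copied_content", "cell_value", "field_value"}


def normalize(ui):
    """Drop data parameters (which vary from trace to trace) and keep the
    UI type plus its context parameters, as a hashable, order-independent key."""
    return tuple(sorted((k, v) for k, v in ui.items() if k not in DATA_PARAMS))


a = {"type": "editField", "field_name": "Phone", "field_value": "043-512-4834"}
b = {"type": "editField", "field_name": "Phone", "field_value": "039-689-9324"}
c = {"type": "clickButton", "button_label": "Submit"}
```

Here `a` and `b` record the same step with different data, so they normalize to the same key, while `c` stays distinct.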
Control-flow graph construction
[Figure: control-flow graph whose nodes are normalized UIs and whose arcs represent the directly-follows relation between them, starting from an entry point]
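A directly-follows graph of this kind can be built in one pass over the (unsegmented) sequence of normalized UIs. In this sketch UIs are plain labels such as `U1`, and taking the first recorded UI as the entry point is an assumption.

```python
from collections import defaultdict


def build_cfg(log):
    """Directly-follows graph over the normalized UIs of an unsegmented log.
    Returns the entry point and a successor map."""
    succ = defaultdict(set)
    for a, b in zip(log, log[1:]):
        succ[a].add(b)         # arc a -> b: b directly follows a
    entry = log[0]             # assumption: the first recorded UI is the entry
    return entry, dict(succ)


log = ["U1", "U2", "U3", "U4", "U1", "U2", "U4"]
entry, succ = build_cfg(log)
```

The arc U4 → U1 closes a cycle: it is exactly the kind of arc the next step (back edges detection) looks for.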
Back edges detection
[Figure: the control-flow graph with its dominator tree and strongly connected components (SCCs); the head of an SCC and a back-edge are highlighted]
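Back-edges can be found with the textbook compiler definition: an arc u → v is a back-edge if v dominates u, i.e. every path from the entry point to u passes through v. The sketch below computes dominator sets iteratively and applies that test. It does not compute the strongly connected components shown on the slide, so it is a simplified stand-in rather than the exact procedure behind the figure.

```python
def dominators(entry, succ):
    """Iterative dominator sets: dom(n) = {n} ∪ ⋂ dom(p) over predecessors p."""
    nodes = {entry} | set(succ) | {v for vs in succ.values() for v in vs}
    pred = {n: set() for n in nodes}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def back_edges(entry, succ):
    """An arc u -> v is a back-edge when its target v dominates its source u."""
    dom = dominators(entry, succ)
    return {(u, v) for u, vs in succ.items() for v in vs if v in dom[u]}


succ = {"U1": {"U2"}, "U2": {"U3", "U4"}, "U3": {"U4"}, "U4": {"U1"}}
```

On this graph the only back-edge is U4 → U1: U1 is the head of the cycle, and U4 is where each repetition ends.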
Segments identification
[Figure: the UI log is split into Segment 1 and Segment 2, delimited by the target nodes and source nodes of the back-edges]
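Given a back-edge, a simple segmentation rule is to start a segment at each occurrence of the back-edge target (the loop head) and end it at the last occurrence of the back-edge source before the next start. This sketch assumes a single back-edge and non-interleaved routine executions; the function name is hypothetical.

```python
def identify_segments(log, back_edge):
    """Split a UI log into segments delimited by a back-edge (source, target)."""
    source, target = back_edge
    starts = [i for i, ui in enumerate(log) if ui == target]
    segments = []
    for k, s in enumerate(starts):
        nxt = starts[k + 1] if k + 1 < len(starts) else len(log)
        # the segment ends at the last occurrence of the source before the next start
        ends = [i for i in range(s, nxt) if log[i] == source]
        if ends:
            segments.append(log[s:ends[-1] + 1])
    return segments


log = ["U1", "U2", "U3", "U4", "U1", "U2", "U4"]
segments = identify_segments(log, ("U4", "U1"))
```

The log above yields two segments, `U1 … U4` twice, with different paths through the loop body.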
Routines identification
Segments:
<U1, Uy, U2, U3, Ux, U4, Uz>
<U1, Uy, U2, Ux, U3, Uz, U4>
<U1, Ux, Uz, U2, U3, U4>
Frequent patterns in these segments include Pattern1: {U1, U2, U3, U4}, Pattern2: {U1, Ux, U4}, and Pattern3: {U1, Uy, U2, U3, U4}.
One UI can only belong to one routine! The occurrences of U1 and U4 are shared by all three patterns, so the patterns cannot all be selected as routines.
A routine is a frequent (gapped) sequence of user interactions.
Good candidates for automation are time-consuming routines with a large number of executions and error-prone manual labor.
Solution:
• Discover frequent patterns as usual
• Rank them according to a certain metric (e.g. length, frequency, coverage)
• Select the best pattern and remove its occurrences from the segments
• Repeat the procedure until no frequent patterns are left
Example: selecting Pattern1: {U1, U2, U3, U4} and removing its occurrences from <U1, Uy, U2, U3, Ux, U4, Uz>, <U1, Uy, U2, Ux, U3, Uz, U4>, and <U1, Ux, Uz, U2, U3, U4> leaves <Uy, Ux, Uz>, <Uy, Ux, Uz>, and <Ux, Uz>; selecting Pattern2: {Ux, Uz} next leaves <Uy>, <Uy>, and <>.
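The greedy selection loop above can be sketched as follows. To keep it short, the candidate patterns are given up front (in practice they would come from a sequential pattern miner and be re-mined after each removal), coverage is taken to be support × pattern length, and only the leftmost gapped occurrence is removed from each segment; all three choices are assumptions. The run reproduces the example: {U1, U2, U3, U4} is selected first, then {Ux, Uz}.

```python
def embed(segment, pattern):
    """Leftmost embedding of `pattern` as a gapped subsequence of `segment`:
    the list of matched positions, or None if the pattern does not occur."""
    positions, j = [], 0
    for i, ui in enumerate(segment):
        if j < len(pattern) and ui == pattern[j]:
            positions.append(i)
            j += 1
    return positions if j == len(pattern) else None


def support(segments, pattern):
    """Number of segments that contain the pattern."""
    return sum(embed(s, pattern) is not None for s in segments)


def select_routines(segments, candidates, min_support=2):
    """Greedy selection: rank candidates by coverage (support * length),
    pick the best, remove its occurrence from every segment, repeat."""
    segments = [list(s) for s in segments]
    remaining = list(candidates)
    routines = []
    while True:
        scored = [(support(segments, p) * len(p), p) for p in remaining
                  if support(segments, p) >= min_support]
        if not scored:
            return routines, segments
        _, best = max(scored)
        routines.append(best)
        remaining.remove(best)
        for k, s in enumerate(segments):
            matched = embed(s, best)
            if matched is not None:  # each UI occurrence goes to one routine only
                segments[k] = [ui for i, ui in enumerate(s) if i not in matched]


segs = [
    ["U1", "Uy", "U2", "U3", "Ux", "U4", "Uz"],
    ["U1", "Uy", "U2", "Ux", "U3", "Uz", "U4"],
    ["U1", "Ux", "Uz", "U2", "U3", "U4"],
]
cands = [("U1", "U2", "U3", "U4"), ("U1", "Ux", "U4"),
         ("U1", "Uy", "U2", "U3", "U4"), ("Ux", "Uz")]
routines, leftover = select_routines(segs, cands)
```

Removing the selected occurrences is what enforces "one UI can only belong to one routine": after the first pick, no segment contains U1 any more, so {U1, Ux, U4} and {U1, Uy, U2, U3, U4} lose all their support.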
Evaluation results: Segmentation
[Table: results on synthetic logs and supervised recording (SR) logs]
Note: the last four logs were obtained by combining SR logs.
Evaluation results: Unsupervised recording logs
Scholarship allocation process at the University of Melbourne, 2 workers
Log             # Discovered segments   # Identified routines   # Routine variants   Execution time (sec.)
Scholarship1    35                      2                       5                    41.686
Scholarship2*   3                       0                       0                    426.319
* The second log describes more complex and unstructured behavior.
Executable routines discovery & synthesis
A routine is automatable if every UI in the routine can be deterministically executed based on input data or on data produced by previous UIs.
A routine specification is a representation of an automatable routine that can be executed by an RPA tool.
The Routine Automatability Index (RAI) is the degree of automatability of a routine, computed as the ratio of automatable UIs to all UIs in the routine.
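As a worked example of the RAI, the sketch below tags each UI with where its value comes from. The `source` labels are a hypothetical encoding of the definition above: input data, data produced by a previous UI, or no data needed (e.g. clicking the same button every time). Three of the four UIs qualify, so RAI = 3/4 = 0.75.

```python
# Hypothetical encoding: "source" says where the value a UI needs comes from.
AUTOMATABLE_SOURCES = {"input", "previous_ui", "no_data"}


def rai(routine):
    """Routine Automatability Index: automatable UIs / all UIs in the routine."""
    automatable = sum(1 for ui in routine if ui["source"] in AUTOMATABLE_SOURCES)
    return automatable / len(routine)


routine = [
    {"ui": "copyCell", "source": "input"},             # reads the input spreadsheet
    {"ui": "pasteIntoField", "source": "previous_ui"}, # pastes what copyCell copied
    {"ui": "editField", "source": "unknown"},          # typed value, origin not in the log
    {"ui": "clickButton", "source": "no_data"},        # same button in every execution
]
```

A routine is fully automatable exactly when its RAI equals 1; values below 1 point at the UIs that block automation.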
Example: Data transformation
SOURCE: “+61 043 512 4834” → TARGET: “043-512-4834”
Synthesis of routine specifications as a transformation problem
Discovering data transformations by example
Key ideas:
• Synthesize one transformation per output field, and use the UI log to discover input-to-output data flows
• Discover patterns in the input values, and discover one transformation per input pattern
Examples extraction: Overview
For each routine instance:
• Collect the last edits of all target application elements
• Identify the corresponding sources and their values
• Create input-output transformation examples (Input, Output, Source, Target)
The last edit of a target gives the Output; the corresponding read of a source gives the Input, e.g.:
t = (
Input = “+61 043 512 4834”,
Output = “043-512-4834”,
Source = “D3”,
Target = “Phone”
)
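The extraction step can be illustrated on a single routine instance recorded as `read` and `edit` events. Finding the source of a last edit can be involved in general, since all inputs that contributed to the final value have to be traced; this sketch simply assumes the source is the most recent read before that edit. The event encoding and the function name are hypothetical.

```python
from collections import namedtuple

Example = namedtuple("Example", "input output source target")


def extract_examples(instance):
    """instance: list of (kind, element, value) with kind in {'read', 'edit'}."""
    last_edit = {}
    for pos, (kind, element, _) in enumerate(instance):
        if kind == "edit":
            last_edit[element] = pos        # later edits overwrite earlier ones
    examples = []
    for target, pos in last_edit.items():
        # simplification: the source is the latest read preceding the last edit
        reads = [(el, val) for kind, el, val in instance[:pos] if kind == "read"]
        if reads:
            source, value_in = reads[-1]
            examples.append(Example(value_in, instance[pos][2], source, target))
    return examples


instance = [
    ("read", "D3", "+61 043 512 4834"),
    ("edit", "Phone", "043"),               # intermediate edit, ignored
    ("edit", "Phone", "043-512-4834"),      # last edit: the Output
]
examples = extract_examples(instance)
```

Only the last edit of "Phone" counts, so the partial value "043" typed along the way does not produce a spurious example.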
Examples extraction: Heterogeneous data
Problem: the values of a field may come in different formats, e.g.
+61 (039) 689 9324
+61 (039) 689-9324
+61 039 689-9324
61.039.689.9324
+61 039 689 9324
039-689-9324
039.689.9324
039-689-9324
so there is no single data transformation program for all examples.
Solution:
• Identify patterns by applying tokenization
• Group transformation examples with the same pattern together
• Discover a transformation program for each group
Examples extraction: Tokenization
Example: 99 Beacon Rd, Port Melbourne, VIC 3207, Australia
• Sequences of digits are replaced by <d>+
• Sequences of letters are replaced by <a>+
• Special characters remain unchanged
Result: <d>+ <a>+ <a>+, <a>+ <a>+, <a>+ <d>+, <a>+
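Tokenization and grouping fit in a few lines with a regular expression: digit runs become `<d>+`, letter runs become `<a>+`, and everything else is copied. Both kinds of runs are replaced in a single pass, since a second pass over letters would also rewrite the `d` inside `<d>+`. The phone-number outputs used for grouping are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\d+|[A-Za-z]+")


def tokenize(value):
    """Replace digit runs by <d>+ and letter runs by <a>+; keep other characters."""
    return TOKEN.sub(lambda m: "<d>+" if m.group().isdigit() else "<a>+", value)


def group_by_pattern(examples):
    """Group (input, output) examples by the token pattern of their input."""
    groups = defaultdict(list)
    for inp, out in examples:
        groups[tokenize(inp)].append((inp, out))
    return dict(groups)


examples = [
    ("+61 039 689 9324", "039-689-9324"),
    ("+61 035 341 2938", "035-341-2938"),
    ("61.039.689.9324", "039-689-9324"),
]
groups = group_by_pattern(examples)
```

The first two inputs share the pattern `+<d>+ <d>+ <d>+ <d>+` and end up in one group; the dotted format forms a group of its own, so each group can get its own transformation program.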
Transformation discovery: Syntactic transformations
Input → Output:
+61 039 689 9324 → 039 689 9324
+61 035 341 2938 → 035 341 2938
+61 079 149 3015 → 079 149 3015
[Figure: search tree of candidate programs, e.g. split_first(0, ' ') followed by drop(0, ' '), or split(0, ' ') followed by drop(0, ' '), join(0, ' '), join(0, ' ')]
• Program synthesis as a search problem
• Heuristic search based on the A* algorithm
• Cost function based on the number of data manipulations
• Deals with string and table manipulations
• Implemented in the Foofah toolset
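The search idea can be reproduced on a toy scale. The sketch below runs A* over programs built from the four operators named above, with cost equal to the number of operations and an admissible heuristic: each `drop` or `join` removes exactly one cell, so the surplus of cells over the target is a lower bound on the remaining cost. This is a simplification for illustration, not Foofah's implementation, whose operator set and heuristic are richer; the operators' exact semantics here are assumptions.

```python
import heapq
from itertools import count


# Operators on a row of cells; all take (row, col, sep) so they can be
# enumerated uniformly. They raise IndexError when not applicable.
def split_first(row, col, sep):
    return row[:col] + row[col].split(sep, 1) + row[col + 1:]


def split(row, col, sep):
    return row[:col] + row[col].split(sep) + row[col + 1:]


def drop(row, col, sep):  # sep is ignored
    if col >= len(row):
        raise IndexError(col)
    return row[:col] + row[col + 1:]


def join(row, col, sep):
    if col + 1 >= len(row):
        raise IndexError(col)
    return row[:col] + [row[col] + sep + row[col + 1]] + row[col + 2:]


OPERATORS = [split_first, split, drop, join]


def synthesize(examples, seps=(" ",), max_ops=4):
    """A* search for a shortest program (cost = number of operations) that maps
    every input to its output; returns the program as a list of strings."""
    start = tuple((inp,) for inp, _ in examples)
    goal = tuple((out,) for _, out in examples)

    def h(state):  # admissible: drop and join remove one cell per step
        return max(max(0, len(row) - len(target)) for row, target in zip(state, goal))

    tie = count()
    best = {start: 0}
    frontier = [(h(start), next(tie), start, [])]
    while frontier:
        _, _, state, prog = heapq.heappop(frontier)
        if state == goal:
            return prog
        if len(prog) > best[state] or len(prog) == max_ops:
            continue  # stale queue entry, or cost budget exhausted
        for op in OPERATORS:
            for col in range(max(len(row) for row in state)):
                for sep in seps:
                    try:
                        nxt = tuple(tuple(op(list(row), col, sep)) for row in state)
                    except IndexError:
                        continue  # operator not applicable to every row
                    g = len(prog) + 1
                    if g < best.get(nxt, float("inf")):
                        best[nxt] = g
                        step = f"{op.__name__}({col}, {sep!r})"
                        heapq.heappush(frontier, (g + h(nxt), next(tie), nxt, prog + [step]))
    return None


examples = [
    ("+61 039 689 9324", "039 689 9324"),
    ("+61 035 341 2938", "035 341 2938"),
    ("+61 079 149 3015", "079 149 3015"),
]
program = synthesize(examples)
```

Because one program must work for all three examples and A* expands cheaper partial programs first, the two-step program split_first(0, ' ') followed by drop(0, ' ') is found before the four-step alternative.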
Transformation discovery: Semantic transformations
• Searching for functional dependencies
• Transformations in the form of substitution mapping schemes
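A semantic transformation cannot be expressed with string operators, but it can be learned as a lookup table when the output is functionally dependent on the input. The sketch below checks that dependency and returns the substitution mapping; the function name and the state-name examples are hypothetical.

```python
def discover_mapping(examples):
    """Return a substitution mapping input -> output if the output is
    functionally dependent on the input, else None."""
    mapping = {}
    for inp, out in examples:
        if mapping.setdefault(inp, out) != out:
            return None  # one input value, two outputs: no functional dependency
    return mapping


states = [("VIC", "Victoria"), ("NSW", "New South Wales"), ("VIC", "Victoria")]
ambiguous = [("VIC", "Victoria"), ("VIC", "Melbourne")]
```

The first example set yields a two-entry mapping; the second is rejected because the same input leads to two different outputs.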
Transformation discovery: Routine specification
Routines aggregation
[Figure labels: Click Button “New Record”, Click Button “Submit”, Routine effects]
Evaluation results
[Table: results on synthetic logs, supervised recording logs, and unsupervised recording logs]
Open Challenges
Candidate routines discovery
• Discovering routines in the presence of multi-tasking and/or frequent worker distractions (the routine occurrences may overlap)
• Discovering routines that are often performed in a piece-wise manner?
Executable routines discovery
• Discovering automatable routines where the data transfer between fields is NOT explicitly recorded in the UI log (“copy typing”)?
• How to discover automatable routines with complex conditional behavior?
• How to discover semi-automatable routine specifications (for unattended RPA)?
Strategic alignment, governance, people & culture:
• Monitoring of RPA performance & acceptance
• Compliance verification & monitoring
References
UI Log Recording
• V. Leno, A. Polyvyanyy, M. La Rosa, M. Dumas, & F. M. Maggi. Action Logger: Enabling Process Mining for Robotic Process Automation. In BPM Demonstration Track, 2019, pp. 124-128.
UI Log Segmentation
• J. Shen, L. Li, and T. G. Dietterich. Real-time detection of task switches of desktop users. IJCAI 2007, pp. 2868–2873.
• A. Bosco, A. Augusto, M. Dumas, M. La Rosa, and G. Fortino. Discovering automatable routines from user interaction logs. In BPM Forum 2019, Springer.
• G. Tello, G. Gianini, R. Mizouni, and E. Damiani. Machine learning-based framework for log-lifting in business process mining applications. In BPM 2019, Springer.
• S. Agostinelli, F. Leotta, A. Marrella. Interactive Segmentation of User Interface Logs. ICSOC 2021, pp. 65-80.
Candidate Routine Identification
• A. Jimenez-Ramirez, H. A. Reijers, I. Barba, and C. Del Valle. A method to improve the early stages of the robotic process automation lifecycle. In CAiSE 2019, Springer, pp. 446–461.
• D. Choi, H. R’bigui, and C. Cho. Candidate digital tasks selection methodology for automation with robotic process automation. Sustainability 13(16):8980, 2021.
• V. Leno, A. Augusto, M. Dumas, M. La Rosa, F. M. Maggi, & A. Polyvyanyy. Identifying candidate routines for Robotic Process Automation from unsegmented UI logs. In ICPM 2020, pp. 153-160, IEEE.
• J. Gao, S. J. van Zelst, X. Lu, W. M. P. van der Aalst. Automated Robotic Process Automation: A Self-Learning Approach. OTM Conferences 2019, Springer, pp. 95-112.
Synthesis of Executable Routine Specifications
• V. Leno, M. Dumas, M. La Rosa, F. M. Maggi, & A. Polyvyanyy. Automated Discovery of Data Transformations for Robotic Process Automation. AAAI Workshop on Intelligent Process Automation (IPA), 2020.
• S. Agostinelli, M. Lupia, A. Marrella, M. Mecella. Automated Generation of Executable RPA Scripts from User Interface Logs. BPM Blockchain and RPA Forum 2020, Springer, pp. 116-131.
• R. Dong, Z. Huang, I. Iong Lam, Y. Chen, X. Wang. WebRobot: Web Robotic Process Automation using Interactive Programming-by-Demonstration. PLDI 2022.
End-to-End Robotic Process Mining
• V. Leno, A. Polyvyanyy, M. Dumas, M. La Rosa, & F. M. Maggi. Robotic Process Mining: Vision and Challenges. Business and Information Systems Engineering, pp. 1-14, Springer, 2020.
• V. Leno, S. Deviatykh, A. Polyvyanyy, M. La Rosa, M. Dumas, & F. M. Maggi. Robidium: Automated Synthesis of Robotic Process Automation Scripts from UI logs. In BPM Demonstration Track, 2020, pp. 102-106.
• S. Agostinelli, M. Lupia, A. Marrella, M. Mecella. SmartRPA: A Tool to Reactively Synthesize Software Robots from User Interface Logs. CAiSE Forum 2021, Springer, pp. 137-145.
• V. Leno, A. Augusto, M. Dumas, M. La Rosa, F. M. Maggi & A. Polyvyanyy. Discovering executable routine specifications from user interaction logs. Information Systems 107: 101916, 2022.


Editor's Notes

  • #11 Dashboards and maps/BPMN models can be used to identify bottlenecks in the user routines (long waiting times, long actions, rework, distractions etc) as well as identify best user practices
  • #13 Not “process” automation but “task” automation. Not “physical” robots but “software” robots.
  • #30 Correct segments were discovered for all artificial logs. For most supervised recording logs, the LED is less than 0.1. Execution time does not exceed 4 seconds.
  • #36 We search for all inputs that “contributed” to the final value of a modified field
  • #37 Optimization 1 cannot deal with heterogeneous data (values with different formats). It also fails to discover a transformation when the output values are ambiguous (e.g. two transformation examples have the same output value).