Paper presentation by Artem Polyvyanyy at the AAAI Workshop on Intelligent Process Automation (IPA), New York, 7 February 2020. Paper available at: https://arxiv.org/pdf/1912.01855.pdf
Automated Discovery of Data Transformations for Robotic Process Automation
1. Automated Discovery of Data Transformations for Robotic Process Automation
Volodymyr Leno, Marlon Dumas, Marcello La Rosa,
Fabrizio Maria Maggi, and Artem Polyvyanyy
The AAAI-20 Workshop on Intelligent Process Automation, February 7th 2020, New York, NY, USA
2. What is Robotic Process Automation (RPA)?
3. Example RPA Task
4. Current means of automation
[Figure, process mining: Information System, Event Log, Process Mining (Discovery, Conformance, Enhancement), Process Model]
[Figure, RPA: Users (employees), Interaction, Information systems, UI log, Routine, Identification, Manual observation, Generation, Coding, RPA script]
Requires a lot of time
Information about routine can be incomplete
5. UI log
V. Leno, A. Polyvyanyy, M. La Rosa, M. Dumas and F. Maria Maggi. Action logger: Enabling process mining for robotic
process automation. In Proceedings of Demonstration Track at BPM 2019, 124–128, 2019
9. Preprocessing
Identify task traces
Filter out redundant actions
Regular expression find and replace rules:
Control-flow based (e.g. double copying without pasting)
Data-aware rules (e.g. double editing of text field with replacement)
Segmentation
Identify actions in task traces
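The control-flow rule above (double copying without pasting) could be sketched as a regular expression over an encoded task trace. This is a minimal illustration, not the authors' implementation: the one-letter action codes, the CODES table and the function name filter_double_copies are assumptions made for the example.

```python
import re

# Hypothetical one-letter encoding of UI actions (not taken from the slides).
CODES = {"copy": "C", "paste": "P", "edit": "E"}

# A copy is redundant if another copy follows before any paste happens.
DOUBLE_COPY = re.compile(r"C(?=[^P]*C)")

def filter_double_copies(actions):
    """Drop copy actions whose clipboard content is overwritten before being pasted."""
    encoded = "".join(CODES.get(a, "O") for a in actions)  # "O" = any other action
    redundant = {m.start() for m in DOUBLE_COPY.finditer(encoded)}
    return [a for i, a in enumerate(actions) if i not in redundant]

trace = ["copy", "copy", "paste", "copy", "click", "copy", "paste"]
cleaned = filter_double_copies(trace)
```

Because the rule works on positions in the encoded string, the same approach extends to other control-flow patterns by adding further expressions.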
10. Examples extraction
For each task trace:
Collect the values of all read cells/fields (Inputs)
Collect the latest values of all modified cells/fields (Outputs)
Create input-output transformation example (Inputs, Outputs)
Inputs = [“Albert”, “Rauf”, “11/04/1986”, “+61 043 512 4834”, “arauf@gmail.com”, “Germany”, “99 Beacon Rd, Port Melbourne, VIC 3207, Australia”]
Outputs = [“Albert Rauf”, “11-04-1986”, “Germany”, “043-512-4834”, “arauf@gmail.com”, “99 Beacon Rd”, “Port Melbourne”, “VIC”, “3207”, “Australia”]
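The extraction steps for one task trace can be sketched as follows. The action records (dictionaries with "type", "element" and "value" keys) and the function name extract_example are assumptions for illustration; the actual UI log format of the Action Logger is not shown on the slide.

```python
def extract_example(trace):
    """Build one (Inputs, Outputs) example from a single task trace."""
    inputs = []   # values of all read cells/fields, in reading order
    latest = {}   # modified element -> its latest value
    for action in trace:
        if action["type"] == "read":
            inputs.append(action["value"])
        elif action["type"] == "edit":
            # Later edits of the same element overwrite earlier ones.
            latest[action["element"]] = action["value"]
    return inputs, list(latest.values())

trace = [
    {"type": "read", "element": "A1", "value": "Albert"},
    {"type": "read", "element": "B1", "value": "Rauf"},
    {"type": "edit", "element": "FullName", "value": "Albert"},
    {"type": "edit", "element": "FullName", "value": "Albert Rauf"},
]
example = extract_example(trace)
```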
11. Transformation discovery
FOOFAH – transformation discovery by example
Program synthesis as a search problem in a state space graph
Heuristic search approach based on A* algorithm
Cost function is the number of manipulations
Deals with string and table manipulations
[Figure: state space graph from the Input table (+61 039 689 9324, +61 035 341 2938, +61 079 149 3015) to the Output table (039 689 9324, 035 341 2938, 079 149 3015); edges are labelled with manipulations: split_first(0, ' '), split(0, ' '), drop(0, ' '), join(0, ' ')]
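The idea of program synthesis as search can be illustrated with a small A*-style search over table manipulations. This is a simplified sketch and not FOOFAH itself: the four operators are reduced to a fixed space separator, the heuristic h is a trivial placeholder (0 at the goal, 1 elsewhere), and the names apply, h, discover and max_ops are assumptions for the example.

```python
import heapq
from itertools import count

# Row-level operators: each returns a new row (tuple of cells) or None if not applicable.
def split_first(row, c):
    head, sep, tail = row[c].partition(" ")
    return row[:c] + (head, tail) + row[c + 1:] if sep else None

def split(row, c):
    parts = tuple(row[c].split(" "))
    return row[:c] + parts + row[c + 1:] if len(parts) > 1 else None

def drop(row, c):
    return row[:c] + row[c + 1:] if len(row) > 1 else None

def join(row, c):
    if c + 1 >= len(row):
        return None
    return row[:c] + (row[c] + " " + row[c + 1],) + row[c + 2:]

OPS = {"split_first": split_first, "split": split, "drop": drop, "join": join}

def apply(op, table, c):
    """Apply op to column c of every row; reject results that are not rectangular."""
    rows = tuple(op(row, c) for row in table)
    if any(r is None for r in rows) or len({len(r) for r in rows}) != 1:
        return None
    return rows

def h(table, goal):
    # Placeholder heuristic: at least one more manipulation unless we are at the goal.
    return 0 if table == goal else 1

def discover(source, goal, max_ops=4):
    """Search for a sequence of manipulations turning source into goal."""
    tie = count()  # tie-breaker so the heap never compares tables
    frontier = [(h(source, goal), next(tie), 0, source, [])]
    seen = {source}
    while frontier:
        _, _, cost, table, program = heapq.heappop(frontier)
        if table == goal:
            return program
        if cost == max_ops:
            continue
        for name, op in OPS.items():
            for c in range(len(table[0])):
                nxt = apply(op, table, c)
                if nxt is not None and nxt not in seen:
                    seen.add(nxt)
                    step = program + [f"{name}({c})"]
                    heapq.heappush(
                        frontier,
                        (cost + 1 + h(nxt, goal), next(tie), cost + 1, nxt, step),
                    )
    return None

source = (("+61 039 689 9324",), ("+61 035 341 2938",), ("+61 079 149 3015",))
goal = (("039 689 9324",), ("035 341 2938",), ("079 149 3015",))
program = discover(source, goal)
```

For the phone numbers on the slide, the search finds a two-step program: split_first on column 0, then drop on column 0. A more informative heuristic would prune far more of the state space than this placeholder does.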
13. Baseline approach. Limitations
Requires a lot of time to discover a transformation
May not discover a complex transformation
14. Optimization 1: Grouping by targets
(Spreadsheet.Column_D, WebForm.Phone)
+61 039 689 9324 => 039-689-9324
+61 043 512 4834 => 039-689-9324
(Spreadsheet.Column_G, WebForm.Street)
99 Beacon Rd, Port Melbourne, VIC 3207, Australia => 99 Beacon Rd
122 Albert St, Port Melbourne, VIC 3207, Australia => 122 Albert St
(Spreadsheet.Column_G, WebForm.ZipCode)
16 Morris St, South Melbourne, VIC 3205, Australia => 3205
9/271 William St, Melbourne, VIC 3000, Australia => 3000
Transformation example = (I, O, S, T)
I – input value(s) (e.g., “+61 039 689 9324”)
O – output value(s) (e.g., “039-689-9324”)
S – source(s) (e.g., cell D1)
T – target (e.g., text field Phone)
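Grouping (I, O, S, T) examples by target can be sketched as below. The Example type, the column_of helper (which projects a cell such as D1 onto its column) and the grouping key are assumptions for illustration rather than the authors' data model.

```python
from collections import defaultdict
from typing import NamedTuple

class Example(NamedTuple):
    inputs: tuple    # I: input value(s)
    output: str      # O: output value
    sources: tuple   # S: source cell(s), e.g. ("D1",)
    target: str      # T: target field, e.g. "Phone"

def column_of(cell):
    """Project a spreadsheet cell onto its column: 'D1' -> 'D'."""
    return cell.rstrip("0123456789")

def group_by_target(examples):
    """Collect the examples that share the same source columns and target."""
    groups = defaultdict(list)
    for ex in examples:
        key = (tuple(column_of(s) for s in ex.sources), ex.target)
        groups[key].append((ex.inputs, ex.output))
    return dict(groups)

examples = [
    Example(("+61 039 689 9324",), "039-689-9324", ("D1",), "Phone"),
    Example(("99 Beacon Rd, Port Melbourne, VIC 3207, Australia",), "99 Beacon Rd", ("G1",), "Street"),
    Example(("122 Albert St, Port Melbourne, VIC 3207, Australia",), "122 Albert St", ("G2",), "Street"),
]
groups = group_by_target(examples)
```

Each group can then be handed to the transformation discovery step separately, which keeps every search problem small.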
15. Optimization 1. Examples extraction
Collect last edits of all target application elements
Identify corresponding sources and their values
Create input-output transformation examples (Input, Output, Source, Target)
[Figure: in the UI log, the last edit of a target element gives the Output and the Target; the corresponding read gives the Source and the Input]
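One way to link the last edit of a target to its sources is to follow the clipboard. The sketch below assumes that values reach a target through copy and paste actions, and that each action record has "type", "element" and "value" keys; both are assumptions for illustration, and the slides do not state how sources are identified.

```python
def extract_examples(trace):
    """Return (Input, Output, Source, Target) examples for one task trace."""
    clipboard = None   # (source element, value) of the most recent copy
    sources = {}       # target -> list of (source element, value) pasted into it
    outputs = {}       # target -> value after its last edit
    for action in trace:
        kind, element, value = action["type"], action["element"], action["value"]
        if kind == "copy":
            clipboard = (element, value)
        elif kind == "paste" and clipboard is not None:
            sources.setdefault(element, []).append(clipboard)
            outputs[element] = value
        elif kind == "edit":
            outputs[element] = value
    return [
        ([v for _, v in srcs], outputs[target], [s for s, _ in srcs], target)
        for target, srcs in sources.items()
    ]

trace = [
    {"type": "copy", "element": "D1", "value": "+61 039 689 9324"},
    {"type": "paste", "element": "Phone", "value": "+61 039 689 9324"},
    {"type": "edit", "element": "Phone", "value": "039-689-9324"},
]
examples = extract_examples(trace)
```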
16. Optimization 2: Grouping by input pattern
+61 (039) 689 9324
+61 (039) 689-9324
+61 039 689-9324
61.039.689.9324
+61 039 689 9324
039-689-9324
039.689.9324
039-689-9324
Problem: no single data transformation program covers all input formats
Solution:
Identify patterns by applying tokenization
Group transformation examples with the same pattern together
Discover a transformation program for each group
17. Optimization 2. Tokenization
Sequences of digits are replaced with <d>+
Sequences of letters are replaced with <a>+
Special characters remain unchanged
Example:
99 Beacon Rd, Port Melbourne, VIC 3207, Australia
<d>+ <a>+ <a>+, <a>+ <a>+, <a>+ <d>+, <a>+
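The tokenization and grouping steps can be sketched with a single regular expression. The function names pattern_of and group_by_pattern are assumptions, letters are limited to ASCII for simplicity, and the pairing of the bracketed phone number with its output is chosen only for illustration.

```python
import re
from collections import defaultdict

# A run of digits or a run of letters; every other character is left unchanged.
TOKEN = re.compile(r"[0-9]+|[A-Za-z]+")

def pattern_of(value):
    """Replace digit runs with <d>+ and letter runs with <a>+."""
    return TOKEN.sub(lambda m: "<d>+" if m.group()[0].isdigit() else "<a>+", value)

def group_by_pattern(examples):
    """Group (input, output) examples whose inputs share the same pattern."""
    groups = defaultdict(list)
    for inp, out in examples:
        groups[pattern_of(inp)].append((inp, out))
    return dict(groups)

address = "99 Beacon Rd, Port Melbourne, VIC 3207, Australia"
address_pattern = pattern_of(address)

examples = [
    ("+61 039 689 9324", "039-689-9324"),
    ("+61 029 211 4904", "029-211-4904"),
    ("+61 (039) 689 9324", "039-689-9324"),  # pairing assumed for illustration
]
groups = group_by_pattern(examples)
```

A single substitution pass is used so that the letter d inside an already inserted <d>+ token is never tokenized again.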
18. Evaluation
Three approaches:
a) baseline
b) approach with target grouping (optimization 1)
c) approach with target grouping and grouping by input structure (optimization 1 + optimization 2)
Two experiments:
a) performance and discovery of different types of transformations in isolation
b) performance and discovery of data transformations for full use case
UI logs recorded by Action Logger (Leno et al. 2019)
Experiments conducted on a Windows 10 x64 machine with Intel Core i5-5200U CPU 2.20GHz and
16GB RAM, running Ubuntu 16.04 LTS (64-bit) with 8GB RAM and JVM 11 (4GB RAM)
FOOFAH timeout is set to 1 hour
19. Results
Transformation type | Example | Baseline | Opt1 | Opt1 + Opt2
N – 1 | “Igor”, “Honchar” => “Igor Honchar” | 1.295 | 1.584 | 1.745
1 – 1 | “18/08/1992” => “18-08-1992” | 6.584 | 6.639 | 0.476
1 – 1 | “+61 029 211 4904” => “029-211-4904” | N/A (2306.036) | N/A (2271.19) | 0.5086
1 – 1 | “New Zealand” => “New Zealand” | 0.347 | 0.392 | 0.704
1 – 1 | “wmacdonald@gmail.com” => “wmacdonald@gmail.com” | 0.34 | 0.391 | 0.397
1 – N | “122 Albert St, Port Melbourne, VIC 3207, Australia” => “122 Albert St”, “Port Melbourne”, “VIC”, “3207” | timeout | 7504.934 | 85.423
1 – 1 | “122 Albert St, Port Melbourne, VIC 3207, Australia” => “122 Albert St” | - | 1.243 | 1.55
1 – 1 | “122 Albert St, Port Melbourne, VIC 3207, Australia” => “Port Melbourne” | - | N/A (1983.501) | 54.777
1 – 1 | “122 Albert St, Port Melbourne, VIC 3207, Australia” => “VIC” | - | timeout | 26.603
1 – 1 | “122 Albert St, Port Melbourne, VIC 3207, Australia” => “3207” | - | N/A (1884.397) | 2.49
20. Results
Approach | Discovered transformations
Baseline | 0/1 (0%)
Opt1 | 5/9 (56%)
Opt1 + Opt2 | 9/9 (100%)
[Bar chart: Avg. execution time (in seconds) for transformation. Baseline: 3742.67; Opt1: 1172.39; Opt1 + Opt2: 14.54]
UI Log: data transferring task that simulates a real-life use case from a university
Task traces: 50
Actions in total: 2409
Input elements: 7
Output elements: 10
21. Limitations and future work
Limitations
Requires output fields to be derived from fields that are explicitly accessed (no “eye tracking”)
Works only with segmented logs
Cannot discover conditional behavior
Cannot discover routines performed in dynamic forms (e.g. copying a purchase order that consists of multiple line items)
Future work
Extend the set of discovered transformations
Design a segmentation technique
Editor's Notes
No “process” automation but “task” automation
Not “physical” robots but “software” robots
Use case inspired by a real-life scenario at the University of Melbourne
V. Leno, A. Polyvyanyy, M. La Rosa, M. Dumas and F. Maria Maggi. Action logger: Enabling process mining for robotic process automation. In Proceedings of the Dissertation Award, Doctoral Consortium, and Demonstration Track at BPM 2019, 124–128, 2019
Available recording tools (e.g., WinParrot, JitBit) record only low-level actions such as clickstreams and keystrokes
Although RPA tools (e.g., UiPath, Automation Anywhere) provide recording capabilities, they are focused on manual programming of scripts. They do not record the values of the involved fields, do not capture timestamps, etc.
UiPath Studio does have a component called UI Explorer that is similar to our Action Logger, but it works only for the Web (and supports a limited number of actions), while our tool also covers Excel spreadsheets
The baseline approach aims to discover a document-to-document transformation, i.e. a program that maps all inputs to all outputs
This optimization decomposes document-to-document transformation into element-to-element, grouping transformation examples by the target element. For Excel, we make a projection of cells into their rows and columns
We search for all inputs that “contributed” to the final value of a modified field
Optimization 1 cannot deal with heterogeneous data (values that have different formats).
It also fails to discover a transformation when the output values are ambiguous (e.g. two transformation examples have the same output value).