XESLite
Handling Event Logs in ProM
Felix Mannhardt (f.mannhardt@tue.nl)
@fmannhardt
Motivation – What do event logs look like?
PAGE 1
multi set table
Motivation – How are event logs used?
PAGE 2
• Most process discovery techniques
• Most conformance checking techniques
• …
• Data-aware process discovery
• Data-aware conformance checking
• Most enhancement techniques
• …
Of course, the world is not black & white!
Motivation – Using ProM on a standard computer
PAGE 3
~ 4-8 GB of working memory
www.xes-standard.org
10.1109/IEEESTD.2016.7740858
Source: 1849-2016 - IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams, © IEEE
IEEE
XES – The event log standard
OpenXES – An (outdated) reference implementation
PAGE 5
OpenXES – Memory Layout
PAGE 6
XEvent
XID HashMap
UUID Node[m]
Entry
Key XAttribute
Value
OpenXES – Memory Layout
PAGE 7
XEvent
XID HashMap
UUID
32 bytes
Node[m]
Entry
Key
k bytes
XAttribute
32 + v bytes
Value
v bytes
OpenXES – Memory Layout
PAGE 8
XEvent
XID HashMap
UUID
32 bytes
Node[m]
16 + 4m + (64+k+v)m bytes
Entry
32 + k + 32 + v bytes
Key
k bytes
XAttribute
32 + v bytes
Value
v bytes
OpenXES – Memory Layout
PAGE 9
XEvent
XID
16 + 32 bytes
HashMap
48 + 16 + (68+k+v)m bytes
UUID
32 bytes
Node[m]
16 + 4m + (64+k+v)m bytes
Entry
32 + k + 32 + v bytes
Key
k bytes
XAttribute
32 + v bytes
Value
v bytes
OpenXES – Memory Layout
PAGE 10
XEvent
24 + 48 + 64 + (68+k+v)m bytes
XID
16 + 32 bytes
HashMap
48 + 16 + (68+k+v)m bytes
UUID
32 bytes
Node[m]
16 + 4m + (64+k+v)m bytes
Entry
32 + k + 32 + v bytes
Key
k bytes
XAttribute
32 + v bytes
Value
v bytes
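The per-event total on the layout slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the class and method names are made up for this example, the constants are copied from the slides, and the values chosen for n, m, k and v are arbitrary.

```java
// Sketch: evaluates the per-event size formula from the memory layout slides.
final class OpenXesMemoryEstimate {

    /** Bytes for one XEvent with m attributes, key size k and value size v. */
    static long openXesEventBytes(long m, long k, long v) {
        long xid = 16 + 32;                              // XID object + UUID
        long nodeArray = 16 + 4 * m + (64 + k + v) * m;  // Node[m] including its entries
        long hashMap = 48 + nodeArray;                   // HashMap object + node array
        return 24 + xid + hashMap;                       // XEvent object + XID + HashMap
    }

    /** Bytes per event in a plain n x m table: values only, no overhead. */
    static long minimalEventBytes(long m, long v) {
        return m * v;
    }

    public static void main(String[] args) {
        long n = 1_000_000, m = 3, k = 16, v = 8; // illustrative values only
        System.out.println("OpenXES: " + n * openXesEventBytes(m, k, v) + " bytes");
        System.out.println("Minimal: " + n * minimalEventBytes(m, v) + " bytes");
    }
}
```

The result equals 24 + 48 + 64 + (68+k+v)m, so the overhead per attribute is 68 + k bytes on top of the value itself.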
OpenXES – Memory Usage vs ‘Minimal’ Scenario
PAGE 11
[Figure: memory usage (GB) against the number of events in millions (n), with one panel for OpenXES and one for Minimal; series for attribute sizes of 8 and 48 bytes and for m = 3, 25 and 50 attributes; both axes span several orders of magnitude, with a 4 GB mark on the memory axis]
Minimal scenario: n x m table of attributes (m) and events (n), no compression, no overhead
XESLite – Several attempts to solve the issue
PAGE 12
Definition of XESLite
(1) having too much fun in programming
(2) being fed up with OOM exceptions
(3) disbelieving that a 17 MB zipped XES file
requires GBs of memory
24.02.2014 16:59 – fmannhardt.de
XESLite – Three methods & Assumptions
PAGE 13
Automaton
(XL-AT)
In-Memory
(XL-IM)
Database
(XL-DB)
• no external software / hardware
• ~ 4-8 GB memory
• compatibility
XESLite – General ideas – Flyweight literals
PAGE 14
64 bytes – java.lang.String – concept:name
64 bytes – java.lang.String – concept:name
64 bytes – java.lang.String – concept:name
64 bytes – java.lang.String – concept:name
64 bytes – java.lang.String – concept:name
64 bytes – java.lang.String – concept:name
…..
XESLite – General ideas – Flyweight literals
PAGE 15
Google Guava (github.com/google/guava)
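import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
Interner<String> interner = Interners.newStrongInterner();
…
XAttribute createAttribute(String key, …) {
    // Reuse one shared String instance per distinct key
    String internedKey = interner.intern(key);
    …
}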
Disclaimer:
• Considerable overhead when many unique literals!
• No garbage collection when deleting literals!
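The flyweight idea can be tried without any library. The sketch below is an assumption-laden stand-in, not the XESLite code: LiteralPool is a hypothetical class that uses a ConcurrentHashMap in place of Guava's strong interner, and it shares the same drawback that pooled literals are never released.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a strong interner: one canonical instance per literal.
final class LiteralPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance that is equal to the given literal. */
    String intern(String literal) {
        String existing = pool.putIfAbsent(literal, literal);
        return existing != null ? existing : literal;
    }

    /** Number of distinct literals held; entries are never removed. */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        LiteralPool pool = new LiteralPool();
        // new String(...) forces two separate objects with equal content
        String first = pool.intern(new String("concept:name"));
        String second = pool.intern(new String("concept:name"));
        System.out.println("same instance: " + (first == second));
        System.out.println("pooled literals: " + pool.size());
    }
}
```

With many unique literals the map itself becomes the overhead the disclaimer warns about, since every distinct string adds a map entry.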
XESLite – General ideas – Sequential IDs
PAGE 16
XEvent
24 + 48 + 64 + (68+k+v)m bytes
XID
16 + 32 bytes
HashMap
UUID
32 bytes
Node[m]
Entry
Key XAttribute
Value
XESLite – General ideas – Sequential IDs
PAGE 17
XEvent
24 + 8 + 64 + (68+k+v)m bytes
long
8 bytes
HashMap
48 + 16 + (68+k+v)m bytes
40 bytes saved per event
Every little bit helps! (German saying: "Auch Kleinvieh macht Mist")
Disclaimer:
• No distributed events!
• Don’t assume the XID returns a real UUID
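A minimal sketch of the sequential ID idea, assuming a single JVM hands out all identifiers. SequentialIdFactory is a hypothetical name for this example and is not part of OpenXES or XESLite.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical factory: replaces a per-event UUID object with a primitive long.
final class SequentialIdFactory {
    private final AtomicLong next = new AtomicLong();

    /** Returns 0, 1, 2, ... and is safe to call from several threads. */
    long nextId() {
        return next.getAndIncrement();
    }

    public static void main(String[] args) {
        SequentialIdFactory ids = new SequentialIdFactory();
        long first = ids.nextId();
        long second = ids.nextId();
        System.out.println(first + ", " + second);
    }
}
```

The counter is only unique within one process, which is why the slide rules out distributed events.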
XESLite – General Ideas – Compressed Traces
PAGE 18
What is a trace?
Idea: Delta compression!
ok, quite idealistic situation
LZ4 compression
(400 MB/s compression & several GB/s decompression)
Disclaimer:
• Random-access methods → slow
• Use iterator / foreach instead of get(i)!
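Why iteration beats get(i) on a compressed trace can be seen with delta encoding alone. The following is a sketch under simplifying assumptions: it only delta-encodes timestamps, leaves out the LZ4 step, and DeltaEncodedTimestamps is a hypothetical class written for this example.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical delta-encoded column of timestamps for one trace.
final class DeltaEncodedTimestamps implements Iterable<Long> {
    private final long[] deltas; // deltas[0] is the first value, the rest are differences

    DeltaEncodedTimestamps(long[] timestamps) {
        deltas = new long[timestamps.length];
        long previous = 0;
        for (int i = 0; i < timestamps.length; i++) {
            deltas[i] = timestamps[i] - previous;
            previous = timestamps[i];
        }
    }

    /** Random access has to replay every delta up to index i. */
    long get(int i) {
        long value = 0;
        for (int j = 0; j <= i; j++) {
            value += deltas[j];
        }
        return value;
    }

    /** Sequential access keeps a running sum, so each step is one addition. */
    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private int position = 0;
            private long current = 0;

            @Override
            public boolean hasNext() {
                return position < deltas.length;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                current += deltas[position++];
                return current;
            }
        };
    }

    public static void main(String[] args) {
        DeltaEncodedTimestamps trace =
                new DeltaEncodedTimestamps(new long[] {1000L, 1005L, 1005L, 1100L});
        for (long timestamp : trace) {
            System.out.println(timestamp);
        }
    }
}
```

Looping with get(i) over a trace of length t costs on the order of t squared additions here, while the iterator needs t.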
XESLite – Automaton (XL-AT)
PAGE 19
multi set table
XESLite – Automaton (XL-AT)
PAGE 20
finite set
of sequences
multiplicity
encode
similar problem
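Viewing a log as a finite set of sequences with multiplicities can be made concrete in a few lines. This is a sketch with invented activity names, and TraceMultiset is a hypothetical helper, not an OpenXES or XESLite class.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collapses a list of traces into distinct sequences with counts.
final class TraceMultiset {

    /** Maps each distinct activity sequence to the number of traces that follow it. */
    static Map<List<String>, Integer> countVariants(List<List<String>> traces) {
        Map<List<String>, Integer> multiset = new LinkedHashMap<>();
        for (List<String> trace : traces) {
            multiset.merge(trace, 1, Integer::sum);
        }
        return multiset;
    }

    public static void main(String[] args) {
        List<List<String>> log = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("a", "c"),
                Arrays.asList("a", "b", "c"));
        System.out.println(countVariants(log));
    }
}
```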
XESLite – Automaton (XL-AT)
PAGE 21
external information
finite set of words
research from the 1990s
minimal
deterministic acyclic
finite automaton
minimal perfect
hashing
XESLite – Automaton (XL-AT) – Example
PAGE 22
(1) build minimal DAFA
Automata minimization is a well-researched problem
• Minimization of any DFA: O(n log(n)) with n states (Hopcroft 1971)
• Minimization for acyclic DFA can be done in linear time (Revuz 1992, Daciuk 2000)
XESLite – Automaton (XL-AT) – Example
PAGE 23
(2) build minimal perfect hashing scheme
Assign unique consecutive numbers
1..n to words accepted by the DAFA.
1
2
3
4
XESLite – Automaton (XL-AT) – Example
PAGE 24
(2) build minimal perfect hashing scheme
1
2
3
4
• Use lexicographical ordering
• Assign number based on predecessors
• Encode this scheme efficiently in the DAFA
XESLite – Automaton (XL-AT) – Example
PAGE 25
(2) build minimal perfect hashing scheme
1
2
3
4
• Remember the number of words accepted from states
• Compute number for word w
• Add the numbers of all those states for which
a transition t leads from the path to the state and
the letter of transition t precedes the next letter.
• Add the number of final states passed.
3 (3)
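The numbering rule above can be sketched as follows. To stay short, the sketch uses an unminimized trie instead of a minimal DAFA (an assumption made for this example; a trie is also a deterministic acyclic automaton, so the same counting applies), and it uses characters as letters where XL-AT would use event classes. WordNumbering and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: lexicographic numbering 1..n of the words of an acyclic automaton.
final class WordNumbering {

    private static final class State {
        final TreeMap<Character, State> next = new TreeMap<>(); // sorted by letter
        boolean isFinal;
        int wordsAccepted; // number of words accepted when starting from this state
    }

    private final State root = new State();

    /** Adds a word and updates the counts on every state along its path. */
    void add(String word) {
        List<State> path = new ArrayList<>();
        State state = root;
        path.add(state);
        for (char letter : word.toCharArray()) {
            state = state.next.computeIfAbsent(letter, l -> new State());
            path.add(state);
        }
        if (state.isFinal) {
            return; // word already present, counts stay as they are
        }
        state.isFinal = true;
        for (State s : path) {
            s.wordsAccepted++;
        }
    }

    /** Returns the 1-based number of the word, or -1 if it is not accepted. */
    int numberOf(String word) {
        int number = 0;
        State state = root;
        for (char letter : word.toCharArray()) {
            if (state.isFinal) {
                number++; // a final state passed: a shorter word precedes this one
            }
            // states reached by transitions whose letter precedes the next letter
            for (State target : state.next.headMap(letter).values()) {
                number += target.wordsAccepted;
            }
            state = state.next.get(letter);
            if (state == null) {
                return -1;
            }
        }
        return state.isFinal ? number + 1 : -1;
    }

    public static void main(String[] args) {
        WordNumbering numbering = new WordNumbering();
        for (String word : new String[] {"bc", "abd", "b", "abc"}) {
            numbering.add(word);
        }
        for (String word : new String[] {"abc", "abd", "b", "bc"}) {
            System.out.println(word + " -> " + numbering.numberOf(word));
        }
    }
}
```

The number of a word depends only on the counts stored in the states, so no table of all words is needed to compute it.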
XESLite – Automaton (XL-AT) – Example
PAGE 26
lookup table
DAFA
Lucchesi 1992: Applications of Finite Automata Representing Large Vocabularies
Daciuk 2005: Dynamic Perfect Hashing with Finite-State Automata
3 (3)
XESLite – In-Memory (XL-IM)
Tabular view instead of the object graph of OpenXES
PAGE 27
XESLite – In-Memory (XL-IM)
Events consist only of identifiers
PAGE 28
XEvent
12 + 8 + 4 bytes
long (ID)
8 bytes
Object (Storage)
4 bytes
XEvent
24 + 48 + 64 + (68+k+v)m bytes
XID
16 + 32 bytes
HashMap
48 + 16 + (68+k+v)m bytes
UUID
32 bytes
Node[m]
16 + 4m + (64+k+v)m bytes
Entry
32 + k + 32 + v bytes
Key
k bytes
XAttribute
32 + v bytes
Value
v bytes
with trace compression
?? bytes
XESLite – In-Memory (XL-IM)
PAGE 29
Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009): VLDB 2009 Tutorial Column-Oriented Database Systems
+ Compression / packing of similar values
+ Many other optimizations possible
XESLite – In-Memory (XL-IM)
• Column-store like custom in-memory data structure in Java
• No communication overhead with external tools
• Assumptions
• Fixed-width values for fast access (lookup table for literals – flyweights for free)
• Consistent attribute types (i.e., column types are enforced)
• Dynamic memory allocation in (compressed) blocks
PAGE 30
Block storing 2 integer values Block storing 8 boolean values
Disclaimer:
• No real deletion → only marked as deleted!
• Meta-attributes supported but inefficient!
• Spawns a compressor thread!
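The remark that a lookup table for literals gives flyweights for free can be illustrated with a dictionary-encoded column. This is a simplified sketch, not the XL-IM data structure: LiteralColumn is a hypothetical class, it grows one plain int array rather than allocating compressed blocks, and it supports no deletion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical column: each row stores a fixed-width int code instead of a String.
final class LiteralColumn {
    private final Map<String, Integer> codeOf = new HashMap<>(); // literal -> code
    private final List<String> literalOf = new ArrayList<>();    // code -> literal
    private int[] codes = new int[16];                           // one code per row
    private int size;

    /** Appends a value; a literal seen before costs only one int. */
    void append(String literal) {
        Integer code = codeOf.get(literal);
        if (code == null) {
            code = literalOf.size();
            literalOf.add(literal);
            codeOf.put(literal, code);
        }
        if (size == codes.length) {
            codes = Arrays.copyOf(codes, size * 2);
        }
        codes[size++] = code;
    }

    /** Fixed-width codes allow direct access by row index. */
    String get(int row) {
        if (row < 0 || row >= size) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        return literalOf.get(codes[row]);
    }

    int size() {
        return size;
    }

    int distinctLiterals() {
        return literalOf.size();
    }

    public static void main(String[] args) {
        LiteralColumn activity = new LiteralColumn();
        for (String value : new String[] {"Create Fine", "Payment", "Create Fine"}) {
            activity.append(value);
        }
        System.out.println(activity.get(2) + " / distinct: " + activity.distinctLiterals());
    }
}
```

Every row that holds the same literal returns the same String instance, which is the flyweight effect without a separate interner.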
XESLite – (Embedded) Database (XL-DB)
PAGE 31
As XL-IM, a tabular view instead of the object graph of OpenXES
MapDB
stored as key/value pairs
• On-disk storage (memory-mapped file)
• Uses operating system paging
• Caching mechanism for
common attributes:
• concept:name,
• time:timestamp,
• lifecycle:transition
• Supports all OpenXES
functionality!
Disclaimer:
• No real deletion → only marked as deleted!
• Spawns multiple threads!
• MMAP files in temp folder might not be deleted!
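On-disk storage through a memory-mapped temp file can be sketched with the JDK alone. The example below does not use MapDB and does not reflect the XL-DB implementation: MappedLongColumn is a hypothetical class that stores fixed-width long values with java.nio and leaves paging to the operating system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: long values kept in a memory-mapped temp file instead of the heap.
final class MappedLongColumn {

    /** Writes all values to a mapped temp file and reads one of them back. */
    static long writeAndReadBack(long[] values, int index) {
        Path file = null;
        try {
            file = Files.createTempFile("xl-db-sketch", ".bin");
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                long bytes = (long) values.length * Long.BYTES;
                // READ_WRITE mapping extends the file to the requested size
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
                for (int i = 0; i < values.length; i++) {
                    buffer.putLong(i * Long.BYTES, values[i]);
                }
                return buffer.getLong(index * Long.BYTES);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                // Best effort only: deletion is requested at JVM exit and may still fail
                file.toFile().deleteOnExit();
            }
        }
    }

    public static void main(String[] args) {
        long[] timestamps = {1000L, 1005L, 1100L};
        System.out.println(writeAndReadBack(timestamps, 2));
    }
}
```

The cleanup is deliberately best effort, in line with the disclaimer that mapped files in the temp folder might not be deleted.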
Benchmark - Memory
PAGE 32
Road Fines
No difference XL-DB vs XL-IM
BPI 2011 vs Hospital Billing
Benchmark - Time
PAGE 33
Garbage Coll.
No difference? BTree!
Random-access
implementation detail
Conclusion
PAGE 34
• Discussion on requirements
• Multi set vs Table
• Storage requirements
• Three general ideas
• Flyweights
• Sequential IDs
• Compressed Traces
• Three XESLite implementations
• Automaton (XL-AT)
• In-Memory (XL-IM)
• Database (XL-DB)
• Details in technical report:
• BPM Center Report BPM-16-02

 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 

XESLite - Handling Event Logs in ProM

  • 1. XESLite: Handling Event Logs in ProM. Felix Mannhardt (f.mannhardt@tue.nl), @fmannhardt
  • 2. Motivation – What do event logs look like? Two views: multiset and table.
  • 3. Motivation – How are event logs used? Most process discovery techniques; most conformance checking techniques; … Data-aware process discovery; data-aware conformance checking; most enhancement techniques; … Of course, the world is not black & white!
  • 4. Motivation – Using ProM on a standard computer: ~ 4-8 GB of working memory
  • 5. XES – The event log standard. www.xes-standard.org, DOI 10.1109/IEEESTD.2016.7740858. Source: 1849-2016 - IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams, © IEEE
  • 6. OpenXES – An (outdated) reference implementation
  • 7. OpenXES – Memory Layout: an XEvent holds an XID (wrapping a UUID) and a HashMap; the HashMap is backed by a Node[m] array of Entry objects; each Entry holds a Key and an XAttribute, which holds the Value.
  • 8. OpenXES – Memory Layout: UUID: 32 bytes; Key: k bytes; XAttribute: 32 + v bytes; Value: v bytes
  • 9. OpenXES – Memory Layout: Node[m]: 16 + 4m + (64+k+v)m bytes; Entry: 32 + k + 32 + v bytes; UUID: 32 bytes; Key: k bytes; XAttribute: 32 + v bytes; Value: v bytes
  • 10. OpenXES – Memory Layout: XID: 16 + 32 bytes; HashMap: 48 + 16 + (68+k+v)m bytes; Node[m]: 16 + 4m + (64+k+v)m bytes; Entry: 32 + k + 32 + v bytes; UUID: 32 bytes; Key: k bytes; XAttribute: 32 + v bytes; Value: v bytes
  • 11. OpenXES – Memory Layout: XEvent: 24 + 48 + 64 + (68+k+v)m bytes; XID: 16 + 32 bytes; HashMap: 48 + 16 + (68+k+v)m bytes; UUID: 32 bytes; Node[m]: 16 + 4m + (64+k+v)m bytes; Entry: 32 + k + 32 + v bytes; Key: k bytes; XAttribute: 32 + v bytes; Value: v bytes (m = number of attributes, k = key size, v = value size)
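The per-event formula on the memory layout slides can be turned into a quick estimate. The following is a minimal sketch that only restates the slide's arithmetic; the class name, method names and the example log dimensions are hypothetical and not part of OpenXES or XESLite.

```java
/** Back-of-the-envelope estimate using the per-object sizes quoted on the slides. */
final class OpenXesMemoryEstimate {

    /** Bytes for one XEvent: 24 + 48 + 64 + (68 + k + v) * m. */
    static long bytesPerEvent(int m, int k, int v) {
        long eventObject = 24;            // XEvent itself
        long xid = 16 + 32;               // XID wrapper plus UUID
        long mapFixed = 48 + 16;          // HashMap object plus Node[] array header
        long perAttribute = 68L + k + v;  // 4-byte array slot + 64 bytes entry/attribute overhead + key + value
        return eventObject + xid + mapFixed + perAttribute * m;
    }

    /** Total size in GB (10^9 bytes) for a log with the given number of events. */
    static double gigabytes(long events, int m, int k, int v) {
        return events * bytesPerEvent(m, k, v) / 1e9;
    }

    public static void main(String[] args) {
        // Hypothetical log: 10 million events, 25 attributes, 12-byte keys, 8-byte values.
        System.out.println(gigabytes(10_000_000L, 25, 12, 8) + " GB");
    }
}
```

With these assumed dimensions each event takes 2,336 bytes, so the log needs roughly 23 GB, well beyond the 4-8 GB of a standard computer.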
  • 12. OpenXES – Memory Usage vs ‘Minimal’ Scenario. [Figure: memory usage (GB) against number of events in millions (n), one panel for OpenXES and one for the minimal scenario, for attribute sizes of 8 and 48 bytes and m = 3, 25, 50 attributes.] Minimal scenario: n x m table of attributes (m) and events (n), no compression, no overhead
  • 13. XESLite – Several attempts to solve the issue. Definition of XESLite: (1) having too much fun programming; (2) being fed up with OOM exceptions; (3) disbelieving that a 17 MB zipped XES file requires GBs of memory. (24.02.2014 16:59, fmannhardt.de)
  • 14. XESLite – Three methods & assumptions. Methods: Automaton (XL-AT), In-Memory (XL-IM), Database (XL-DB). Assumptions: no external software / hardware; ~ 4-8 GB memory; compatibility
  • 15. XESLite – General ideas – Flyweight literals. The same literal is stored over and over as a separate object: 64 bytes – java.lang.String – concept:name (repeated for every occurrence) …
  • 16. XESLite – General ideas – Flyweight literals. Google Guava (github.com/google/guava): Interner<String> interner = Interners.newStrongInterner(); … XAttribute createAttribute(String key, …) { String internedKey = interner.intern(key); … } Disclaimer: considerable overhead when there are many unique literals! No garbage collection when deleting literals!
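The flyweight idea can be illustrated without any dependency. The sketch below is a stand-in for Guava's strong interner, not its implementation: the class LiteralInterner and its methods are hypothetical names, and a plain HashMap plays the role of the pool.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal strong interner: one canonical instance per distinct literal (illustrative stand-in). */
final class LiteralInterner {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance that equals the given literal. */
    String intern(String literal) {
        String canonical = pool.putIfAbsent(literal, literal);
        return canonical != null ? canonical : literal;
    }

    /** Number of distinct literals held; entries are never released (strong references). */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        LiteralInterner interner = new LiteralInterner();
        // new String(...) forces two distinct objects with equal content,
        // similar to what a parser creates for every event it reads.
        String first = interner.intern(new String("concept:name"));
        String second = interner.intern(new String("concept:name"));
        System.out.println(first == second); // true: both refer to the pooled instance
    }
}
```

Because the pool holds strong references, literals stay in memory even when no event uses them any more, which matches the slide's disclaimer about garbage collection.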
  • 17. XESLite – General ideas – Sequential IDs. XEvent: 24 + 48 + 64 + (68+k+v)m bytes, of which the XID takes 16 + 32 bytes (UUID: 32 bytes); the HashMap with Node[m], Entry, Key, XAttribute and Value is unchanged.
  • 18. XESLite – General ideas – Sequential IDs. XEvent: 24 + 8 + 64 + (68+k+v)m bytes; long: 8 bytes (replaces the XID); HashMap: 48 + 16 + (68+k+v)m bytes. 40 bytes saved per event. Every little bit adds up! Disclaimer: no distributed events! Don’t assume the XID returns a real UUID
  • 19. XESLite – General ideas – Compressed traces. What is a trace? Idea: delta compression! (OK, a quite idealistic situation.) LZ4 compression (400 MB/s compression & several GB/s decompression). Disclaimer: random-access methods → slow. Use iterator / foreach instead of get(i)!
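Why random access becomes slow on a delta-compressed sequence can be shown with a toy example. The slides do not spell out what exactly is delta-encoded, so the sketch below assumes plain long values (for instance timestamps); the class DeltaEncodedSequence is hypothetical and leaves out the LZ4 step entirely.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Illustration only: long values stored as differences to their predecessor. */
final class DeltaEncodedSequence implements Iterable<Long> {
    private final long[] deltas; // deltas[0] is the first value itself

    DeltaEncodedSequence(long[] values) {
        deltas = new long[values.length];
        long previous = 0;
        for (int i = 0; i < values.length; i++) {
            deltas[i] = values[i] - previous;
            previous = values[i];
        }
    }

    /** Random access has to replay all deltas up to the index, so it costs O(index). */
    long get(int index) {
        if (index < 0 || index >= deltas.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        long value = 0;
        for (int i = 0; i <= index; i++) {
            value += deltas[i];
        }
        return value;
    }

    /** Sequential iteration keeps a running sum, so each step costs O(1). */
    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private int position = 0;
            private long running = 0;

            @Override
            public boolean hasNext() {
                return position < deltas.length;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                running += deltas[position++];
                return running;
            }
        };
    }
}
```

A loop over get(i) replays the deltas from the start on every call and is quadratic overall, while a foreach loop visits each delta once.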
  • 20. XESLite – Automaton (XL-AT). Two views: multiset and table.
  • 21. XESLite – Automaton (XL-AT). Multiset: a finite set of sequences, each with a multiplicity. Encoding it resembles a known problem.
  • 22. XESLite – Automaton (XL-AT). Finite set of words plus external information: research from the 1990s on minimal deterministic acyclic finite automata (DAFA) and minimal perfect hashing.
  • 23. XESLite – Automaton (XL-AT) – Example. Step (1): build the minimal DAFA. Automata minimization is a well-researched problem: minimization of any DFA takes O(n log(n)) with n states (Hopcroft 1974); minimization of an acyclic DFA can be done in linear time (Revuz 1992, Daciuk 2000).
  • 24. XESLite – Automaton (XL-AT) – Example. Step (2): build a minimal perfect hashing scheme. Assign unique consecutive numbers 1..n to the words accepted by the DAFA (in the example: 1 2 3 4).
  • 25. XESLite – Automaton (XL-AT) – Example. Step (2): build a minimal perfect hashing scheme (example numbers 1 2 3 4). Use lexicographical ordering; assign each number based on the word's predecessors; encode this scheme efficiently in the DAFA.
  • 26. XESLite – Automaton (XL-AT) – Example. Step (2): build a minimal perfect hashing scheme (example numbers 1 2 3 4). For each state, remember the number of words accepted from it. To compute the number of a word w, follow the path of w: at each state on the path, add the stored numbers of all states reached by a transition t whose letter precedes the next letter of w; then add the number of final states passed.
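The numbering scheme of step (2) can be sketched in a few lines. As a simplifying assumption, the code below uses an unminimized trie as the deterministic acyclic automaton; the counts stored per state depend only on the set of words accepted from that state, so the same computation carries over to a minimal DAFA. All names (AcyclicAutomatonHash, numberOf, and so on) are hypothetical, and single characters stand in for activity names.

```java
import java.util.TreeMap;

/** Sketch: minimal perfect hashing over the words of a deterministic acyclic automaton. */
final class AcyclicAutomatonHash {

    private static final class State {
        final TreeMap<Character, State> next = new TreeMap<>(); // transitions sorted by letter
        boolean accepting;
        int wordsFromHere; // number of words accepted when starting in this state
    }

    private final State root = new State();

    AcyclicAutomatonHash(String... words) {
        for (String word : words) {
            insert(word);
        }
        count(root);
    }

    private void insert(String word) {
        State state = root;
        for (char letter : word.toCharArray()) {
            state = state.next.computeIfAbsent(letter, unused -> new State());
        }
        state.accepting = true;
    }

    /** Stores in every state the number of words accepted from it. */
    private static int count(State state) {
        int words = state.accepting ? 1 : 0;
        for (State target : state.next.values()) {
            words += count(target);
        }
        state.wordsFromHere = words;
        return words;
    }

    /** Returns the lexicographic number of the word in 1..n, or -1 if it is not accepted. */
    int numberOf(String word) {
        State state = root;
        int number = 0;
        for (char letter : word.toCharArray()) {
            if (state.accepting) {
                number++; // a final state passed on the path: a shorter word precedes this one
            }
            State target = state.next.get(letter);
            if (target == null) {
                return -1;
            }
            // All words that branch off over a smaller letter precede this word.
            for (State smaller : state.next.headMap(letter).values()) {
                number += smaller.wordsFromHere;
            }
            state = target;
        }
        return state.accepting ? number + 1 : -1; // the last final state is the word itself
    }
}
```

For the words ab, abc, ad and b, numberOf yields 1, 2, 3 and 4 respectively, which can serve as an index into a table holding the external information per distinct trace.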
  • 27. XESLite – Automaton (XL-AT) – Example. DAFA plus lookup table. Lucchesi 1992: Applications of Finite Automata Representing Large Vocabularies. Daciuk 2005: Dynamic Perfect Hashing with Finite-State Automata.
  • 28. XESLite – In-Memory (XL-IM). Tabular view instead of the object graph of OpenXES.
  • 29. XESLite – In-Memory (XL-IM). Events consist only of identifiers. XL-IM XEvent: 12 + 8 + 4 bytes; long (ID): 8 bytes; Object (Storage): 4 bytes; with trace compression: ?? bytes. For comparison, OpenXES XEvent: 24 + 48 + 64 + (68+k+v)m bytes; XID: 16 + 32 bytes; HashMap: 48 + 16 + (68+k+v)m bytes; UUID: 32 bytes; Node[m]: 16 + 4m + (64+k+v)m bytes; Entry: 32 + k + 32 + v bytes; Key: k bytes; XAttribute: 32 + v bytes; Value: v bytes
  • 30. XESLite – In-Memory (XL-IM). Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009): VLDB 2009 Tutorial, Column-Oriented Database Systems. + Compression / packing of similar values. + Many other optimizations possible.
  • 31. XESLite – In-Memory (XL-IM)
      • Column-store-like custom in-memory data structure in Java
      • No communication overhead with external tools
      • Assumptions: fixed-width values for fast access (lookup table for literals, flyweights for free); consistent attribute types (i.e., column types are enforced)
      • Dynamic memory allocation in (compressed) blocks, e.g. a block storing 2 integer values or a block storing 8 boolean values
      • Disclaimer: no real deletion → only marked as deleted! Meta-attributes are supported but inefficient! Spawns a compressor thread!
  • 32. XESLite – (Embedded) Database (XL-DB)
      • As in XL-IM, a tabular view instead of the object graph of OpenXES
      • Stored as key/value pairs in MapDB
      • On-disk storage (mmapped file); uses operating system paging
      • Caching mechanism for common attributes: concept:name, time:timestamp, lifecycle:transition
      • Supports all OpenXES functionality!
      • Disclaimer: no real deletion → only marked as deleted! Spawns multiple threads! MMAP files in the temp folder might not be deleted!
  • 33. Benchmark – Memory. [Figure annotations: Road Fines; no difference between XL-DB and XL-IM; BPI 2011 vs Hospital Billing.]
  • 34. Benchmark – Time. [Figure annotations: garbage collection; no difference? BTree!; random access is an implementation detail.]
  • 35. Conclusion
      • Discussion on requirements: multiset vs table; storage requirements
      • Three general ideas: flyweights; sequential IDs; compressed traces
      • Three XESLite implementations: Automaton (XL-AT); In-Memory (XL-IM); Database (XL-DB)
      • Details in technical report: BPM Center Report BPM-16-02

Editor's Notes

  1. A block of size 2, which stores the integer values of the attribute amount. Bytes 0 to 7 store two 4-byte integer values, shown in big-endian order. Byte 8 stores flags indicating whether the attribute is not set (i.e., ⊥).
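The block layout described in this note can be sketched as follows. This is an illustration under assumptions, not the XESLite code: the class IntBlock is hypothetical, and the convention that a set flag bit means "not set (⊥)" with one bit per slot is assumed, since the note does not specify the bit layout.

```java
import java.nio.ByteBuffer;

/** Sketch of a block of size 2: two 4-byte integers followed by one flag byte. */
final class IntBlock {
    private static final int SIZE = 2;
    private static final int FLAGS_OFFSET = SIZE * Integer.BYTES; // byte 8

    // ByteBuffer uses big-endian byte order by default.
    private final ByteBuffer bytes = ByteBuffer.allocate(SIZE * Integer.BYTES + 1);

    IntBlock() {
        bytes.put(FLAGS_OFFSET, (byte) 0b11); // assumed convention: bit i set = slot i is not set
    }

    void set(int slot, int value) {
        checkSlot(slot);
        bytes.putInt(slot * Integer.BYTES, value);
        bytes.put(FLAGS_OFFSET, (byte) (bytes.get(FLAGS_OFFSET) & ~(1 << slot)));
    }

    /** Returns the stored value, or null if the attribute is not set for this slot. */
    Integer get(int slot) {
        checkSlot(slot);
        if ((bytes.get(FLAGS_OFFSET) & (1 << slot)) != 0) {
            return null;
        }
        return bytes.getInt(slot * Integer.BYTES);
    }

    /** Copy of the 9 raw bytes, useful for inspecting the layout. */
    byte[] raw() {
        return bytes.array().clone();
    }

    private static void checkSlot(int slot) {
        if (slot < 0 || slot >= SIZE) {
            throw new IndexOutOfBoundsException("slot " + slot);
        }
    }
}
```

Fixed-width slots make the position of a value a simple multiplication, which is what allows fast access without any per-value object.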