
XESLite - Handling Event Logs in ProM

Process mining methods use data recorded by information systems to analyze the real execution of processes.
This event data is stored in an event log, which is the main input to most process mining methods.

The XES standard provides a uniform way to store event logs.
OpenXES is the XES reference implementation and is widely used by research tools. However, OpenXES does not scale to large event logs.

XESLite provides solutions for managing large event logs that are compatible with the OpenXES interfaces. It can therefore be used as a drop-in replacement in existing algorithms. This presentation investigates the storage requirements of different types of event logs, describes XESLite, and presents a benchmark of XESLite and OpenXES based on real-life event logs.

Published in: Science

  1. 1. XESLite Handling Event Logs in ProM Felix Mannhardt (f.mannhardt@tue.nl) @fmannhardt
  2. 2. Motivation – What do event logs look like? multi set table
  3. 3. Motivation – How are event logs used? PAGE 2 • Most process discovery techniques • Most conformance checking techniques • … • Data-aware process discovery • Data-aware conformance checking • Most enhancement techniques • … Of course, the world is not black & white!
  4. 4. Motivation – Using ProM on a standard computer PAGE 3 ~ 4-8 GB of working memory
  5. 5. IEEE XES – The event log standard. www.xes-standard.org, DOI 10.1109/IEEESTD.2016.7740858. Source: 1849-2016 - IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams, © IEEE
  6. 6. OpenXES – An (outdated) reference implementation PAGE 5
  7. 7. OpenXES – Memory Layout PAGE 6 XEvent XID HashMap UUID Node[m] Entry Key XAttribute Value
  8. 8. OpenXES – Memory Layout PAGE 7 XEvent XID HashMap UUID 32 bytes Node[m] Entry Key k bytes XAttribute 32 + v bytes Value v bytes
  9. 9. OpenXES – Memory Layout PAGE 8 XEvent XID HashMap UUID 32 bytes Node[m] 16 + 4m + (64+k+v)m bytes Entry 32 + k + 32 + v bytes Key k bytes XAttribute 32 + v bytes Value v bytes
  10. 10. OpenXES – Memory Layout PAGE 9 XEvent XID 16 + 32 bytes HashMap 48 + 16 + (68+k+v)m bytes UUID 32 bytes Node[m] 16 + 4m + (64+k+v)m bytes Entry 32 + k + 32 + v bytes Key k bytes XAttribute 32 + v bytes Value v bytes
  11. 11. OpenXES – Memory Layout PAGE 10 XEvent 24 + 48 + 64 + (68+k+v)m bytes XID 16 + 32 bytes HashMap 48 + 16 + (68+k+v)m bytes UUID 32 bytes Node[m] 16 + 4m + (64+k+v)m bytes Entry 32 + k + 32 + v bytes Key k bytes XAttribute 32 + v bytes Value v bytes
  12. 12. OpenXES – Memory Usage vs ‘Minimal’ Scenario [Plot: memory usage (GB) against number of events in millions (n), for OpenXES and the minimal scenario; attribute sizes of 8 and 48 bytes; 3, 25 and 50 attributes (m) per event.] Minimal scenario: n x m table of attributes (m) and events (n), no compression, no overhead
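The per-event formula from the memory layout slides can be turned into a quick back-of-the-envelope estimate. The following is only a sketch: the class and method names are made up for illustration, and the key size k = 64 bytes is an assumption borrowed from the 64-byte java.lang.String shown for concept:name on the flyweight slide. The minimal scenario is taken to be n x m values of v bytes each.

```java
// Hypothetical helper, not part of OpenXES or XESLite.
final class EventMemoryEstimate {

    // Bytes per event in OpenXES according to the layout slides:
    // 24 + 48 + 64 + (68 + k + v) * m, with m attributes,
    // k bytes per key and v bytes per value.
    static long openXesBytesPerEvent(int m, int k, int v) {
        return 24L + 48L + 64L + (68L + k + v) * m;
    }

    // Bytes per event in the 'minimal' scenario: a plain row of m values
    // with no compression and no overhead.
    static long minimalBytesPerEvent(int m, int v) {
        return (long) m * v;
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // number of events
        int m = 25;          // attributes per event
        int k = 64;          // assumed key size in bytes
        int v = 8;           // value size in bytes
        long openXes = n * openXesBytesPerEvent(m, k, v);
        long minimal = n * minimalBytesPerEvent(m, v);
        System.out.printf("OpenXES: %.2f GB, minimal: %.2f GB%n",
                openXes / 1e9, minimal / 1e9);
    }
}
```

Under these assumptions one million events with 25 eight-byte attributes come to 3,636 bytes per event in OpenXES (about 3.6 GB in total) against 200 bytes per event (0.2 GB) in the minimal scenario.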
  13. 13. XESLite – Several attempts to solve the issue PAGE 12 Definition of XESLite (1) having too much fun in programming (2) being fed up with OOM exceptions (3) disbelieving that 17 MB zipped XES requires GBs of memory 24.02.2014 16:59 – fmannhardt.de
  14. 14. XESLite – Three methods & Assumptions: Automaton (XL-AT), In-Memory (XL-IM), Database (XL-DB) • no external software / hardware • ~ 4-8 GB memory • compatibility
  15. 15. XESLite – General ideas – Flyweight literals PAGE 14 64 bytes – java.lang.String – concept:name 64 bytes – java.lang.String – concept:name 64 bytes – java.lang.String – concept:name 64 bytes – java.lang.String – concept:name 64 bytes – java.lang.String – concept:name 64 bytes – java.lang.String – concept:name …..
  16. 16. XESLite – General ideas – Flyweight literals Google Guava (github.com/google/guava): Interner<String> interner = Interners.newStrongInterner(); … … XAttribute createAttribute(String key, …) { key = interner.intern(key); … } Disclaimer: • Considerable overhead when there are many unique literals! • Interned literals are not garbage-collected when they are deleted!
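To illustrate the flyweight idea without the Guava dependency, here is a minimal sketch of a strong interner built on the JDK's ConcurrentHashMap. The class name LiteralPool is hypothetical, and this is a stand-in for the Guava Interner named on the slide, not the XESLite code. Like a strong interner, it never releases a literal once stored, which matches the second disclaimer.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a strong interner; not the XESLite implementation.
final class LiteralPool {

    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance for an equal literal, storing it on first use.
    String intern(String literal) {
        String existing = pool.putIfAbsent(literal, literal);
        return existing != null ? existing : literal;
    }

    // Number of distinct literals held (these are never garbage-collected).
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        LiteralPool pool = new LiteralPool();
        // new String(...) forces two distinct objects with equal content.
        String a = pool.intern(new String("concept:name"));
        String b = pool.intern(new String("concept:name"));
        System.out.println("same instance: " + (a == b));
        System.out.println("distinct literals: " + pool.size());
    }
}
```

Every attribute that uses the key concept:name then shares one String object instead of holding its own copy.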
  17. 17. XESLite – General ideas – Sequential IDs PAGE 16 XEvent 24 + 48 + 64 + (68+k+v)m bytes XID 16 + 32 bytes HashMap UUID 32 bytes Node[m] Entry Key XAttribute Value
  18. 18. XESLite – General ideas – Sequential IDs XEvent 24 + 8 + 64 + (68+k+v)m bytes long 8 bytes HashMap 48 + 16 + (68+k+v)m bytes 40 bytes saved per event. Every little bit helps! Disclaimer: • No distributed events! • Don’t assume the XID returns a real UUID
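A sequential ID can be as simple as a process-wide counter. The sketch below uses an AtomicLong; the class name is hypothetical and the actual XESLite ID factory may look different. It also makes the first disclaimer concrete: the IDs are unique only within one running JVM, so they cannot identify events across distributed systems.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one 8-byte long per event instead of an XID wrapping a UUID.
final class SequentialIdFactory {

    private final AtomicLong next = new AtomicLong();

    // Thread-safe; unique only within this factory instance.
    long nextId() {
        return next.getAndIncrement();
    }

    public static void main(String[] args) {
        SequentialIdFactory ids = new SequentialIdFactory();
        long first = ids.nextId();
        long second = ids.nextId();
        System.out.println(first + ", " + second);
    }
}
```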
  19. 19. XESLite – General Ideas – Compressed Traces What is a trace? Idea: Delta compression! (ok, quite idealistic situation) LZ4 compression (400 MB/s compression & several GB/s decompression) Disclaimer: • Random-access methods are slow! • Use iterator / foreach instead of get(i)!
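The reason random access is slow on a compressed trace can be shown with a small delta-encoding sketch. It stores timestamps as variable-length deltas to the previous value. This is a simplified stand-in chosen to stay within the JDK, not the LZ4-based scheme from the slide, and all names are hypothetical. Iterating decodes each value once, whereas get(i) has to decode everything before position i.

```java
import java.io.ByteArrayOutputStream;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch of a delta-compressed sequence of timestamps.
final class DeltaEncodedTimestamps implements Iterable<Long> {

    private final byte[] data; // zig-zag varint encoded deltas
    private final int size;    // number of timestamps stored

    DeltaEncodedTimestamps(long[] timestamps) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long previous = 0L;
        for (long t : timestamps) {
            writeVarLong(out, zigZag(t - previous));
            previous = t;
        }
        data = out.toByteArray();
        size = timestamps.length;
    }

    int encodedBytes() {
        return data.length;
    }

    // Random access: decodes all values up to index, so it costs O(index).
    long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Iterator<Long> it = iterator();
        long value = 0L;
        for (int i = 0; i <= index; i++) {
            value = it.next();
        }
        return value;
    }

    // Sequential access: each call to next() decodes exactly one delta.
    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private int pos = 0;      // byte offset into data
            private int emitted = 0;  // values returned so far
            private long previous = 0L;

            @Override
            public boolean hasNext() {
                return emitted < size;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                long raw = 0L;
                int shift = 0;
                byte b;
                do {
                    b = data[pos++];
                    raw |= (long) (b & 0x7F) << shift;
                    shift += 7;
                } while ((b & 0x80) != 0);
                previous += unZigZag(raw);
                emitted++;
                return previous;
            }
        };
    }

    private static long zigZag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    private static long unZigZag(long v) {
        return (v >>> 1) ^ -(v & 1);
    }

    private static void writeVarLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }
}
```

A loop such as `for (long t : encoded) { ... }` therefore runs in linear time over the trace, while calling get(i) for every i is quadratic.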
  20. 20. XESLite – Automaton (XL-AT) PAGE 19 multi set table
  21. 21. XESLite – Automaton (XL-AT) PAGE 20 finite set of sequences; multiplicity; encode; similar problem
  22. 22. XESLite – Automaton (XL-AT) PAGE 21 external information; finite set of words; research from the 1990s; minimal deterministic acyclic finite automaton; minimal perfect hashing
  23. 23. XESLite – Automaton (XL-AT) – Example PAGE 22 (1) build minimal DAFA Automata minimization is a well-researched problem • Minimization of any DFA: O(n log(n)) with n states (Hopcroft 1974) • Minimization for acyclic DFA can be done in linear time (Revuz 1992, Daciuk 2000)
  24. 24. XESLite – Automaton (XL-AT) – Example PAGE 23 (2) build minimal perfect hashing scheme Assign unique consecutive numbers 1..n to words accepted by the DAFA. 1 2 3 4
  25. 25. XESLite – Automaton (XL-AT) – Example PAGE 24 (2) build minimal perfect hashing scheme 1 2 3 4 • Use lexicographical ordering • Assign number based on predecessors • Encode this scheme efficiently in the DAFA
  26. 26. XESLite – Automaton (XL-AT) – Example (2) build minimal perfect hashing scheme • Store in each state the number of words accepted from that state • To compute the number for a word w, follow its path through the DAFA: • At each step, add the counts of all states reached by a transition t that leaves the current state with a letter preceding the next letter of w. • Add the number of final states passed.
  27. 27. XESLite – Automaton (XL-AT) – Example lookup table + DAFA. Lucchesi 1992: Applications of Finite Automata Representing Large Vocabularies. Daciuk 2005: Dynamic Perfect Hashing with Finite-State Automata
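The numbering scheme from the example slides can be sketched in a few lines of Java. To keep the example short it uses a plain trie instead of a minimized DAFA, which is an assumption of this sketch; the numbering itself only needs a deterministic acyclic automaton with per-state word counts. All class and method names are hypothetical. Each distinct trace gets a number from 1..n in lexicographical order, and the multiplicities of the multiset are kept in a lookup table indexed by that number.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: multiset of traces as automaton + lookup table.
final class TraceIndexSketch {

    private static final class State {
        final TreeMap<String, State> next = new TreeMap<>(); // sorted by letter
        boolean isFinal;
        int count; // number of words accepted from this state
    }

    private final State root = new State();
    private final int[] multiplicity; // multiplicity[number - 1]

    TraceIndexSketch(List<List<String>> log) {
        for (List<String> trace : log) {
            State s = root;
            for (String activity : trace) {
                s = s.next.computeIfAbsent(activity, key -> new State());
            }
            s.isFinal = true;
        }
        computeCounts(root);
        multiplicity = new int[root.count];
        for (List<String> trace : log) {
            multiplicity[indexOf(trace) - 1]++;
        }
    }

    private static int computeCounts(State s) {
        int c = s.isFinal ? 1 : 0;
        for (State t : s.next.values()) {
            c += computeCounts(t);
        }
        s.count = c;
        return c;
    }

    // Number in 1..n for a stored trace, or -1 if the trace is unknown.
    int indexOf(List<String> trace) {
        State s = root;
        int index = 0;
        for (String activity : trace) {
            if (s.isFinal) {
                index++; // a final state passed: a shorter word sorts first
            }
            // Words behind transitions with a preceding letter sort first.
            for (State smaller : s.next.headMap(activity).values()) {
                index += smaller.count;
            }
            s = s.next.get(activity);
            if (s == null) {
                return -1;
            }
        }
        return s.isFinal ? index + 1 : -1;
    }

    int multiplicityOf(List<String> trace) {
        int index = indexOf(trace);
        return index < 0 ? 0 : multiplicity[index - 1];
    }

    public static void main(String[] args) {
        TraceIndexSketch log = new TraceIndexSketch(Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("a", "c"),
                Arrays.asList("a", "b"),
                Arrays.asList("a", "c")));
        System.out.println(log.indexOf(Arrays.asList("a", "c")));
        System.out.println(log.multiplicityOf(Arrays.asList("a", "c")));
    }
}
```

A minimized automaton would share common suffixes as well as prefixes and so use fewer states, but the lookup works the same way.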
  28. 28. XESLite – In-Memory (XL-IM) Tabular view instead of the object graph of OpenXES PAGE 27
  29. 29. XESLite – In-Memory (XL-IM) Events consist only of identifiers PAGE 28 XEvent 12 + 8 + 4 bytes long (ID) 8 bytes Object (Storage) 4 bytes XEvent 24 + 48 + 64 + (68+k+v)m bytes XID 16 + 32 bytes HashMap 48 + 16 + (68+k+v)m bytes UUID 32 bytes Node[m] 16 + 4m + (64+k+v)m bytes Entry 32 + k + 32 + v bytes Key k bytes XAttribute 32 + v bytes Value v bytes with trace compression ?? bytes
  30. 30. XESLite – In-Memory (XL-IM) PAGE 29 Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009): VLDB 2009 Tutorial Column-Oriented Database Systems + Compression / packing of similar values + Many other optimizations possible
  31. 31. XESLite – In-Memory (XL-IM) • Column-store-like custom in-memory data structure in Java • No communication overhead with external tools • Assumptions • Fixed-width values for fast access (lookup table for literals – flyweights for free) • Consistent attribute types (i.e., column types are enforced) • Dynamic memory allocation in (compressed) blocks (figure: block storing 2 integer values, block storing 8 boolean values) Disclaimer: • No real deletion, events are only marked as deleted! • Meta-attributes supported but inefficient! • Spawns a compressor thread!
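A rough sketch of the column-store idea follows: fixed-width int values live in blocks that are allocated on demand, and a literal column stores only a code per row plus a lookup table, so equal strings are shared automatically. The names and the block size are assumptions made for this illustration. Block compression, type enforcement, deletion markers and missing values are left out, so this is far from the actual XL-IM data structure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of block-wise columns; an event is just a row number.
final class ColumnStoreSketch {

    static final int BLOCK_SIZE = 1024; // assumed number of values per block

    // Fixed-width int column, grown one block at a time.
    static final class IntColumn {
        private final List<int[]> blocks = new ArrayList<>();

        void set(int row, int value) {
            int block = row / BLOCK_SIZE;
            while (blocks.size() <= block) {
                blocks.add(new int[BLOCK_SIZE]);
            }
            blocks.get(block)[row % BLOCK_SIZE] = value;
        }

        // Rows in an allocated block that were never set read as 0.
        int get(int row) {
            return blocks.get(row / BLOCK_SIZE)[row % BLOCK_SIZE];
        }
    }

    // Literal column: stores an int code per row and each distinct string once.
    static final class LiteralColumn {
        private final Map<String, Integer> codes = new HashMap<>();
        private final List<String> literals = new ArrayList<>();
        private final IntColumn column = new IntColumn();

        void set(int row, String value) {
            Integer code = codes.get(value);
            if (code == null) {
                code = literals.size();
                literals.add(value);
                codes.put(value, code);
            }
            column.set(row, code);
        }

        String get(int row) {
            return literals.get(column.get(row));
        }

        int distinctLiterals() {
            return literals.size();
        }
    }

    public static void main(String[] args) {
        LiteralColumn conceptName = new LiteralColumn();
        conceptName.set(0, "Create Fine");
        conceptName.set(1, "Payment");
        conceptName.set(2, "Create Fine");
        System.out.println(conceptName.get(2) + " / " + conceptName.distinctLiterals());
    }
}
```

Because every value has a fixed width, a row lookup is two array index operations, which is the "fast access" assumption in the list above.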
  32. 32. XESLite – (Embedded) Database (XL-DB) Like XL-IM, a tabular view instead of the object graph of OpenXES; stored as key/value pairs in MapDB • On-disk storage (memory-mapped file) • Uses operating system paging • Caching mechanism for common attributes: • concept:name, • time:timestamp, • lifecycle:transition • Supports all OpenXES functionality! Disclaimer: • No real deletion, events are only marked as deleted! • Spawns multiple threads! • Memory-mapped files in the temp folder might not be deleted!
  33. 33. Benchmark - Memory PAGE 32 Road Fines No difference XL-DB vs XL-IM BPI 2011 vs Hospital Billing
  34. 34. Benchmark - Time PAGE 33 Garbage Coll. No difference? BTree! Random-access implementation detail
  35. 35. Conclusion PAGE 34 • Discussion on requirements • Multi set vs Table • Storage requirements • Three general ideas • Flyweights • Sequential IDs • Compressed Traces • Three XESLite implementations • Automaton (XL-AT) • In-Memory (XL-IM) • Database (XL-DB) • Details in technical report: • BPM Center Report BPM-16-02
