Informed Machine Learning for Improved Similarity Assessment in Process-Oriented Case-Based Reasoning
Department of Business Information Systems II
Maximilian Hoffmann (1,2) and Ralph Bergmann (1,2)
(1) Department of Business Information Systems II, University of Trier, Germany
(2) German Research Center for Artificial Intelligence (DFKI), Branch University of Trier, Germany
Introduction & Motivation
• Deep Learning (DL) components within Case-Based Reasoning
(CBR) applications are gaining popularity
• Both components create a synergy:
– DL provides powerful offline learning capabilities for core CBR tasks, e.g., similarity assessment
– CBR provides, among others, structured knowledge about the case representation and the definition of similarity measures
• Many current approaches lack a comprehensive integration of the
CBR-provided knowledge into the DL components, resulting in
unused potential for improved quality and performance
• Informed Machine Learning (ML), i.e., the integration of prior knowledge into the learning process, targets this shortcoming
Goal: Investigate the possibilities of Informed ML for similarity learning
in Process-Oriented CBR (POCBR)
Foundations – NEST Graphs
Source: Bergmann, R., Gil, Y.: Similarity assessment and efficient retrieval of semantic workflows.
Information Systems 40, pp. 115–127 (2014)
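To make the data structure behind this slide concrete, the following Python sketch models a NEST graph as typed nodes and edges that each carry a semantic description. The node and edge type sets follow Bergmann and Gil (2014) as we understand them; the class name `NestGraph`, the dictionary-based semantic descriptions, and the cooking example are hypothetical illustrations, not the ProCAKE data model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

# Node and edge types of NEST graphs (following Bergmann and Gil, 2014)
NODE_TYPES = {"workflow", "task", "data", "control_flow"}
EDGE_TYPES = {"part_of", "data_flow", "control_flow", "constraint"}

Semantics = Dict[str, Any]  # hypothetical stand-in for a semantic description


@dataclass
class NestGraph:
    """Minimal NEST graph: typed nodes and edges, each with a semantic description."""
    nodes: Dict[str, Tuple[str, Semantics]] = field(default_factory=dict)
    edges: Dict[Tuple[str, str], Tuple[str, Semantics]] = field(default_factory=dict)

    def add_node(self, node_id: str, node_type: str, semantics: Semantics) -> None:
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = (node_type, semantics)

    def add_edge(self, src: str, dst: str, edge_type: str,
                 semantics: Optional[Semantics] = None) -> None:
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must already be nodes of the graph")
        self.edges[(src, dst)] = (edge_type, semantics or {})


# Hypothetical fragment of a cooking workflow
recipe = NestGraph()
recipe.add_node("wf", "workflow", {"name": "salad"})
recipe.add_node("t1", "task", {"name": "cut", "duration_min": 2})
recipe.add_node("d1", "data", {"name": "onion", "amount": 1})
recipe.add_edge("wf", "t1", "part_of")    # edge direction chosen for illustration only
recipe.add_edge("wf", "d1", "part_of")
recipe.add_edge("d1", "t1", "data_flow")  # the onion is consumed by the cut task
```

Rejecting unknown types at insertion time keeps the graph consistent with the fixed type vocabulary that the similarity measure later relies on.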
Foundations – Similarity Assessment of Semantic Graphs
• The local-global principle by Richter is used to compute the similarity $sim(Q, C)$
between a query graph $Q$ and a case graph $C$:
– Local pairwise similarities between nodes and edges of the same type
are calculated
– Global similarity results from aggregation of local similarities according to
the most similar mapping of nodes and edges
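The local-global principle can be illustrated with a small brute-force Python sketch. Assumptions not taken from the slides: only nodes are compared (edges are omitted), the local measure is a hypothetical Jaccard similarity over label sets, unmapped query nodes contribute 0, and the aggregation is the average over all query nodes. The exhaustive search over injective, type-preserving mappings is only feasible for tiny graphs; the cited work uses a more efficient A*-based search instead.

```python
from typing import Dict, FrozenSet, Tuple

# A node is (type, semantic description); descriptions are simplified to label sets
Node = Tuple[str, FrozenSet[str]]


def local_sim(q: Node, c: Node) -> float:
    """Hypothetical local measure: Jaccard similarity, defined only for equal types."""
    if q[0] != c[0]:
        return 0.0
    union = q[1] | c[1]
    return len(q[1] & c[1]) / len(union) if union else 1.0


def global_sim(query: Dict[str, Node], case: Dict[str, Node]) -> float:
    """Average local similarity under the best injective, type-preserving node mapping."""
    q_ids = list(query)
    best = 0.0

    def search(i: int, used: FrozenSet[str], total: float) -> None:
        nonlocal best
        if i == len(q_ids):
            best = max(best, total)
            return
        q = query[q_ids[i]]
        search(i + 1, used, total)  # leave this query node unmapped (contributes 0)
        for cid, c in case.items():
            if cid not in used and c[0] == q[0]:  # only nodes of the same type
                search(i + 1, used | {cid}, total + local_sim(q, c))

    search(0, frozenset(), 0.0)
    return best / len(query) if query else 1.0


query = {"t1": ("task", frozenset({"cut"})), "d1": ("data", frozenset({"onion"}))}
case = {"t1": ("task", frozenset({"cut", "chop"})),
        "d1": ("data", frozenset({"onion"})),
        "d2": ("data", frozenset({"garlic"}))}
print(global_sim(query, case))  # 0.75: best mapping t1 -> t1 (0.5), d1 -> d1 (1.0)
```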
Similarity Learning with Graph Neural Networks (GNNs)
Source: Hoffmann, M., Malburg, L., Klein, P., Bergmann, R. (2020). Using Siamese Graph Neural Networks for Similarity-Based
Retrieval in Process-Oriented Case-Based Reasoning. In: ICCBR 2020, 12311, pp. 229–244. Springer.
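[Figure: the two Siamese GNN architectures for similarity learning, the Graph Embedding Model (GEM) and the Graph Matching Network (GMN)]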
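The two architectures differ in where the graphs interact: the GEM embeds each graph independently and compares the resulting vectors, whereas the GMN additionally exchanges information between the two graphs during propagation. The sketch below covers only the embedding side, in pure Python. It uses one untrained model with shared (siamese) weights, a simplified message-passing rule, sum pooling, and cosine similarity between the graph embeddings. The class `TinyGEM`, its update rule, and the choice of cosine similarity are assumptions made for illustration; the models in the cited work are learned and more elaborate.

```python
import math
import random
from typing import Dict, List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(n: int, rng: random.Random) -> Mat:
    return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


class TinyGEM:
    """Untrained toy graph embedding model; sizes and update rule are illustrative."""

    def __init__(self, dim: int, rounds: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_self = random_matrix(dim, rng)
        self.w_msg = random_matrix(dim, rng)
        self.rounds = rounds

    def embed(self, feats: Dict[str, Vec], edges: List[Tuple[str, str]]) -> Vec:
        h = dict(feats)
        for _ in range(self.rounds):
            new_h: Dict[str, Vec] = {}
            for node, h_node in h.items():
                # sum of transformed states of all predecessors along directed edges
                msg = [0.0] * len(h_node)
                for src, dst in edges:
                    if dst == node:
                        msg = [a + b for a, b in zip(msg, matvec(self.w_msg, h[src]))]
                own = matvec(self.w_self, h_node)
                new_h[node] = [math.tanh(a + b) for a, b in zip(own, msg)]
            h = new_h
        # graph-level embedding: sum pooling over all node states
        return [sum(col) for col in zip(*h.values())]


model = TinyGEM(dim=3)  # the same weights embed both graphs (siamese setup)
query = ({"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0]}, [("a", "b")])
case = ({"x": [1.0, 0.0, 0.0], "y": [0.0, 0.0, 1.0]}, [("x", "y")])
score = cosine(model.embed(*query), model.embed(*case))
```

Because each graph is embedded on its own, case embeddings could be precomputed once and reused for every query. This is the usual motivation for the GEM design over the GMN.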
Similarity Assessment with Informed ML Methods
Goal: Improve similarity assessment by increasing quality and reducing time effort
| | Approach | Knowledge source | Suited for GNN variant |
|---|---|---|---|
| First extension | Novel tree-based encoding procedure for semantic descriptions | Case representation, domain model | GEM, GMN |
| Second extension | Novel constraint-based matching procedure of the GMN | Graph-based similarity measure | GMN |
First Extension: Tree-based encoding
• A semantic description is composed of composite and atomic parts
• Previous encoding methods build a sequence of atomic encodings
• Limitations:
– No encoding of composites and hierarchical relations
– Sequence processing in neural networks can be slow
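The first limitation can be shown in a few lines of Python. If only atomic values are encoded as a sequence, two descriptions with different composite structure become indistinguishable. The `Composite` class and the example attributes are hypothetical, and the sketch only mimics the idea of a sequence of atomic encodings, not the earlier encoder itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Atomic = Union[str, int, float]


@dataclass
class Composite:
    """Composite part of a semantic description: a type name plus named attributes."""
    type_name: str
    attributes: Dict[str, "Desc"]


Desc = Union[Composite, Atomic]


def flatten_atomics(desc: Desc) -> List[Atomic]:
    """Sequence-style encoding input: atomic values in depth-first order only."""
    if isinstance(desc, Composite):
        seq: List[Atomic] = []
        for value in desc.attributes.values():
            seq.extend(flatten_atomics(value))
        return seq
    return [desc]


# Two hypothetical descriptions that differ only in their composite structure
nested = Composite("Task", {"name": "cut",
                            "params": Composite("CutParams", {"size": "small", "time": 2})})
flat = Composite("Task", {"name": "cut", "size": "small", "time": 2})
print(flatten_atomics(nested) == flatten_atomics(flat))  # True: hierarchy is lost
```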
First Extension: Tree-based encoding (2)
• Encoding of semantic descriptions as tree structures:
– Maintains the hierarchical structure
– Explicitly encodes composite types
– Enables parallel processing
• Tree structures are processed by a local GNN:
– Information is propagated between nodes along the edges
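Here is a minimal Python sketch of the tree-based idea, under stated assumptions. Each composite becomes an inner node labeled with its type, each atomic value becomes a leaf, and parent-child edges preserve the hierarchy. The hypothetical `propagate` function stands in for the local GNN. It applies a fixed, untrained sum update to all nodes synchronously, which is what allows parallel processing. The one-hot label features are also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Composite:
    type_name: str
    attributes: Dict[str, "Desc"]


Desc = Union[Composite, str, int, float]


def to_tree(desc: Desc, labels: List[str], edges: List[Tuple[int, int]]) -> int:
    """Add desc as a subtree and return its node index."""
    idx = len(labels)
    if isinstance(desc, Composite):
        labels.append(desc.type_name)       # composite type is encoded explicitly
        for value in desc.attributes.values():
            child = to_tree(value, labels, edges)
            edges.append((idx, child))      # parent-child edge keeps the hierarchy
    else:
        labels.append(repr(desc))           # atomic value becomes a leaf
    return idx


def one_hot(labels: List[str]) -> List[List[float]]:
    vocab: Dict[str, int] = {}
    for label in labels:
        vocab.setdefault(label, len(vocab))
    return [[1.0 if vocab[label] == k else 0.0 for k in range(len(vocab))]
            for label in labels]


def propagate(feats: List[List[float]], edges: List[Tuple[int, int]],
              rounds: int = 1) -> List[List[float]]:
    """All nodes update in parallel from the previous states (messages in both directions)."""
    h = [list(f) for f in feats]
    for _ in range(rounds):
        new_h = [list(v) for v in h]
        for parent, child in edges:
            new_h[parent] = [x + y for x, y in zip(new_h[parent], h[child])]
            new_h[child] = [x + y for x, y in zip(new_h[child], h[parent])]
        h = new_h
    return h


desc = Composite("Task", {"name": "cut", "params": Composite("CutParams", {"size": "small"})})
labels: List[str] = []
edges: List[Tuple[int, int]] = []
to_tree(desc, labels, edges)
states = propagate(one_hot(labels), edges)
```

After one round, the root state already mixes its own type with those of its direct children. Further rounds would spread information across the whole tree.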
Second Extension: Constraint Matching
• The propagation in the GMN is inspired by graph matching algorithms that determine graph similarities
• Propagation between nodes within a graph is extended to propagation between graphs
• Problems:
– The graph matching algorithm defines which node types are compatible
– The GMN is currently not able to use this information
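For reference, the cross-graph matching step of the original GMN formulation (Li et al., 2019) works as follows. For a node $i$ of one graph and the nodes $j$ of the other graph at propagation step $t$, it computes an attention weight and a cross-graph message, where $s_h$ is a vector similarity such as cosine similarity:

```latex
a_{j \to i} = \frac{\exp\big(s_h(\mathbf{h}_i^{(t)}, \mathbf{h}_j^{(t)})\big)}
                   {\sum_{j'} \exp\big(s_h(\mathbf{h}_i^{(t)}, \mathbf{h}_{j'}^{(t)})\big)},
\qquad
\boldsymbol{\mu}_{j \to i} = a_{j \to i}\,\big(\mathbf{h}_i^{(t)} - \mathbf{h}_j^{(t)}\big),
\qquad
\boldsymbol{\mu}_i = \sum_{j} \boldsymbol{\mu}_{j \to i}
```

Because $\exp(\cdot) > 0$, every weight $a_{j \to i}$ is positive. Information therefore flows between every pair of nodes across the two graphs, including pairs that the graph matching algorithm would never map onto each other.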
Second Extension: Constraint Matching (2)
• Integration of node type compatibility into the cross-graph matching component
• Compatibility is expressed by defining a cosine similarity of 0 for incompatible pairs of nodes
• Propagation of vector information between incompatible pairs of nodes is stopped
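The following Python sketch shows one way to restrict cross-graph propagation to compatible node pairs. It rests on two assumptions. First, compatibility means identical node types (the hypothetical `compatible` rule). Second, because the slide does not specify how a similarity of 0 stops propagation, incompatible pairs are masked out of the softmax normalization here, so their attention weight is exactly 0. A query node without any compatible partner then receives no cross-graph message.

```python
import math
from typing import List, Sequence

Vec = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def compatible(type_q: str, type_c: str) -> bool:
    """Hypothetical compatibility rule: only nodes of the same type may be matched."""
    return type_q == type_c


def constrained_cross_messages(hq: Sequence[Vec], hc: Sequence[Vec],
                               tq: Sequence[str], tc: Sequence[str]) -> List[Vec]:
    """Cross-graph message for each query node, using only type-compatible case nodes."""
    messages = []
    for hi, ti in zip(hq, tq):
        # softmax numerators; incompatible pairs are masked to 0 (no propagation)
        weights = [math.exp(cosine(hi, hj)) if compatible(ti, tj) else 0.0
                   for hj, tj in zip(hc, tc)]
        z = sum(weights)
        mu = [0.0] * len(hi)
        if z > 0:  # otherwise no compatible partner exists and the message stays zero
            for w, hj in zip(weights, hc):
                a = w / z
                mu = [m + a * (x - y) for m, x, y in zip(mu, hi, hj)]
        messages.append(mu)
    return messages


hq, tq = [[1.0, 0.0], [0.0, 1.0]], ["task", "data"]
hc, tc = [[1.0, 0.0], [0.5, 0.5]], ["task", "task"]
msgs = constrained_cross_messages(hq, hc, tq, tc)
```

Masking the weights, rather than only lowering the similarity, guarantees that incompatible nodes have no influence at all. Adding an incompatible case node therefore leaves the messages of the other query nodes unchanged.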
Experimental Evaluation – Setup
• Our approach is implemented in the open-source CBR framework ProCAKE, using TensorFlow for the DL models
• Comparison of the base GEM and GMN with extended variants (nine different combinations of both extensions) for similarity learning
• Two domains:
– CB-I: 800 cooking workflows (660 training, 60 validation, 80 test)
– CB-II: 609 data mining workflows (509 training, 40 validation, 60 test)
• Hyperparameters are tuned for the base models in each domain and reused for all model variants
procake.uni-trier.de
Source: Bergmann, R., Grumbach, L., Malburg, L., Zeyen, C. (2019). ProCAKE: A Process-Oriented Case-Based Reasoning
Framework. In: Workshops Proc. of the 27th Int. Conf. on Case-Based Reasoning (ICCBR 2019), 2567, pp. 156–161. CEUR-WS.org
Experimental Results
• Training time (in milliseconds) and mean absolute error (MAE) are measured for both domains (CB-I and CB-II)
• Notation for the GEM and GMN variants:
– The subscript denotes the extension used
– Combinations of extensions are also examined
– The superscript R denotes reuse of layers for extension 1 (e.g., GMN_1^R)
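In this setting, MAE is the mean absolute difference between predicted and target similarities over the evaluated query-case pairs. A minimal Python version follows, assuming the targets are the similarities computed by the graph-based similarity measure. The example values are hypothetical and are not taken from the experiments.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference between predicted and target similarities."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


# Hypothetical similarities for three query-case pairs
predicted = [0.8, 0.5, 0.1]
target = [0.9, 0.5, 0.3]
mae = mean_absolute_error(predicted, target)  # (0.1 + 0.0 + 0.2) / 3, i.e. about 0.1
```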
Experimental Results (2)
• Quality:
– GEM: no significant positive influence of the extensions; the base model shows the best quality, and for CB-II the extensions even reduce quality
– GMN: quality improves with the extensions (max. decrease in MAE of approx. 10%)
– Effects are not consistent across both domains and all variants
Experimental Results (3)
• Training time:
– GEM: the extensions can reduce training time (decrease of up to 23%), but negative effects also occur (increase of up to 44%)
– GMN: results similar to GEM (max. decrease in training time of approx. 24%)
– Effects are not consistent across both domains and all variants
• Some extensions decrease both training time and MAE (e.g., GMN_1^R)
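The percentages above are relative changes. Assuming they are measured against the respective base model without extensions, they can be computed as shown below. The functions `relative_change_percent` and `improves_both` and the example numbers are hypothetical and do not reproduce the reported results.

```python
def relative_change_percent(base: float, variant: float) -> float:
    """Relative change of a variant w.r.t. the base model in percent (negative = decrease)."""
    if base == 0:
        raise ValueError("base value must be non-zero")
    return (variant - base) / base * 100.0


def improves_both(base_time: float, base_mae: float,
                  variant_time: float, variant_mae: float) -> bool:
    """True if a variant lowers both training time and MAE compared to the base model."""
    return variant_time < base_time and variant_mae < base_mae


# Hypothetical numbers, not the reported measurements
print(relative_change_percent(2.0, 1.5))    # -25.0
print(improves_both(2.0, 0.20, 1.5, 0.18))  # True
```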
Experimental Results – Discussion
• The effects of the extensions on quality and training time are highly inconsistent (they depend on the domain and the model)
• A careful choice of extensions and careful hyperparameter tuning are essential
• Optimizing each extended variant individually might further improve the results shown
Conclusion & Future Work
• The presented extensions introduce a form of Informed ML for DL models in POCBR
• The extended GNN models are capable of reducing prediction errors and training time
• Whether a benefit is achieved depends on the target domain and the underlying model architecture
• Future work:
– More in-depth analysis of Informed ML in the context of CBR with
challenges, opportunities, and guidelines
– Examination and evaluation of Informed ML methods for other
phases of the CBR cycle such as the reuse phase
Contact
Maximilian Hoffmann
Business Information Systems II
University of Trier, Germany
German Research Center for Artificial
Intelligence (DFKI),
Branch University of Trier, Germany
hoffmannm@uni-trier.de
maximilian.hoffmann@dfki.de
procake.uni-trier.de