Informed Machine Learning for Improved Similarity Assessment in Process-Oriented Case-Based Reasoning

Maximilian Hoffmann (1, 2) and Ralph Bergmann (1, 2)
(1) Department of Business Information Systems II, University of Trier, Germany
(2) German Research Center for Artificial Intelligence (DFKI), Branch University of Trier, Germany

Introduction & Motivation
• Deep Learning (DL) components within Case-Based Reasoning
(CBR) applications are gaining popularity
• Both components create a synergy:
• DL provides powerful offline learning capabilities for core CBR tasks,
e.g., similarity assessment
• CBR provides, among others, structured knowledge about the case
representation or the definition of similarity measures
• Many current approaches lack a comprehensive integration of the
CBR-provided knowledge into the DL components, resulting in
unused potential for improved quality and performance
• Informed Machine Learning targets this shortcoming
Goal: Investigate the possibilities of Informed ML for similarity learning
in Process-Oriented CBR (POCBR)

Foundations – NEST Graphs
Source: Bergmann, R., Gil, Y.: Similarity assessment and efficient retrieval of semantic workflows.
Information Systems 40, pp. 115–127 (2014)

Foundations – Similarity
Assessment of Semantic Graphs
• Local-global principle by Richter is used to compute the similarity sim(Q, C)
between query graph Q and case graph C:
– Local pairwise similarities between nodes and edges of the same type
are calculated
– Global similarity results from aggregation of local similarities according to
the most similar mapping of nodes and edges
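The two steps above can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not part of the slides: only nodes are mapped (edges are omitted), the local measure `local_sim` is a hypothetical token-overlap measure, the aggregation is a plain mean, and the best mapping is found by brute force rather than by a dedicated search algorithm.

```python
from itertools import permutations


def local_sim(q_node, c_node):
    """Hypothetical local similarity: 0 for different types, else Jaccard token overlap."""
    q_type, q_label = q_node
    c_type, c_label = c_node
    if q_type != c_type:
        return 0.0
    q_tokens, c_tokens = set(q_label.split()), set(c_label.split())
    return len(q_tokens & c_tokens) / len(q_tokens | c_tokens)


def global_sim(query_nodes, case_nodes):
    """Best mean local similarity over all injective mappings of query nodes to case nodes."""
    if not query_nodes:
        return 0.0
    best = 0.0
    # Brute force: try every assignment of distinct case nodes to the query nodes.
    for candidate in permutations(case_nodes, len(query_nodes)):
        total = sum(local_sim(q, c) for q, c in zip(query_nodes, candidate))
        best = max(best, total / len(query_nodes))
    return best


# Illustrative toy graphs; nodes are (type, label) pairs.
query = [("task", "chop onion"), ("data", "onion")]
case = [("data", "red onion"), ("task", "chop onion"), ("task", "fry onion")]
similarity = global_sim(query, case)
```

The brute-force search grows factorially with the number of nodes, which is one reason why learned similarity models are attractive as a faster approximation.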

Similarity Learning with Graph
Neural Networks (GNNs)
Source: Hoffmann, M., Malburg, L., Klein, P., Bergmann, R. (2020). Using Siamese Graph Neural Networks for Similarity-Based
Retrieval in Process-Oriented Case-Based Reasoning. In: ICCBR 2020, 12311, pp. 229-244. Springer.
[Figure: architectures of the two GNN variants, GMN and GEM]

Similarity Assessment with Informed
ML Methods
Goal: Improve similarity assessment in terms of
higher quality and reduced time effort
Extension        | Approach                                                      | Knowledge source                  | Suited for GNN variant
First Extension  | Novel tree-based encoding procedure for semantic descriptions | Case representation, domain model | GEM, GMN
Second Extension | Novel constraint-based matching procedure of the GMN          | Graph-based similarity measure    | GMN

First Extension: Tree-based encoding
• Semantic description is
composed of composite
and atomic parts
• Previous encoding
methods built a sequence
of atomic encodings
• Limitations:
– No encoding of composites
and hierarchical relations
– Sequence processing in
neural networks can be slow

First Extension: Tree-based encoding (2)
• Encoding of semantic
descriptions as tree structures:
– Maintains hierarchical
structure
– Explicitly encodes composite
types
– Enables parallel processing
• Tree structures are processed
by a local GNN:
– Information propagation
between nodes along the
edges
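As a rough illustration of the idea, the sketch below flattens a nested semantic description into a tree and runs one round of information propagation along the edges. Everything here is an assumption made for illustration: the example description, the two-dimensional type features, and the non-learned mean aggregation, which stands in for the trainable layers of an actual GNN.

```python
def build_tree(value, nodes=None, edges=None, parent=None):
    """Flatten a nested description into typed nodes and parent-child edges."""
    if nodes is None:
        nodes, edges = [], []
    index = len(nodes)
    kind = "composite" if isinstance(value, (dict, list)) else "atomic"
    # Composite nodes are kept explicitly instead of being dropped (attribute names are omitted).
    nodes.append((kind, value if kind == "atomic" else type(value).__name__))
    if parent is not None:
        edges.append((parent, index))
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        children = []
    for child in children:
        build_tree(child, nodes, edges, index)
    return nodes, edges


def propagate(states, edges):
    """One message-passing round: each node averages its state with its neighbours' states."""
    neighbours = {i: [] for i in range(len(states))}
    for parent, child in edges:
        neighbours[parent].append(child)
        neighbours[child].append(parent)
    new_states = []
    for i, state in enumerate(states):
        group = [state] + [states[j] for j in neighbours[i]]
        new_states.append([sum(column) / len(group) for column in zip(*group)])
    return new_states


# Hypothetical semantic description with one nested composite.
description = {"name": "onion", "amount": {"value": 2, "unit": "pieces"}}
nodes, edges = build_tree(description)
# Assumed initial features: [1, 0] for composite nodes, [0, 1] for atomic nodes.
states = [[1.0, 0.0] if kind == "composite" else [0.0, 1.0] for kind, _ in nodes]
updated = propagate(states, edges)
```

Each node update depends only on the previous states, so all nodes can be updated in parallel, in contrast to a step-by-step pass over a sequence of atomic encodings.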

Second Extension: Constraint Matching
• Propagation of GMN is inspired by graph matching
algorithms that determine graph similarities
• Propagation between nodes within a graph is extended to
propagation between graphs
• Problems:
– Graph matching algorithm defines compatible types of nodes
– GMN is currently not able to use this information

Second Extension: Constraint Matching (2)
• Integration of compatible types of nodes into the cross-graph
matching component
• Compatibility is expressed by defining a cosine similarity of 0
for incompatible pairs of nodes
• Propagation of vector information between incompatible pairs
of nodes is stopped
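The masking idea can be sketched as follows. This is a simplified illustration, not the implementation in ProCAKE: compatibility is assumed to mean "same node type", node embeddings are small hand-written vectors, and the attention weights are assumed to be a softmax over the compatible case nodes only, so that incompatible pairs contribute nothing to the message.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def masked_similarities(query, case):
    """Pairwise cosine similarities; incompatible (different-type) pairs are set to 0."""
    return [[cosine(qv, cv) if qt == ct else 0.0 for ct, cv in case]
            for qt, qv in query]


def cross_graph_messages(query, case):
    """For each query node, sum weighted differences to compatible case nodes only."""
    sims = masked_similarities(query, case)
    messages = []
    for (qt, qv), row in zip(query, sims):
        compatible = [j for j, (ct, _) in enumerate(case) if ct == qt]
        message = [0.0] * len(qv)
        if compatible:
            norm = sum(math.exp(row[j]) for j in compatible)
            for j in compatible:
                weight = math.exp(row[j]) / norm
                cv = case[j][1]
                message = [m + weight * (a - b) for m, a, b in zip(message, qv, cv)]
        messages.append(message)
    return sims, messages


# Toy graphs; nodes are (type, embedding) pairs.
query = [("task", [1.0, 0.0]), ("data", [0.0, 1.0])]
case = [("task", [1.0, 0.0]), ("data", [1.0, 1.0])]
sims, messages = cross_graph_messages(query, case)
```

In this toy example, the pair of the query task node and the case data node would have a non-zero cosine similarity without the mask; with the mask, the pair is ignored entirely.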

Experimental Evaluation – Setup
• Implementation of our approach in the
open-source CBR system ProCAKE
with use of TensorFlow DL models
• Comparison of base GEM and GMN with extended variants (nine
different combinations of both extensions) for similarity learning
• Two domains:
• CB-I: 800 cooking workflows (660 training, 60 validation, 80 test)
• CB-II: 609 data mining workflows (509 training, 40 validation, 60 test)
• Hyperparameters are tuned for base models in each domain and
used by all model variants
procake.uni-trier.de
Source: Bergmann, R., Grumbach, L., Malburg, L., Zeyen, C. (2019). ProCAKE: A Process-Oriented Case-Based Reasoning
Framework. In: Workshops Proc. of the 27th Int. Conf. on Case-Based Reasoning (ICCBR 2019), 2567, pp. 156–161. CEUR-WS.org

Experimental Results
• Training time (in milliseconds) and mean absolute error (MAE) are measured for
both domains (CB-I and CB-II)
• GEM and GMN:
• Subscript stands for extension
• Combinations of extensions are also examined
• Superscript stands for reuse of layers for extension 1
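For reference, the MAE and a relative change between a base model and an extended variant can be computed as in the sketch below. The numbers are made-up illustrative values, not results from the evaluation, and the helper names are chosen here for illustration.

```python
def mean_absolute_error(predicted, target):
    """Mean of the absolute differences between predicted and ground-truth similarities."""
    if len(predicted) != len(target) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def relative_change(base, variant):
    """Relative change of a variant's metric against the base model (negative = decrease)."""
    return (variant - base) / base


# Illustrative values only: ground-truth similarities and model predictions.
target = [0.8, 0.5, 0.2]
predicted = [0.7, 0.6, 0.2]
mae = mean_absolute_error(predicted, target)

# A hypothetical variant with MAE 0.09 against a base MAE of 0.10 is a 10% decrease.
change = relative_change(0.10, 0.09)
```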

Experimental Results (2)
• Quality:
• No significant positive influence of extensions on GEM
• Base model shows best quality for GEM
• For CB-II, quality is even reduced
• GMN shows quality improvements with use of extensions
• Max. decrease in MAE of approx. 10%
• Effects are not consistent across both domains and all variants

Experimental Results (3)
• Training time:
• Positive influence of extensions on GEM (decrease of up to 23%)
• Negative effects also occur (increase of up to 44%)
• GMN shows results similar to GEM
• Max. decrease in training time of approx. 24%
• Effects are not consistent across both domains and all variants
• Some extensions decrease both training time and MAE (e.g., GMN_1^R, i.e.,
GMN with subscript 1 and superscript R)

Experimental Results – Discussion
• Effects of extensions on quality or training time are very inconsistent
(dependent on domain and model)
• Careful choice of extensions and tuning of hyperparameters
is necessary
• Individual optimization of extended variants might further improve the
shown results

Conclusion & Future Work
• Presented extensions introduce a form of Informed ML for
DL models in POCBR
• Extended GNN models are capable of reducing prediction errors
and training time
• Whether a benefit can be achieved depends on the
target domain and the underlying model architecture
• Future work:
– More in-depth analysis of Informed ML in the context of CBR with
challenges, opportunities, and guidelines
– Examination and evaluation of Informed ML methods for other
phases of the CBR cycle such as the reuse phase

Contact
Maximilian Hoffmann
Business Information Systems II
University of Trier, Germany
German Research Center for Artificial
Intelligence (DFKI),
Branch University of Trier, Germany
hoffmannm@uni-trier.de
maximilian.hoffmann@dfki.de
procake.uni-trier.de
