Machine learning aims at learning complex functions from data. Very often, this challenge remains ill-defined given the available amount of data; however, background knowledge that is available as knowledge graphs, ontologies, or symbolic (physical) equations allows for an improved specification of the targeted solution. In this talk, we discuss several use cases that incorporate symbolic background knowledge into machine learning tasks as regularizing priors, as constraints, or as other inductive biases.
Symbolic Background Knowledge for Machine Learning
1. IPVS – Institute for Parallel and Distributed Systems
Analytic Computing
Symbolic Background Knowledge for Machine Learning
Steffen Staab
https://www.ipvs.uni-stuttgart.de/departments/ac/
With slides contributed by Alexandra Baier, Luis Chamon, Tim Schneider, Bo Xiong, Thomas Monninger
2. • What is machine learning?
• Why (symbolic) background knowledge in machine learning?
• Which background knowledge?
• In which applications?
• How to apply in machine learning?
• A broad range of methods for applying background knowledge in ML
So, background knowledge helps, but...
3. Supervised Learning
Classification: predict discrete label for given examples
• Given a news article, assign a topic
Regression: predict continuous value for given examples
• Given meteorological conditions today, predict temperature tomorrow
Sequence Prediction: predict sequence for given sequence
• Given text, identify all noun phrases in the text
[Figure: a machine learning model maps example documents to the labels Politics, Culture, Lifestyle, Sports]
From KnowGraphs Winter School 2021
4. Unsupervised Learning
Clustering: summarize similar examples in clusters
• Given articles, form k clusters of most similar articles
Visualization and Dimensionality Reduction: map high-dimensional data into lower-dimensional space
• Map documents in d-dimensional space such that similar documents are close
Rule Mining: learn general rules from data
• Given gene network, find rules about frequent relationships
[Figure: a machine learning model groups example documents into clusters]
From KnowGraphs Winter School 2021
5. Supervised vs. Unsupervised
Many supervised methods now include some form of unsupervised learning:
• Word embedding layers: can encode interesting linguistic patterns
• Convolutional layers: can encode interesting visual or sequential patterns
From KnowGraphs Winter School 2021
The second-to-last layer encodes learned features:
• embeddings
• implicit, re-usable, non-symbolic background knowledge
6. Loss-based Machine Learning
• Define a loss function that evaluates the errors of a model w.r.t. the training data
• Training data: 𝒟 = {(x_n, y_n)}_{n=1}^{N}, x_n ∈ ℝ^d, y_n ∈ ℝ
• Adjust the model parameters θ to minimize the empirical risk:
min_θ (1/N) Σ_{n=1}^{N} Loss(f_θ(x_n), y_n)
[Figure: data {(x_n, y_n)} is fed into a machine learning model f_θ with parameters θ, which produces outputs f_θ(x_n)]
From KnowGraphs Winter School 2021
7. Training f_θ: Gradient-Based Adaptation of θ
Minimize Σ_{n=1}^{N} Loss(f_θ(x_n), y_n) locally:
1. initialize parameters θ randomly,
2. change parameters θ in the direction of the negative gradient,
3. repeat until a local minimum is reached
8. Example: Linear Regression
• Training data: 𝒟 = {(x_n, y_n)}_{n=1}^{N}, x_n ∈ ℝ, y_n ∈ ℝ
• Linear function: f_θ(x) = θ_1 x + θ_0
• Loss function: Loss(ŷ, y) = (ŷ − y)²
• Minimization of the empirical risk:
min_θ (1/N) Σ_{n=1}^{N} (f_θ(x_n) − y_n)²
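To make the training loop concrete, here is a minimal sketch (Python/NumPy, not from the slides) of gradient descent for exactly this linear-regression risk; the learning rate and step count are illustrative choices.

import numpy as np

def fit_linear(x, y, lr=0.01, steps=1000):
    # Minimize (1/N) * sum_n (theta1*x_n + theta0 - y_n)^2 by gradient descent.
    theta0, theta1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residual = theta1 * x + theta0 - y          # f_theta(x_n) - y_n
        grad0 = 2.0 / n * residual.sum()            # partial derivative w.r.t. theta0
        grad1 = 2.0 / n * (residual * x).sum()      # partial derivative w.r.t. theta1
        theta0 -= lr * grad0                        # step along the negative gradient
        theta1 -= lr * grad1
    return theta0, theta1

# Example: noisy data around y = 2x + 1
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.05 * np.random.randn(50)
print(fit_linear(x, y))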
10. Sparse data
Zero-shot learning: classify unseen data
• no training data about unicorns
• background knowledge: a unicorn is a horse with a horn
Few-shot learning: classify unseen data
• one/few training examples
11. Learning physics
General differential equation for water flux:
∂u/∂t = D(u) · ∂²u/∂x² − v(u) · ∂u/∂x + q(u)
Approximate for all finite volumes i.
Flux kernel ℱ_i:
ℱ_i = Σ_{j=1}^{N_{s_i}} f_j ≈ ∮_{ω⊆Ω} ( D(u) · ∂²u/∂x² − v(u) · ∂u/∂x ) · n dΓ
Simulation: numerically solve for all ℱ_i, exchanging with neighbors given boundary conditions.
ML alternative (Karlbauer et al 2022):
1. learn behavior of finite volumes
2. interleave with numerical solving
Picture (cc) by Shu, L., Ullrich, P. A., Duffy, C. J. (2020) in Geosci. Model Dev.:
https://gmd.copernicus.org/articles/13/2743/2020/
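For illustration only (not the architecture of Karlbauer et al.), a minimal sketch of one explicit update of the equation above on a 1D grid, assuming constant D, v and q and crude boundary handling:

import numpy as np

def advection_diffusion_step(u, dx, dt, D, v, q=0.0):
    # One explicit Euler step of du/dt = D*u_xx - v*u_x + q with constant coefficients.
    # dt must satisfy the usual stability restriction for explicit schemes.
    u_xx = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2   # second spatial derivative
    u_x = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)         # first spatial derivative
    u_new = u + dt * (D * u_xx - v * u_x + q)
    u_new[0], u_new[-1] = u_new[1], u_new[-2]                   # crude zero-gradient boundaries
    return u_new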
12. ML alternative (Karlbauer et al 2022):
1. learn behavior of finite volumes
2. interleave with numerical solving
Background knowledge:
• water does not appear or disappear
• sum of amount of water is constant
• energy must be constant
• water cannot rise arbitrarily
• bedrock does not allow entry of water
• ...
Learning physics
Picture (cc) by Shu, L., Ullrich, P. A., Duffy, C. J. (2020) in Geosci. Model Dev.:
https://gmd.copernicus.org/articles/13/2743/2020/#&gid=1&pid=1
13. Traffic Scene Understanding
Given:
• Perception
• Tracked dynamic agents
• Background knowledge: a high-definition map (lane topology, infrastructure, semantic information)
Predict: intention of others, sensing mistakes
14. Knowledge Representations in Informed ML (von Rueden et al 2021)
We will look at several of them and some others.
Pre-trained non-symbolic models are missing here.
15. Influencing ML by Empirical Risk Minimization
Given training data 𝒟 = {(x_n, y_n)}_{n=1}^{N}, x_n ∈ ℝ^d, y_n ∈ ℝ:
min_θ (1/N) Σ_{n=1}^{N} Loss(f_θ(x_n), y_n)
(On the slide, this empirical risk is shown several times, each time highlighting a different element that background knowledge can influence: the training data, the model f_θ, the loss, and the parameters θ.)
Specific algorithms often modify several elements at once.
16. A pipeline view of the same consideration (von Rueden et al 2021)
23. EL++ Knowledge Bases
• EL++ is a lightweight description logic that
  • balances expressive power and reasoning complexity (polynomial)
  • is applied for large-scale ontologies, e.g. the Gene Ontology
TBox statements can be normalized into:
1. concept subsumption: C ⊑ D
2. concept intersection: C1 ⊓ C2 ⊑ D
3. right existential: ∃r.C1 ⊑ D
4. left existential: C1 ⊑ ∃r.C2
The ABox contains:
1. concept assertions: C(a)
2. role assertions: r(a,b)
Baader, F., Brandt, S., & Lutz, C. Pushing the EL envelope. In IJCAI 2005.
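As a small illustration (hypothetical axioms with made-up concept and role names, not from the slides), one instance of each normal form:
• Mother ⊑ Parent (concept subsumption)
• Parent ⊓ Female ⊑ Mother (concept intersection)
• ∃hasChild.Person ⊑ Parent (right existential)
• Parent ⊑ ∃hasChild.Person (left existential)
• ABox: Mother(mary), hasChild(mary, tom)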
24. Box EL++ Embedding
Idea: mapping logical constraints to geometric (soft) constraints
Solution: finding a geometric interpretation for each statement
• designing one loss term for each logical statement
• such that the KB is satisfiable when the loss is 0 (a.k.a. soundness)
• satisfiability implies that there is a geometric interpretation satisfying all logical statements
25. Geometric Interpretations of the ABox
Concept assertion C(a): geometric membership of the point for a in the region for C
Role assertion r(a, b): affine transformation T_r between the two points for a and b
[Figure: a point a inside a box C; a transformation T_r mapping the point a to the point b]
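A minimal sketch (Python/NumPy, illustrative rather than the exact Box EL++ formulation) of turning these two assertions into soft geometric losses, assuming concepts are axis-aligned boxes and the affine transformation is simplified to a translation:

import numpy as np

def concept_assertion_loss(point, box_lower, box_upper):
    # Soft version of C(a): how far the point for a lies outside the box for C (0 if inside).
    below = np.maximum(box_lower - point, 0.0)
    above = np.maximum(point - box_upper, 0.0)
    return np.linalg.norm(below + above)

def role_assertion_loss(a, b, t_r):
    # Soft version of r(a, b): distance between the translated head and the tail point.
    return np.linalg.norm((a + t_r) - b)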
28. Physics-informed Neural Networks
• Super important and successful class of ML methods for supporting scientific problem solving
• 3 pages, short easy read: Chris Edwards. 2022. Neural networks learn to speed up simulations. Commun. ACM 65, 5 (May 2022), 27–29. https://doi.org/10.1145/3524015
• A bit longer: Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S. and Yang, L., "Physics-informed machine learning," Nature Reviews Physics, 3(6), 2021, pp. 422-440.
• I skip it now for lack of time
30. Structured Multilabel Prediction
• Structured multilabel prediction assigns every instance multiple labels
• Labels are constrained by some background knowledge
• Question: can we produce predictions that are logically consistent with structured background knowledge?
31. Structured background knowledge
• Labels are organized in taxonomies/hierarchies/ontologies
• Two kinds of logical constraints (a sketch of turning them into soft losses follows below):
  • implication: rdfs:subClassOf
  • exclusion: owl:disjointWith
Examples: WordNet, biomedical ontologies
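One simple way (a sketch, not the method of the following slides) to express both kinds of constraints as differentiable penalties on per-label probabilities:

import torch

def constraint_loss(probs, implications, exclusions):
    # probs: (batch, num_labels) sigmoid outputs of a multilabel classifier.
    # implications: (child, parent) index pairs; p(child) should not exceed p(parent).
    # exclusions: (i, j) index pairs; p(i) + p(j) should not exceed 1.
    loss = probs.new_zeros(())
    for child, parent in implications:
        loss = loss + torch.relu(probs[:, child] - probs[:, parent]).mean()
    for i, j in exclusions:
        loss = loss + torch.relu(probs[:, i] + probs[:, j] - 1.0).mean()
    return loss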
32. Try support vector machines
[Figure: label hierarchy with implication and exclusion relations among Person, Parent, Mother, Father, Woman, Girl, Plant, Tree, shown next to separate SVM decision boundaries for Mother, Father, Person, Plant]
Issues:
• The Mother classifier misclassifies people (solvable with the kernel trick)
• The decision boundaries are unrelated to each other
33. Inductive bias in Poincaré hyperbolic space
• Each label is associated with a region contained by a Poincaré hyperplane
• Instances are points inside the region
• Logical constraints on labels are transformed into geometric soft constraints on the corresponding label regions
[Figure: the same label hierarchy (implication and exclusion relations among Person, Parent, Mother, Father, Woman, Girl, Plant, Tree) embedded as regions of the Poincaré disk]
38. Use case: Multi-step trajectory prediction
Critical issue: does the neural network make catastrophic predictions on unseen data?
39. Pattern: From full capacity to constrained learning
f_θ = g_θ1 ∘ σ ∘ MLP_θ2, trained with min_θ (1/N) Σ_{n=1}^{N} Loss(f_θ(x_n), y_n)
• MLP_θ2: ℝ^{d1} → ℝ^{d2}
  • arbitrary mapping from input to output
  • full learning capacity
• σ
  • component-wise sigmoid
  • bounds the output of MLP_θ2
• g_θ1
  • predicting with background restrictions
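A minimal PyTorch sketch of this composition (names and the concrete form of g are my assumptions; here g is a fixed affine map into an admissible output range rather than a learned component):

import torch
import torch.nn as nn

class ConstrainedPredictor(nn.Module):
    # f_theta = g_theta1 ∘ sigma ∘ MLP_theta2
    def __init__(self, d_in, d_out, lo, hi, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(                       # MLP_theta2: full learning capacity
            nn.Linear(d_in, hidden), nn.ReLU(), nn.Linear(hidden, d_out))
        self.register_buffer("lo", torch.as_tensor(lo, dtype=torch.float32))
        self.register_buffer("hi", torch.as_tensor(hi, dtype=torch.float32))

    def forward(self, x):
        z = torch.sigmoid(self.mlp(x))                  # sigma: bounds the MLP output to (0, 1)
        return self.lo + (self.hi - self.lo) * z        # g: maps into the admissible range [lo, hi]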
40. System identification
Linear system: x_t = A x_{t−1} + B u_t
The next system state x_t (e.g. the velocity of a ship) depends linearly on the previous state x_{t−1} and the control u_t (e.g. a force).
Switched linear system: x_t = A_{σ(t)} x_{t−1} + B_{σ(t)} u_t
σ(t) chooses between different matrices A_{σ(t)}, B_{σ(t)}.
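For reference, a small sketch (Python/NumPy) simulating such a switched linear system for a given switching signal σ; all names are illustrative:

import numpy as np

def simulate_switched_linear(A_list, B_list, sigma, x0, controls):
    # x_t = A_{sigma(t)} x_{t-1} + B_{sigma(t)} u_t
    x = np.asarray(x0, dtype=float)
    trajectory = []
    for t, u in enumerate(controls):
        k = sigma(t)                         # switching signal selects the active linear model
        x = A_list[k] @ x + B_list[k] @ u
        trajectory.append(x)
    return np.array(trajectory)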
41. ReLiNet: Stable and Explainable Multistep Prediction (Baier et al 2023)
[Figure: an LSTM with hidden state h_t consumes the control input u_t and, via mappings W(A) and W(B), predicts matrices A_t and B_t, which produce the next state x_{t+1} from x_t and u_t]
1. The LSTM predicts a linear model at each time step
2. The prediction only depends on the linear models → inherently explainable
3. ReLiNet is a switched linear system → stability guarantees with simple constraints
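A rough PyTorch sketch of the idea (my own simplification, not the published ReLiNet code; the stability constraints on the predicted matrices are omitted here): an LSTM emits A_t and B_t at every step, and the state update remains a switched linear system.

import torch
import torch.nn as nn

class LinearModelPredictor(nn.Module):
    def __init__(self, state_dim, control_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTMCell(control_dim, hidden_dim)
        self.W_A = nn.Linear(hidden_dim, state_dim * state_dim)    # hidden state -> A_t
        self.W_B = nn.Linear(hidden_dim, state_dim * control_dim)  # hidden state -> B_t
        self.state_dim, self.control_dim = state_dim, control_dim

    def forward(self, x0, controls):
        # x0: (state_dim,), controls: (T, control_dim)
        h = controls.new_zeros(1, self.lstm.hidden_size)
        c = controls.new_zeros(1, self.lstm.hidden_size)
        x, states = x0, []
        for u in controls:
            h, c = self.lstm(u.unsqueeze(0), (h, c))
            A = self.W_A(h).view(self.state_dim, self.state_dim)
            B = self.W_B(h).view(self.state_dim, self.control_dim)
            x = A @ x + B @ u                 # prediction depends only on the linear models
            states.append(x)
        return torch.stack(states)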
42. Explaining Predictions
• Contribution of past inputs to the prediction
• The explanation allows a faithful reconstruction of the prediction
• F_{t,d} = feature weight at time t for output d; u_t = input at time t
43. Falcon applying the pattern (Tang et al 2023)
f_θ = g_θ1 ∘ σ ∘ MLP_θ2, trained with min_θ (1/N) Σ_{n=1}^{N} Loss(f_θ(x_n), y_n)
• MLP_θ2: ℝ^{d1} → ℝ^{d2}
  • arbitrary mapping from input to output
  • full learning capacity, maps entities and concepts
• σ
  • component-wise sigmoid
  • bounds the output of MLP_θ2
• g_θ1
  • encodes background restrictions
44. Falcon: Faithful Neural Semantic Entailment over 𝓐𝓛𝓒 Ontologies
Input: an ALC ABox and TBox
1. ℝ^d is the domain of interpretations
2. f_e is learned to interpret concept, relation and object names as elements of ℝ^d
3. E.g., check whether an object belongs to a class via m: m(x, C^ℐ) = σ(MLP(f_e(C), f_e(x)))
4. Exemplary part of the loss function, where E is sampled from ℝ^d:
ℒ_𝒯 = (1/|E|) · (1/|𝒯|) · Σ_{C⊑D∈𝒯} Σ_{e∈E} m(e, (C ⊓ ¬D)^ℐ)
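A toy sketch (PyTorch; dimensions, names and the MLP shape are assumptions) of the membership function m from step 3; the TBox loss above then averages this score for sampled points e over the regions interpreting C ⊓ ¬D:

import torch
import torch.nn as nn

class MembershipScorer(nn.Module):
    # m(x, C^I) = sigmoid(MLP([f_e(C); f_e(x)])): fuzzy degree to which point x belongs to concept C.
    def __init__(self, d, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * d, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, concept_emb, point_emb):
        return torch.sigmoid(self.mlp(torch.cat([concept_emb, point_emb], dim=-1)))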
50. Use Case: Environmental Science
[Figure: sorption isotherm relating the sorptive concentration c (mg/L) to the sorbate concentration s (mg/kg), annotated with background knowledge]
51. Knowledge-informed Machine Learning to Extend Scientific Knowledge
Research questions: "How to represent (RQ1) and automatically exploit (RQ2) scientific knowledge for inference?"
[Figure: pipeline in which (1) training data and (2) domain knowledge (class of expressions, symmetries, ...; e.g. symbolic structure and continuous parameters) are input to a Bayesian Machine Scientist, whose output are algebraic equations / analytical forms]
52. How to represent and exploit the knowledge?
How likely does an explored solution conform to the knowledge?
Representation (RQ1):
1. Scientific domain knowledge
2. Probabilistic Regular Tree Expression (pRTE)
3. Factor graph (built as a probabilistic finite state machine)
Exploitation (RQ2):
4. (Prior) probabilities to perform Bayesian inference (MCMC)
(A small illustration of grammar-based sampling follows below.)
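To make a grammar-based prior over symbolic structure tangible, a small sketch (Python, with made-up rules and probabilities) of sampling expression structures from a probabilistic grammar; an actual pRTE prior and the MCMC over continuous parameters are beyond this sketch:

import random

RULES = {
    "E": [(0.4, ["(", "E", "+", "E", ")"]), (0.3, ["(", "E", "*", "E", ")"]), (0.3, ["T"])],
    "T": [(0.5, ["x"]), (0.5, ["c"])],
}

def sample(symbol="E", depth=0, max_depth=4):
    if symbol not in RULES:
        return symbol                                   # terminal symbol
    if symbol == "E" and depth >= max_depth:
        return sample("T", depth + 1, max_depth)        # force termination at maximum depth
    r, acc = random.random(), 0.0
    for p, rhs in RULES[symbol]:
        acc += p
        if r <= acc:
            return "".join(sample(s, depth + 1, max_depth) for s in rhs)
    return "".join(sample(s, depth + 1, max_depth) for s in RULES[symbol][-1][1])

print(sample())   # e.g. "((x+c)*x)"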
55. Ontology of the graph neural network for traffic scene understanding (Schmidt et al 2023)
56. Message passing in graph neural networks
• Every instantiated node is randomly initialized with a vector
• Every instantiated node sends its vector to its neighbors
• Every instantiated node learns how to aggregate neighbor information
• Parameter sharing over same node types and edge types
(A minimal sketch of one message-passing round follows below.)
[Figure: scene graph with nodes Car 1, Car 2, Car 3, Lane A, Lane B and a stop sign]
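A minimal sketch (PyTorch) of a single message-passing round over such a scene graph; for simplicity it uses one shared parameter set and mean aggregation rather than per-node-type and per-edge-type parameters:

import torch

def message_passing_round(node_feats, edges, W_self, W_neigh):
    # node_feats: (num_nodes, d); edges: list of (src, dst) pairs; W_self, W_neigh: (d, d) matrices.
    num_nodes = node_feats.shape[0]
    agg = torch.zeros_like(node_feats)
    deg = torch.zeros(num_nodes)
    for src, dst in edges:
        agg[dst] += node_feats[src]                     # every node sends its vector to its neighbors
        deg[dst] += 1.0
    agg = agg / deg.clamp(min=1.0).unsqueeze(-1)        # mean aggregation of neighbor messages
    return torch.relu(node_feats @ W_self + agg @ W_neigh)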
58. Limitations
• Approaches often modify several aspects of min_θ (1/N) Σ_{n=1}^{N} Loss(f_θ(x_n), y_n)
• The focus of my presentation was on work done at Analytic Computing @ University of Stuttgart
• For practical reasons: not a comprehensive literature study, but our papers point to other papers
• Existing research on knowledge-informed ML goes much beyond this
59. Outlook on Knowledge-informed ML
• From knowledge-informed ML to knowledge discovery
• From interpolation to extrapolation, including few- and zero-shot learning
• From implicit knowledge to explicit knowledge, including self-learning and prototypical knowledge
• From classification to non-standard queries: similarity, analogy
60. Thank you!
Steffen Staab
Analytic Computing, IPVS, Universität Stuttgart
Universitätsstraße 32, 70569 Stuttgart
E-Mail: Steffen.staab@ipvs.uni-stuttgart.de
Web: ipvs.uni-stuttgart.de/departments/ac/
My thanks and acknowledgements to all my collaborators within and beyond Analytic Computing, within and beyond EXC SimTech, in particular: Alexandra Baier, Daniel Frank, Cosimo Gregucci, Daniel Hernandez, Wolfgang Nowak, Nico Potyka, Tim Schneider, Amin Totounferoush, Bo Xiong, Thomas Monninger, Julian Schmidt, and many more
61. References
Surveys
• Hu, Y., Chapman, A., Wen, G. and Hall, D.W., "What Can Knowledge Bring to Machine Learning? A Survey of Low-shot Learning for Structured Data," ACM Transactions on Intelligent Systems and Technology (TIST), 13(3), 2021, pp. 1-45.
• von Rueden, Laura, et al. "Informed Machine Learning - A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems." IEEE Transactions on Knowledge and Data Engineering (2021).
Factual knowledge
• J. Schmidt, T. Monninger, J. Rupprecht, D. Raba, J. Jordan, D. Frank, S. Staab, K. Dietmayer. SCENE: Reasoning about Traffic Scenes using Heterogeneous Graph Neural Networks. IEEE Robotics and Automation Letters, 8(3), 2023.
62. References: Physics-informed Neural Networks
• Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S. and Yang, L., "Physics-informed machine learning," Nature Reviews Physics, 3(6), 2021, pp. 422-440.
• Chris Edwards. 2022. Neural networks learn to speed up simulations. Commun. ACM 65, 5 (May 2022), 27–29. https://doi.org/10.1145/3524015
• Matthias Karlbauer, Timothy Praditia, Sebastian Otte, Sergey Oladyshkin, Wolfgang Nowak, Martin V. Butz: Composing Partial Differential Equations with Physics-Aware Neural Networks. ICML 2022: 10773-10801.
63. References: Inductive Biases
• A. Baier, D. Aspandi-Latif, S. Staab. ReLiNet: Stable and Explainable Multistep Prediction with Recurrent Linear Parameter Varying Networks. Unpublished/submitted 2023.
• Zhenwei Tang, Tilman Hinnerichs, Xi Peng, Xiangliang Zhang, Robert Hoehndorf. FALCON: Faithful Neural Semantic Entailment over ALC Ontologies. https://arxiv.org/abs/2208.07628
Geometric inductive biases
• B. Xiong, S. Zhu, M. Nayyeri, C. Xu, S. Pan, C. Zhou, S. Staab. Ultrahyperbolic Knowledge Graph Embeddings. In: Proc. of KDD '22 - The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, East Lansing, MI, USA, August 14-18, 2022.
• B. Xiong, N. Potyka, T.-K. Tran, M. Nayyeri, S. Staab. Faithful Embeddings for EL++ Knowledge Bases. In: 21st International Semantic Web Conference (ISWC 2022), (virtual event) Nov 2022, Springer 2022.
• B. Xiong, S. Zhu, N. Potyka, S. Pan, C. Zhou, S. Staab. Pseudo-Riemannian Graph Convolutional Networks. In: Proc. of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Nov 28-Dec 9, 2022.
• B. Xiong, M. Cochez, M. Nayyeri, S. Staab. Hyperbolic Embedding Inference for Structured Multi-Label Prediction. In: Proc. of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Nov 28-Dec 9, 2022.
64. References
Data augmentation
• A. Hotho, S. Staab, G. Stumme. Ontologies Improve Text Document Clustering. In: Proceedings of the International Conference on Data Mining (ICDM 2003), IEEE Press, 2003a.
• A. Hotho, S. Staab, G. Stumme. Explaining Text Clustering Results using Semantic Structures. In: Principles of Data Mining and Knowledge Discovery, 7th European Conference (PKDD 2003), Dubrovnik, Croatia, September 22-26, 2003b.
• Stephan Bloehdorn, Andreas Hotho: Text Classification by Boosting Weak Learners based on Terms and Concepts. ICDM 2004: 331-334.
• Stephan Bloehdorn, Andreas Hotho: Ontologies for Machine Learning. Handbook on Ontologies 2009: 637-661.
Optimization
• T. Schneider, A. Totounferoush, W. Nowak, S. Staab. Probabilistic Regular Tree Priors for Scientific Symbolic Reasoning. Unpublished/submitted 2023. (grammar-induced optimization)
• L. F. O. Chamon, S. Paternain, M. Calvo-Fullana and A. Ribeiro, "Constrained Learning With Non-Convex Losses," IEEE Transactions on Information Theory, vol. 69, no. 3, pp. 1739-1760, March 2023, doi: 10.1109/TIT.2022.3187948.