AI alignment from the perspective of Active Inference. There is a stack of “alignments”: methodological alignment is more important than scientific alignment, which is more important than factual alignment, of which goal alignment is a specific type. However, other, non-Bayesian perspectives on alignment are also important to take, and alignment in and of itself is not enough to ensure that the AI transition goes well for humanity.
1. AI alignment from the perspective of Active Inference
Roman Leventov
Moscow, April 22–23, 2023
Scientific and Practical Conference
“Modern Systems Engineering and Management”
2. Free Energy Principle: physical modelling basics
The FEP formalism assumes that the world is modelled as a set of variables x that comprise a random dynamical system [1], in discrete or continuous time:
x'(t) = f(x, t) + w(t),
where x' is the rate of change of the variables’ states, f is a state-dependent function (flow), and w is noise.
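As a purely illustrative sketch (not from the slides), such a system can be simulated with Euler–Maruyama integration; the flow f and the noise amplitude below are arbitrary assumptions chosen for demonstration:

```python
import numpy as np

def simulate(f, x0, dt=0.01, steps=1000, noise_std=0.1, seed=0):
    """Euler-Maruyama integration of x'(t) = f(x, t) + w(t)."""
    rng = np.random.default_rng(seed)
    x = np.empty((steps + 1, len(x0)))
    x[0] = x0
    for i in range(steps):
        w = rng.normal(0.0, noise_std, size=len(x0))   # the noise term w(t)
        x[i + 1] = x[i] + dt * f(x[i], i * dt) + np.sqrt(dt) * w
    return x

# Example flow: a damped oscillator (an arbitrary choice of f).
trajectory = simulate(lambda x, t: np.array([x[1], -x[0] - 0.5 * x[1]]),
                      x0=np.array([1.0, 0.0]))
```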
1. Friston, K., Da Costa, L., Sakthivadivel, D. A. R., Heins, C., Pavliotis, G. A., Ramstead, M., & Parr, T. (2022).
Path integrals, particular kinds, and strange things (arXiv:2210.12761). arXiv. http://arxiv.org/abs/2210.12761
3. Free Energy Principle basics: sparse coupling conjecture
A system is (approximately) causally separated from the environment between t₀ and now: μ are internal states, s are sensory states, a are active states, b = (s, a) are boundary states, and η are external states (see the toy sketch below).
Illustration from Friston, K. (2019). A free energy principle for a particular physics (arXiv:1906.10184). arXiv.
http://arxiv.org/abs/1906.10184
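A minimal toy sketch of the sparse coupling (my own illustration, not from the slides): each flow depends only on the states the partition permits, so internal states μ couple to external states η only through the blanket b = (s, a). The linear flows and noise level are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
eta, s, a, mu = 1.0, 0.0, 0.0, 0.0   # external, sensory, active, internal states
dt, noise = 0.01, 0.05

for _ in range(1000):
    # Sparse coupling: no term below lets mu see eta directly, or vice versa.
    d_eta = -eta + a    # external states are influenced only via active states
    d_s = eta - s       # sensory states are driven by external states
    d_mu = s - mu       # internal states see only the blanket (here: s)
    d_a = mu - a        # active states are driven by internal states
    eta += dt * d_eta + np.sqrt(dt) * rng.normal(0, noise)
    s += dt * d_s + np.sqrt(dt) * rng.normal(0, noise)
    mu += dt * d_mu + np.sqrt(dt) * rng.normal(0, noise)
    a += dt * d_a + np.sqrt(dt) * rng.normal(0, noise)
```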
4. FEP: path integral formulation (path-tracking dynamics)
Semantics are only associated with physical dynamics rather than static physical states [1]. Semantics = a commuting mapping from physical objects to mathematical objects.
μₜ, bₜ, ηₜ are paths (trajectories) of states, i.e., physical dynamics.
∀ bₜ: ∃ p(ηₜ | bₜ), a conditional density; μₜ is the path of least action of internal states ⇒ ∃ q: μₜ → p(ηₜ | bₜ), a semantic mapping from the path of internal system states to beliefs about external state trajectories (a mathematical object) [2].
VFE lemma [2]: system state dynamics can be seen as a form of Bayesian inference of q(ηₜ), a variational density over external paths, wrt. some prior and evidence bₜ ⇒ duality of physical and belief (mathematical) dynamics (“Bayesian mechanics”) [3].
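A hedged toy sketch of the VFE lemma for a discrete model (the two-state world and all numbers are my assumptions, not the authors’): the variational free energy F[q] = E_q[ln q(η) − ln p(η, b)] is minimised exactly when q equals the Bayesian posterior, where it reduces to the negative log evidence:

```python
import numpy as np

# Toy discrete generative model: two external states eta, one observed b = 1.
p_eta = np.array([0.7, 0.3])           # prior p(eta)
p_b_given_eta = np.array([0.9, 0.2])   # likelihood p(b=1 | eta)
joint = p_eta * p_b_given_eta          # p(eta, b=1)

def vfe(q):
    """Variational free energy F[q] = E_q[ln q(eta) - ln p(eta, b=1)]."""
    return np.sum(q * (np.log(q) - np.log(joint)))

posterior = joint / joint.sum()
print(vfe(posterior))                  # equals -ln p(b=1): the minimum of F
print(vfe(np.array([0.5, 0.5])))       # any other q yields a strictly larger F
```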
1. Fields, C., Friston, K., Glazebrook, J. F., & Levin, M. (2022). A free energy principle for generic quantum systems.
Progress in Biophysics and Molecular Biology, 173, 36–59. https://doi.org/10.1016/j.pbiomolbio.2022.05.006
2. Friston, K., Da Costa, L., Sakthivadivel, D. A. R., Heins, C., Pavliotis, G. A., Ramstead, M., & Parr, T. (2022). Path
integrals, particular kinds, and strange things (arXiv:2210.12761). arXiv. http://arxiv.org/abs/2210.12761
3. Ramstead, M. J. D., Sakthivadivel, D. A. R., Heins, C., Koudahl, M., Millidge, B., Da Costa, L., Klein, B., & Friston,
K. J. (2023). On Bayesian Mechanics: A Physics of and by Beliefs (arXiv:2205.11543). arXiv.
https://doi.org/10.48550/arXiv.2205.11543
5. Three important assumptions, or “moves”
Generalisation: q(ηₜ) encodes beliefs about the present, not the future, but we assume that smart systems decompose their beliefs into facts (the current state of the world) + a generative model (e.g., scientific laws).
Assuming that systems “use” q(η) to “choose” their next action to minimise expected free energy (~ the integral of future surprise), i.e., perform Active Inference, is induction (if the system is a black box), unless systems are explicitly designed [1] to do this or proven to explicitly do this. (See the sketch below.)
Meta-theoretical move [2]: assuming that scientists (observers) observe themselves as Active Inference systems “reifies” the FEP as the basis of semantics and rationality (i.e., a form of Bayesian epistemology, which Deutsch disapproves of).
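To make the “minimise expected free energy” step concrete, here is a hedged sketch using the standard risk + ambiguity decomposition from the discrete-state active inference literature (the choice of decomposition and all numbers are my assumptions, not something stated on the slides):

```python
import numpy as np

def efe(q_s, p_o_given_s, log_c):
    """One-step expected free energy:
       G = KL[q(o) || p_desired(o)]  (risk)
         + E_q(s)[ H[p(o|s)] ]       (ambiguity)."""
    q_o = p_o_given_s.T @ q_s                    # predicted outcome distribution
    risk = np.sum(q_o * (np.log(q_o) - log_c))
    ambiguity = -np.sum(q_s * np.sum(p_o_given_s * np.log(p_o_given_s), axis=1))
    return risk + ambiguity

# Arbitrary toy numbers: 2 hidden states, 2 outcomes, strong preference for outcome 0.
p_o_given_s = np.array([[0.9, 0.1],
                        [0.2, 0.8]])             # rows: states; columns: outcomes
log_c = np.log(np.array([0.99, 0.01]))           # log preferences over outcomes

# Each candidate policy predicts a different state distribution q(s | policy);
# the agent selects the policy with the lower G.
for q_s in (np.array([0.9, 0.1]), np.array([0.1, 0.9])):
    print(efe(q_s, p_o_given_s, log_c))
```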
1. Friston et al. (2022). Designing Ecosystems of Intelligence from First Principles (arXiv:2212.01354). arXiv.
http://arxiv.org/abs/2212.01354
2. Ramstead, M. J. D., Sakthivadivel, D. A. R., & Friston, K. J. (2022). On the Map-Territory Fallacy Fallacy
(arXiv:2208.06924). arXiv. http://arxiv.org/abs/2208.06924
6. Active Inference: against goals (objectives)
An Active Inference system’s behaviour is caused (generated) by its beliefs q(η) rather than by its goals.
“Goals” appear only as future world states on highly predicted trajectories, which the system reflexively notices and records in memory to save computation in the future. But even if thus recorded, goals remain in principle ephemeral and discardable at any iteration of the active inference cycle (= the OODA cycle).
See also: flaneuring (Taleb), open-endedness (Stanley & Lehman), lean
(Ries), etc., https://ailev.livejournal.com/1254147.html
⇒ Align beliefs instead of “specifying” goals. (Applies to alignment
between any intelligent systems on the same or different system levels, not
just to human–AI alignment. Cf. “managing with context, not control”.)
7. Definition of alignment
Informally: alignment is learning about each other, i.e., increasing mutual capacity for predicting (signals from) each other.
FEP (with a reference frame): alignment is a physical interaction process (= information exchange [1]) between two systems during which their internal dynamics entail belief structures (or update their prior beliefs, from their own perspectives) that decompose into causal generative models with smaller transformation error [2] (caveat: acyclic graphs only) and fact beliefs (current world states) that are closer after causal model transformation wrt. some distance measure (KL/JS divergence? See the sketch below.)
Quantum FEP (without a reference frame): quantum reference frame alignment across the holographic screen [1] = entanglement.
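As a hedged illustration of “fact beliefs that are closer” (entirely my toy construction: discrete beliefs over three world states, with the causal model transformation assumed already applied), the Jensen–Shannon divergence gives one symmetric candidate distance:

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

def js(p, q):
    """Jensen-Shannon divergence: a symmetric, bounded belief distance."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two agents' fact beliefs over the same 3 world states (arbitrary numbers),
# assumed already mapped into a shared ontology by the causal model transformation.
before = js(np.array([0.7, 0.2, 0.1]), np.array([0.2, 0.5, 0.3]))
after = js(np.array([0.6, 0.3, 0.1]), np.array([0.5, 0.3, 0.2]))
assert after < before   # alignment: the belief distance shrinks through interaction
```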
1. Fields, C., Friston, K., Glazebrook, J. F., & Levin, M. (2022). A free energy principle for generic quantum
systems. Progress in Biophysics and Molecular Biology, 173, 36–59.
https://doi.org/10.1016/j.pbiomolbio.2022.05.006
2. Rischel, E. F., & Weichwald, S. (2021). Compositional abstraction error and a category of causal models.
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 1013–1023.
https://proceedings.mlr.press/v161/rischel21a.html
8. Learning and aligning full world models is intractable
While an AI architecture could be chosen to explicitly include a world model [1,2,3], the architecture of human intelligence couldn’t be chosen!
Discovering large causal graphs is extremely expensive: the search space size grows as 2^(d·d), where d is the number of variables [4] (see the sketch below).
Humans (and other universal intelligences) learn many “local”, mutually incoherent models, which they select contextually [5]. A monolithic q(η) doesn’t exist.
Solution: design belief sharing (communication) protocols [1] and learning environments that foster world model alignment without explicitly tracking world models.
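A two-line sketch of how hostile this search space is (the 2^(d·d) count corresponds to choosing, for every ordered pair of variables, whether a directed edge is present):

```python
# Candidate directed graphs over d variables: one bit per ordered pair.
for d in (5, 10, 20, 50):
    print(d, 2 ** (d * d))   # d = 20 already gives ~2.6e120 candidate graphs
```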
1. Friston et al. (2022). Designing Ecosystems of Intelligence from First Principles (arXiv:2212.01354). arXiv.
http://arxiv.org/abs/2212.01354
2. LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence.
3. Zhou, G., Yao, L., Xu, X., Wang, C., Zhu, L., & Zhang, K. (2023). On the Opportunity of Causal Deep Generative
Models: A Survey and Future Directions (arXiv:2301.12351). arXiv. https://doi.org/10.48550/arXiv.2301.12351
4. Atanackovic, L., Tong, A., Hartford, J., Lee, L. J., Wang, B., & Bengio, Y. (2023). DynGFN: Bayesian Dynamic Causal
Discovery using Generative Flow Networks (arXiv:2302.04178). arXiv. https://doi.org/10.48550/arXiv.2302.04178
5. Fields, C., & Glazebrook, J. F. (2022). Information flow in context-dependent hierarchical Bayesian inference. Journal
of Experimental & Theoretical Artificial Intelligence, 34(1), 111–142. https://doi.org/10.1080/0952813X.2020.1836034
9. Hierarchy of alignment
The world model of a (self-modelling) Active Inference system can be informally (because the levels are still interdependent) separated into three levels, roughly corresponding to self-modelling, world modelling, and world state recognition:
1. Methodological (meta-)models: mathematics, philosophy of science, meta-ethics, epistemology, rationality, semantics, communication, etc.
2. Science: the laws of physics, chemistry, biology, intelligence, economics.
3. Facts: the world state in terms of the models from 1. and 2.
Methodological alignment > scientific alignment > fact alignment [1]
Goals are theory-of-mind-based objects: we should learn about each other’s goals as facts in order to coordinate them in the context of a cooperative system “game”.
10. LLMs are a dead end?
In LLMs, world models q(η) are hopelessly entangled with the recognition (perception, encoder) and planning (actor, in LeCun’s terms) “computations”.
Using human feedback as a signal even during LLM pre-training [1] doesn’t explicitly transfer to them the ontologies that they should learn. (However, the language feedback approach [2] could be shaped into something that we want.)
Aligning with (and even productively communicating with) a system whose world model is vastly larger and more complex than one’s own is possible in principle, but harder (cf. “humans don’t trade with ants”).
LeCun: LLMs are doomed [3] (for related but separate reasons).
1. Korbak, T., Shi, K., Chen, A., Bhalerao, R., Buckley, C. L., Phang, J., Bowman, S. R., & Perez, E. (2023).
Pretraining Language Models with Human Preferences (arXiv:2302.08582). arXiv.
https://doi.org/10.48550/arXiv.2302.08582
2. Scheurer, J., Korbak, T., & Perez, E. (2023). Imitation Learning from Language Feedback.
https://www.lesswrong.com/posts/mCZSXdZoNoWn5SkvE/imitation-learning-from-language-feedback-1
3. LeCun, Y. (2023, April 6). Do large language models need sensory grounding for meaning and understanding?
Yes! https://www.youtube.com/watch?v=x10964w00zk&t=1m30s
11. Active Inference is an essential, but not an exhaustive, perspective for ensuring AI alignment
Active Inference doesn’t capture the full complexity of the behaviour of intelligent systems.
Other general [1,2] and AI architecture-specific perspectives on alignment should be taken simultaneously.
A constructor-theoretic perspective on alignment (non-Bayesian probability)?
1. Boyd, A. B., Crutchfield, J. P., & Gu, M. (2022). Thermodynamic machine learning through maximum work production.
New Journal of Physics, 24(8), 083040. https://doi.org/10.1088/1367-2630/ac4309
2. Vanchurin, V. (2020). The World as a Neural Network. Entropy, 22(11), 1210. https://doi.org/10.3390/e22111210
12. AI alignment is essential, but not sufficient for the AGI transition to go well
Control theory and system “zombie-fication” [1] perspective (aligned zombies).
Game-theoretic and collective intelligence perspective (actors cannot align themselves out of a multi-polar trap); collective activity should produce aligned supra-systems.
● The Collective Intelligence Project, https://cip.org/
Infosec [2] and general system fragility [3] perspectives: AI and bio weapons of mass destruction.
● Need next-gen infrastructure: https://trustoverip.org/, data ownership à la https://solidproject.org/, proof-of-humanness à la https://worldcoin.org/, etc.
1. Doyle, J. (2021). Universal Laws and Architectures and Their Fragilities.
https://www.youtube.com/watch?v=Bf4hPlwU4ys
2. Ladish, J., & Heim, L. (2022). Information security considerations for AI and the long term future.
https://forum.effectivealtruism.org/posts/WqQDCCLWbYfFRwubf/information-security-considerations-for-ai-and-the
-long-term
3. Bostrom, N. (2019). The Vulnerable World Hypothesis. Global Policy, 10(4), 455–476.
https://doi.org/10.1111/1758-5899.12718