AI alignment from the perspective of Active Inference. The stack of "alignments": methodological alignment is more important than scientific alignment, which is more important than factual alignment, of which goal alignment is a specific type. However, there are also other, non-Bayesian perspectives on alignment that are important to take, and alignment in and of itself is not enough to ensure that the AI transition goes well for humanity.
AI alignment from the Active Inference perspective 2023.pdf
1. AI alignment from the perspective of
Active Inference
Roman Leventov
Moscow, April 22-23, 2023
Scientific and practical conference
"Modern Systems Engineering and Management"
2. Free Energy Principle: physical modelling basics
The FEP formalism assumes that the world is modelled as a set of variables x that comprise a random dynamical system [1], in discrete or continuous time:
x'(t) = f(x, t) + w(t),
where x' is the rate of change of the variables' states, f is a state-dependent function (the flow), and w is noise.
1. Friston, K., Da Costa, L., Sakthivadivel, D. A. R., Heins, C., Pavliotis, G. A., Ramstead, M., & Parr, T. (2022). Path integrals, particular kinds, and strange things (arXiv:2210.12761). arXiv. http://arxiv.org/abs/2210.12761
3. Free Energy Principle basics: sparse coupling conjecture
A system is (approximately) causally separated from the environment between t0 and now. μ are internal states, s are sensory states, a are active states, b = (s, a) are boundary states, and η are external states.
Illustration from Friston, K. (2019). A free energy principle for a particular physics (arXiv:1906.10184). arXiv. http://arxiv.org/abs/1906.10184
4. FEP: path integral formulation (path-tracking dynamics)
Semantics are only associated with physical dynamics rather than static
physical states [1]. Semantics = a commuting mapping from physical objects to
mathematical objects.
μₜ, bₜ, ηₜ are paths (trajectories) of states, i.e., physical dynamics.
∀ bₜ: ∃ p(ηₜ | bₜ), a conditional density; μₜ is the path of least action of internal
states ⇒ ∃ q: μₜ → p(ηₜ | bₜ), a semantic mapping from the path of internal
system states to beliefs about external state trajectories (a mathematical
object) [2].
VFE lemma [2]: system state dynamics can be seen as a form of Bayesian
inference of q(ηₜ), a variational density over external paths, wrt. some prior
and evidence bₜ ⇒ duality of physical and belief (mathematical)
dynamics ("Bayesian mechanics") [3]
1. Fields, C., Friston, K., Glazebrook, J. F., & Levin, M. (2022). A free energy principle for generic quantum systems.
Progress in Biophysics and Molecular Biology, 173, 36–59. https://doi.org/10.1016/j.pbiomolbio.2022.05.006
2. Friston, K., Da Costa, L., Sakthivadivel, D. A. R., Heins, C., Pavliotis, G. A., Ramstead, M., & Parr, T. (2022). Path
integrals, particular kinds, and strange things (arXiv:2210.12761). arXiv. http://arxiv.org/abs/2210.12761
3. Ramstead, M. J. D., Sakthivadivel, D. A. R., Heins, C., Koudahl, M., Millidge, B., Da Costa, L., Klein, B., & Friston,
K. J. (2023). On Bayesian Mechanics: A Physics of and by Beliefs (arXiv:2205.11543). arXiv.
https://doi.org/10.48550/arXiv.2205.11543
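The VFE lemma above can be illustrated with the simplest discrete case. The sketch below (not from the slides, with illustrative numbers) computes the variational free energy F = E_q[log q(η) - log p(η, b)] for a two-state latent variable and checks that the exact Bayesian posterior minimises it:

```python
import numpy as np

def variational_free_energy(q, prior, likelihood):
    """F = E_q[log q(eta) - log p(eta, b)] for a discrete latent eta
    and one fixed observation b."""
    joint = prior * likelihood          # p(eta, b), the evidence is sum(joint)
    return np.sum(q * (np.log(q) - np.log(joint)))

prior = np.array([0.5, 0.5])            # p(eta)
likelihood = np.array([0.9, 0.2])       # p(b | eta) for the observed b
posterior = prior * likelihood / np.sum(prior * likelihood)

# The exact posterior attains F = -log p(b); any other q scores higher.
f_post = variational_free_energy(posterior, prior, likelihood)
f_other = variational_free_energy(np.array([0.5, 0.5]), prior, likelihood)
```

At the posterior, F equals the negative log evidence -log(0.55); the gap between f_other and f_post is exactly the KL divergence from the posterior to q.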
5. Three important assumptions, or “moves”
Generalisation: q(ηₜ) encodes beliefs about the present, not the future, but
we assume that smart systems decompose their beliefs into facts (the current
state of the world) + a generative model (e.g., scientific laws).
Assuming that systems "use" q(η) to "choose" their next action to minimise
expected free energy (~ the integral of future surprise), i.e., perform Active
Inference, is induction (if the system is a black box), unless systems are
explicitly designed [1] to do this or proven to explicitly do this.
Meta-theoretical move [2]: assuming that scientists (observers) observe
themselves as Active Inference systems "reifies" the FEP as the basis of
semantics and rationality (i.e., a form of Bayesian epistemology, which
Deutsch disapproves of)
1. Friston et al. (2022). Designing Ecosystems of Intelligence from First Principles (arXiv:2212.01354). arXiv.
http://arxiv.org/abs/2212.01354
2. Ramstead, M. J. D., Sakthivadivel, D. A. R., & Friston, K. J. (2022). On the Map-Territory Fallacy Fallacy
(arXiv:2208.06924). arXiv. http://arxiv.org/abs/2208.06924
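The second "move" above, choosing actions by minimising expected free energy, can be sketched numerically. The example below (illustrative, not from the slides) scores two one-step policies by G = risk + ambiguity, the standard decomposition of expected free energy; the likelihood matrix and preference vector are assumed values:

```python
import numpy as np

def expected_free_energy(pred_states, obs_given_state, preferred_obs):
    """G = risk + ambiguity for a one-step policy.
    risk: KL from predicted observations to preferred observations;
    ambiguity: expected entropy of p(o | s) under predicted states."""
    pred_obs = pred_states @ obs_given_state          # p(o | policy)
    risk = np.sum(pred_obs * np.log(pred_obs / preferred_obs))
    entropy = -np.sum(obs_given_state * np.log(obs_given_state), axis=1)
    ambiguity = pred_states @ entropy
    return risk + ambiguity

A = np.array([[0.9, 0.1], [0.1, 0.9]])   # assumed likelihood p(o | s)
C = np.array([0.99, 0.01])               # preferred observations
g_stay = expected_free_energy(np.array([0.5, 0.5]), A, C)
g_move = expected_free_energy(np.array([0.9, 0.1]), A, C)
# The policy predicted to reach the preferred state has lower G.
```

An Active Inference agent would then select actions with probability proportional to exp(-G), favouring the low-G policy.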
6. Active Inference: against goals (objectives)
An Active Inference system's behaviour is caused (generated) by its
beliefs q(η) rather than by its goals.
“Goals” appear only as future world states on highly predicted trajectories that
the system reflexively notices and records in memory to save computations in
the future. But even if thus recorded, goals remain in principle ephemeral and
discardable at any iteration in the active inference cycle (= OODA cycle).
See also: flaneuring (Taleb), open-endedness (Stanley & Lehman), lean
(Ries), etc., https://ailev.livejournal.com/1254147.html
⇒ Align beliefs instead of “specifying” goals. (Applies to alignment
between any intelligent systems on the same or different system levels, not
just to human–AI alignment. Cf. “managing with context, not control”.)
7. Definition of alignment
Informally: alignment is learning about each other, i.e., increasing mutual
capacity for predicting (signals from) each other.
FEP (with a reference frame): alignment is a physical interaction process
(= information exchange [1]) between two systems during which their internal
dynamics entail belief structures (or update their prior beliefs, from
their own perspectives) which decompose into causal generative
models with smaller transformation error [2] (caveat: acyclic graphs only)
and fact beliefs (current world states) that are closer after causal
model transformation wrt. some distance measure (KL/JS divergence?).
Quantum FEP (without a reference frame): quantum reference frame
alignment across the holographic screen [1] = entanglement.
1. Fields, C., Friston, K., Glazebrook, J. F., & Levin, M. (2022). A free energy principle for generic quantum
systems. Progress in Biophysics and Molecular Biology, 173, 36–59.
https://doi.org/10.1016/j.pbiomolbio.2022.05.006
2. Rischel, E. F., & Weichwald, S. (2021). Compositional abstraction error and a category of causal models.
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 1013–1023.
https://proceedings.mlr.press/v161/rischel21a.html
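The "distance measure (KL/JS divergence?)" suggested above is easy to make concrete for discrete fact beliefs. A minimal sketch (illustrative distributions, not from the slides):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), asymmetric."""
    return np.sum(p * np.log(p / q))

def js(p, q):
    """Jensen-Shannon divergence: symmetric and bounded, so it behaves
    like a distance between two agents' belief distributions."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

beliefs_a = np.array([0.7, 0.2, 0.1])   # agent A's fact beliefs
beliefs_b = np.array([0.6, 0.3, 0.1])   # agent B, close to A
beliefs_c = np.array([0.1, 0.2, 0.7])   # agent C, far from A
# Closer fact beliefs => smaller divergence => "more aligned"
```

JS divergence is a natural candidate here precisely because, unlike KL, it is symmetric between the two systems and stays finite even where one agent assigns (near-)zero probability.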
8. Learning and aligning full world models is intractable
While an AI architecture could be chosen to explicitly include a world
model [1,2,3], the architecture of human intelligence couldn't be chosen!
Discovering large causal graphs is extremely expensive: the search space
size grows as 2^(d·d), where d is the number of variables [4].
Humans (and other universal intelligences) learn many "local", incoherent
models, which they select contextually [5]. A monolithic q(η) doesn't exist.
Solution: design belief sharing (communication) protocols [1] and learning
environments that foster world model alignment without explicitly tracking
them.
1. Friston et al. (2022). Designing Ecosystems of Intelligence from First Principles (arXiv:2212.01354). arXiv.
http://arxiv.org/abs/2212.01354
2. LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence.
3. Zhou, G., Yao, L., Xu, X., Wang, C., Zhu, L., & Zhang, K. (2023). On the Opportunity of Causal Deep Generative
Models: A Survey and Future Directions (arXiv:2301.12351). arXiv. https://doi.org/10.48550/arXiv.2301.12351
4. Atanackovic, L., Tong, A., Hartford, J., Lee, L. J., Wang, B., & Bengio, Y. (2023). DynGFN: Bayesian Dynamic Causal
Discovery using Generative Flow Networks (arXiv:2302.04178). arXiv. https://doi.org/10.48550/arXiv.2302.04178
5. Fields, C., & Glazebrook, J. F. (2022). Information flow in context-dependent hierarchical Bayesian inference. Journal
of Experimental & Theoretical Artificial Intelligence, 34(1), 111–142. https://doi.org/10.1080/0952813X.2020.1836034
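The 2^(d·d) claim above is just the count of adjacency matrices: each of the d·d ordered variable pairs may or may not carry an edge. A two-line sketch of the growth:

```python
def causal_search_space(d):
    """Number of possible directed graphs over d variables:
    each of the d*d candidate edges is independently present or absent."""
    return 2 ** (d * d)

# d = 2 already gives 16 candidate graphs; d = 10 gives 2**100 (~1.3e30),
# which is why exhaustive causal discovery is intractable.
```

(Restricting to acyclic graphs shrinks this count, but the growth remains super-exponential in d.)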
9. Hierarchy of alignment
The world model of a (self-modelling) Active Inference system can be
informally (because the levels are still interdependent) separated into three
levels, roughly corresponding to self-modelling, world modelling, and
world state recognition:
1. Methodological (meta-)models: mathematics, philosophy of science,
meta-ethics, epistemology, rationality, semantics, communication, etc.
2. Science: laws of physics, chemistry, biology, intelligence, economics
3. Facts: the world state in terms of the models from 1. and 2.
Methodological alignment > scientific alignment > fact alignment
Goals are theory-of-mind-based objects that we should fact-learn about
each other in order to coordinate them in the context of a cooperative system
"game".
10. LLMs are a dead end?
In LLMs, world models q(η) are hopelessly entangled with recognition
(perception, the encoder) and planning (the actor, in LeCun's terms) "computations".
Using human feedback as a signal even during LLM pre-training [1] doesn't
explicitly transfer to them the ontologies that they should learn. (However, the
language feedback approach [2] could be shaped into something that we want.)
Aligning with (and even productively communicating with) a system whose
world model is vastly larger and more complex is possible in principle, but
harder (cf. "humans don't trade with ants").
LeCun: LLMs are doomed [3] (for related but separate reasons).
1. Korbak, T., Shi, K., Chen, A., Bhalerao, R., Buckley, C. L., Phang, J., Bowman, S. R., & Perez, E. (2023).
Pretraining Language Models with Human Preferences (arXiv:2302.08582). arXiv.
https://doi.org/10.48550/arXiv.2302.08582
2. Scheurer, J., Korbak, T., & Perez, E. (2023). Imitation Learning from Language Feedback.
https://www.lesswrong.com/posts/mCZSXdZoNoWn5SkvE/imitation-learning-from-language-feedback-1
3. LeCun, Y. (2023, April 6). Do large language models need sensory grounding for meaning and understanding?
Yes! https://www.youtube.com/watch?v=x10964w00zk&t=1m30s
11. Active Inference is an essential, but not an exhaustive
perspective for ensuring AI alignment
Active Inference doesn't capture the full complexity of the behaviour of
intelligent systems.
Other general [1,2] and AI-architecture-specific perspectives on alignment
should be taken simultaneously.
A constructor-theoretic perspective on alignment (non-Bayesian
probability)?
1. Boyd, A. B., Crutchfield, J. P., & Gu, M. (2022). Thermodynamic machine learning through maximum work production.
New Journal of Physics, 24(8), 083040. https://doi.org/10.1088/1367-2630/ac4309
2. Vanchurin, V. (2020). The World as a Neural Network. Entropy, 22(11), 1210. https://doi.org/10.3390/e22111210
12. AI alignment is essential, but not sufficient for the AGI
transition to go well
Control theory and system "zombie-fication" [1] perspective (aligned
zombies).
Game-theoretic and collective intelligence perspective (actors cannot
align from within a multi-polar trap). Collective activity should produce aligned
supra-systems.
● The Collective Intelligence Project, https://cip.org/
Infosec [2] and general system fragility [3] perspectives: AI and bio weapons of
mass destruction.
● Need next-gen infra: https://trustoverip.org/, data ownership à la
https://solidproject.org/, proof-of-humanness à la
https://worldcoin.org/, etc.
1. Doyle, J. (2021). Universal Laws and Architectures and Their Fragilities.
https://www.youtube.com/watch?v=Bf4hPlwU4ys
2. Ladish, J., & Heim, L. (2022). Information security considerations for AI and the long term future.
https://forum.effectivealtruism.org/posts/WqQDCCLWbYfFRwubf/information-security-considerations-for-ai-and-the
-long-term
3. Bostrom, N. (2019). The Vulnerable World Hypothesis. Global Policy, 10(4), 455–476.
https://doi.org/10.1111/1758-5899.12718