1. Computation and Design of Autonomous Intelligent Systems
Robert L. Fry
Presentation to the SPIE Defense and Security Conference
Orlando, FL
March 17, 2008
This work was supported through AFOSR contract FA9550-06-1-0297 under Dr. Robert Bonneau
2. Outline
• Computational Theory of Autonomous Intelligent Systems
• Engineering Intelligent Systems
• Neural Computation and Other Sample Applications
• Discussion and Next Steps
4. Basic Idea
To engineer an intelligent system, one must have a working definition of what intelligence is. The following is suggested:
“An intelligent system acquires information and uses it to make decisions in service to its computational goal”
If we can computationally quantify each of the key terms above (acquiring information, making decisions, and computational goal), then we might have a formal basis for engineering intelligent systems.
5. Definition Flowdown
To acquire information, a system must pose a question to its environment.
To make decisions, a system must answer a question of what to do.
6. What is a Question?
If we can come up with a working
definition of questions, then we are in
business.
7. “What Is a Question?”
“A question is defined by the set of
all subjectively possible answers”
Richard Cox, Physicist
Johns Hopkins University
(1898-1991)
Dr. Richard Cox¹ of the Johns Hopkins University Physics Department developed a joint algebra of questions and assertions.
¹ “Of Inference and Inquiry,” Proc. First Maximum Entropy Workshop, MIT, 1978
His extension of logic mathematically captures how
computation is performed within the subjective
frame of a system.
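8. Boolean Algebra of Questions and Assertions
Dual Boolean Algebras: Conventional Logic and the Logic of Questions

Algebra of Assertions (Logical Basis of Probability Theory)
$\sim\sim a = a$
$a \wedge a = a \qquad a \vee a = a$
$a \wedge b = b \wedge a \qquad a \vee b = b \vee a$
$\sim(a \wedge b) = \sim a \vee \sim b \qquad \sim(a \vee b) = \sim a \wedge \sim b$
$a \wedge b \wedge c = a \wedge (b \wedge c) = (a \wedge b) \wedge c \qquad a \vee b \vee c = a \vee (b \vee c) = (a \vee b) \vee c$
$(a \vee b) \wedge c = (a \wedge c) \vee (b \wedge c) \qquad (a \wedge b) \vee c = (a \vee c) \wedge (b \vee c)$
$(a \vee b) \wedge b = b \qquad (a \wedge b) \vee b = b$
$(a \vee \sim a) \wedge b = b \qquad (a \wedge \sim a) \vee b = b$
$a \wedge \sim a \wedge b = a \wedge \sim a \qquad a \vee \sim a \vee b = a \vee \sim a$

Algebra of Questions (Logical Basis of Information/Control Theories)
$\sim\sim A = A$
$A \wedge A = A \qquad A \vee A = A$
$A \wedge B = B \wedge A \qquad A \vee B = B \vee A$
$\sim(A \wedge B) = \sim A \vee \sim B \qquad \sim(A \vee B) = \sim A \wedge \sim B$
$A \wedge B \wedge C = A \wedge (B \wedge C) = (A \wedge B) \wedge C \qquad A \vee B \vee C = A \vee (B \vee C) = (A \vee B) \vee C$
$(A \vee B) \wedge C = (A \wedge C) \vee (B \wedge C) \qquad (A \wedge B) \vee C = (A \vee C) \wedge (B \vee C)$
$(A \vee B) \wedge B = B \qquad (A \wedge B) \vee B = B$
$(A \vee \sim A) \wedge B = B \qquad (A \wedge \sim A) \vee B = B$
$A \wedge \sim A \wedge B = A \wedge \sim A \qquad A \vee \sim A \vee B = A \vee \sim A$

• Logical questions quantify the computational rules of intelligence and autonomy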
Let upper case italicized letters like A denote questions, e.g., $A \equiv$ “Is it an apple or not?” $\equiv \{a, \sim a\}$.
9. Probability and Entropy
Sample Identities in probability and information theory (Probability Theory; Bearing Theory, i.e., Entropy):
• Conjunctive Rule: $p(a \vee b|c) = p(a|c) + p(b|c) - p(a \wedge b|c)$ and $b(A \wedge B|C) = b(A|C) + b(B|C) - b(A \vee B|C)$
• Normalization: $p(a|c) + p(\sim a|c) = 1$ and $b(A|C) + b(\sim A|C) = 1$
• Marginalization Rule: $p(a \wedge b|c) + p(a \wedge \sim b|c) = p(a|c)$ and $b(A \vee B|C) + b(A \vee \sim B|C) = b(A|C)$
• Bayes’ Theorem: $p(a \wedge b|c) = p(b|c)\,p(a|c \wedge b)$ and $b(A \vee B|C) = b(B|C)\,b(A|C \vee B)$
• Probability and entropy (called bearing) are derivable from logic and consistency
• Respectively, they are the unique measures of subjective knowledge and uncertainty as computed within a local system frame
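The identities above can be checked numerically. The sketch below is a minimal illustration on a hypothetical joint distribution over two binary variables, with assertions a ≡ {x = 1} and b ≡ {y = 1}. On the bearing side it takes the bearing of a question to be the entropy of its answer (H(X) = b(X|A), as on slide 24) and the bearing of a disjunction to be the mutual information I(X;Y); as an assumption of this sketch, the bearing of a conjunction is taken to be the joint entropy H(X,Y), so the bearing conjunctive rule reads H(X,Y) = H(X) + H(Y) − I(X;Y).

```python
import numpy as np

# Hypothetical joint distribution p(x, y); rows: x = 0, 1; columns: y = 0, 1
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
p_x = p_xy.sum(axis=1)            # marginalization over y
p_y = p_xy.sum(axis=0)            # marginalization over x

def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability cells."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Probability side, with a = {x = 1} and b = {y = 1}
p_a, p_b, p_ab = p_x[1], p_y[1], p_xy[1, 1]
p_a_or_b = 1.0 - p_xy[0, 0]       # "a or b" fails only when x = 0 and y = 0
sum_rule_ok = np.isclose(p_a_or_b, p_a + p_b - p_ab)

# Bayes: p(a and b) = p(b) p(a | b)
p_a_given_b = p_xy[1, 1] / p_y[1]
bayes_ok = np.isclose(p_ab, p_b * p_a_given_b)

# Bearing side (assumed entropy reading): H(X, Y) = H(X) + H(Y) - I(X; Y)
h_x, h_y, h_xy = entropy_bits(p_x), entropy_bits(p_y), entropy_bits(p_xy)
mi = float(sum(p_xy[i, j] * np.log2(p_xy[i, j] / (p_x[i] * p_y[j]))
               for i in range(2) for j in range(2) if p_xy[i, j] > 0))
bearing_rule_ok = np.isclose(h_xy, h_x + h_y - mi)
```

Here p(a ∨ b) = 0.70 = 0.60 + 0.55 − 0.45, and the entropy identity holds for any joint distribution because I(X;Y) is computed directly from its definition.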
10. A Simple “Intelligent” System
[Figure: a protozoan-like system asks X and answers Y. Information space X: a linear motion detector, with an optical field of view, is on or off. Decision space Y: cilia turned on or off. Currents randomly perturb the creature’s orientation; a light source and possible food are in the environment.]
There are two kinds of questions –
those that can be asked and those
that can be answered by a system.
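11. Logical Questions: Card-Guessing Game Example
For a system that acquires information X and makes decisions Y, $X \vee Y$ is the actionable information of the system.
Define the questions S and C:
$S \equiv$ “What suit is the card?” $\equiv \{\spadesuit, \heartsuit, \diamondsuit, \clubsuit\}$
$C \equiv$ “What color is the card?” $\equiv \{\text{red}, \text{black}\}$
Disjunction (logical OR) provides the information asked by either S or C:
$S \vee C = \{\{\heartsuit, \diamondsuit\}, \{\spadesuit, \clubsuit\}\} = \{\text{red}, \text{black}\} = C$
Conjunction (logical AND) provides the information asked by both:
$S \wedge C = \{\spadesuit \wedge \text{black}, \heartsuit \wedge \text{red}, \diamondsuit \wedge \text{red}, \clubsuit \wedge \text{black}\} = \{\spadesuit, \heartsuit, \diamondsuit, \clubsuit\} = S$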
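The card example can be reproduced by modeling each question as a partition of the possible suits, one block per possible answer, in the spirit of Cox’s definition of a question by its set of possible answers. This is a minimal sketch under that modeling assumption: conjunction is taken as the common refinement (nonempty intersections of answers), disjunction as the finest partition that both questions refine, and bearing as the entropy of the answer when all suits are equally likely. The helper names are hypothetical.

```python
from itertools import product
from math import log2

SUITS = ["spades", "hearts", "diamonds", "clubs"]
COLOR = {"spades": "black", "clubs": "black", "hearts": "red", "diamonds": "red"}

# A question is modeled as a partition of the suits: one block per possible answer
S = [frozenset([s]) for s in SUITS]                                           # "What suit?"
C = [frozenset(s for s in SUITS if COLOR[s] == c) for c in ("red", "black")]  # "What color?"

def conjunction(q1, q2):
    """Q1 AND Q2 asks both: its answers are the nonempty intersections of answers."""
    return {b1 & b2 for b1, b2 in product(q1, q2) if b1 & b2}

def disjunction(q1, q2):
    """Q1 OR Q2 asks what both have in common: merge overlapping answer blocks."""
    merged = []
    for block in [set(b) for b in q1] + [set(b) for b in q2]:
        overlapping = [m for m in merged if m & block]
        for m in overlapping:          # absorb every existing block this one touches
            merged.remove(m)
            block |= m
        merged.append(block)
    return {frozenset(m) for m in merged}

def bearing(question, n_outcomes=len(SUITS)):
    """Entropy (bits) of the question's answer, assuming equally likely suits."""
    return -sum(len(b) / n_outcomes * log2(len(b) / n_outcomes) for b in question)

S_and_C = conjunction(S, C)
S_or_C = disjunction(S, C)
```

With these choices, S ∧ C comes out equal to S and S ∨ C equal to C, as on the slide, and the bearings (2, 1, 2, and 1 bit for S, C, S ∧ C, and S ∨ C) satisfy b(S ∧ C) = b(S) + b(C) − b(S ∨ C).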
12. Intelligence and Autonomy
[Figure: information space X (detector is on or off) and decision space Y (cilia turned on or off).]
Possible Behaviors
• Never Do Anything: (x=0, y=0), (x=1, y=0)
• Always Do Something: (x=0, y=1), (x=1, y=1)
• Always Do Wrong Thing: (x=0, y=1), (x=1, y=0)
• Always Do Right Thing: (x=0, y=0), (x=1, y=1), so that $X \vee Y = X$
Four computational mappings comprise the possible system behaviors. The last matches decisions to available information, and so $X \vee Y = X = Y$.
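One way to compare the four behaviors is to compute how much information each decision carries about the detector state. The sketch below assumes the detector is on half the time (an illustrative choice, not from the slide) and evaluates I(X;Y) for each deterministic mapping; for a deterministic mapping this equals H(Y).

```python
from math import log2

# The four deterministic mappings y = f(x), for binary detector state x and cilia decision y
BEHAVIORS = {
    "never do anything":     {0: 0, 1: 0},
    "always do something":   {0: 1, 1: 1},
    "always do wrong thing": {0: 1, 1: 0},
    "always do right thing": {0: 0, 1: 1},
}

def mutual_information(mapping, p_x1=0.5):
    """I(X;Y) in bits for deterministic y = mapping[x]; equals H(Y) in that case."""
    p_y1 = sum(p for x, p in ((0, 1 - p_x1), (1, p_x1)) if mapping[x] == 1)
    return -sum(p * log2(p) for p in (p_y1, 1 - p_y1) if p > 0)

info = {name: mutual_information(m) for name, m in BEHAVIORS.items()}
```

In this sketch the two constant behaviors carry 0 bits and both one-to-one mappings carry 1 bit. The information measure separates behaviors that use the detector from those that ignore it; which one-to-one mapping is the right one is fixed by the system’s computational goal rather than by I(X;Y) alone.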
14. Intelligence and Computation
[Figure: diagram relating thermodynamics, information theory, and computation and questions to intelligence.]
Theory exploits and extends concepts and tools from thermodynamics and information theory and in turn enriches them.
• From information theory: entropy, source and channel coding, Shannon’s dual problems, *dual-matching concept
• Applied to intelligence: entropy, maximum entropy principle, Shannon’s dual problems, Carnot cycle, dual-matching
• From thermodynamics: *Carnot cycle, maximum entropy principle, entropy
16. Information Theory and
Intelligence
[Figure: Information theory: the source X answers the question of what to transmit, and the receiver Y has the question of what is received answered. Intelligence theory: acquiring X is a question posed to the environment and answered; decisions Y answer the question of what to do.]
• Information theory and intelligence theory are functional duals
• The latter describes how to make decisions under uncertainty
• Methods and constructs in one domain apply to the other
17. Dual-Matching Concept
• Just as the Carnot cycle from thermodynamics applies to intelligent systems, so does the recent concept² of Dual-Matching from information theory
• Dual-matching provides the quantitative basis
for the design and operation of efficient
intelligent systems (their Carnot Cycles)
• Dual-matching requires simultaneously solving
Shannon’s dual problems:
– Minimize information required
– Maximize what can be done with acquired information
² M. Gastpar, To Code or Not to Code, Ph.D. dissertation, Thèse EPFL no. 2687, École Polytechnique Fédérale de Lausanne, 2002. This concept is revolutionizing information theory.
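A classic illustration of solving both of Shannon’s problems at once is a Gaussian source sent over an additive white Gaussian noise channel with no coding at all, the standard example in the “to code or not to code” literature. The sketch below assumes a memoryless Gaussian source of variance source_var, a channel with power limit power and noise variance noise_var, a linear scale-and-send encoder, and a linear MMSE decoder; the function name is hypothetical. It shows that the rate the source needs at the achieved distortion, R(D), equals the channel capacity C, so source and channel are matched.

```python
import math

def uncoded_gaussian_link(source_var, power, noise_var):
    """Uncoded (scale-and-send) transmission of a Gaussian source over an AWGN channel.

    Returns (distortion, rate_needed_bits, capacity_bits) per source sample.
    """
    gain = math.sqrt(power / source_var)     # encoder x = gain * s, so E[x^2] = power
    # Linear MMSE estimate of s from y = gain * s + noise
    distortion = source_var * noise_var / (gain**2 * source_var + noise_var)
    rate_needed = 0.5 * math.log2(source_var / distortion)   # R(D) of a Gaussian source
    capacity = 0.5 * math.log2(1 + power / noise_var)         # AWGN channel capacity
    return distortion, rate_needed, capacity

d, r, c = uncoded_gaussian_link(source_var=2.0, power=4.0, noise_var=1.0)
```

For these values the distortion is 0.4, and both R(D) and C equal ½ log₂ 5 ≈ 1.16 bits per sample: the least information the source requires and the most the channel can deliver coincide.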
18. Computation and Carnot Cycles
[Figure: two thermodynamic cycles.
Engine cycle (Carnot engine; internal combustion engine; communication systems; enclosed area = useful work produced): 1. Make decision, 2. Erase information, 3. Acquire information, 4. Store information.
Refrigerator cycle (Carnot refrigerator; computer control; intelligent systems; enclosed area = energy required to operate): 1. Acquire information, 2. Store information, 3. Make decision, 4. Erase information.]
• Logic dictates that there are two kinds of computation a
system can perform
• These correspond to the two Carnot Cycles of
thermodynamics
20. Neural Computation
Using the described theory and methods, one
can perform a top-down design of pyramidal
neurons as found in brains. This is a simple,
elegant, and informative example.
21. What Do Neurons Ask and
Answer?
[Figure: a neuron with $10^4$ synaptic inputs $X_i \in \{0,1\}$ ($X_1, X_2, \ldots, X_{10000}$); the soma integrates the inputs and makes decisions; the axon comprises a single output $Y \in \{0,1\}$.]
• There are $2^{10000}$ possible answers (microstates)!
• The neuron poses the question $X = X_1 \wedge X_2 \wedge \cdots \wedge X_{10000}$
• The neuron simply answers Y
Assume that matching actionable information to decisions is its goal: $X \vee Y = Y$.
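22. Principle of Maximum Entropy
• The Principle of Maximum Entropy, basic to thermodynamics, dictates the neural probability distribution. Maximizing entropy subject to normalization and to the observed moments gives
$$J = -\sum_{\mathbf{x}\in X}\sum_{y\in Y} p(\mathbf{x},y)\ln p(\mathbf{x},y) + \boldsymbol{\lambda}^T\Big[\sum_{\mathbf{x}\in X}\sum_{y\in Y} p(\mathbf{x},y)\,y\,\mathbf{x} - \langle y\,\mathbf{x}\rangle\Big] - \mu\Big[\sum_{\mathbf{x}\in X}\sum_{y\in Y} p(\mathbf{x},y)\,y - \langle y\rangle\Big] + \lambda_0\Big[\sum_{\mathbf{x}\in X}\sum_{y\in Y} p(\mathbf{x},y) - 1\Big]$$
Resulting Max-Ent Distribution:
$$p(\mathbf{x},y) = \frac{\exp[y(\boldsymbol{\lambda}^T\mathbf{x} - \mu)]}{Z}$$
Partition (normalization) Function:
$$Z = \sum_{y\in Y}\sum_{\mathbf{x}\in X}\exp[y(\boldsymbol{\lambda}^T\mathbf{x} - \mu)]$$
• The Lagrange multiplier vector $\boldsymbol{\lambda}$ gives the synaptic coupling strengths, and the scalar $\mu$ is the somatic decision threshold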
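Assuming the maximum-entropy form p(x, y) ∝ exp[y(λᵀx − μ)], with λ the synaptic gains and μ the somatic threshold, the distribution can be enumerated exactly for a small neuron. This sketch uses hypothetical gains and threshold for three binary inputs. It shows that the firing probability p(y = 1 | x) is the logistic function of the somatic potential λᵀx − μ, and that with zero gains and threshold every state has unit weight, so Z = 2^(n+1).

```python
import itertools
import math

def maxent_neuron(lam, mu):
    """Enumerate p(x, y) proportional to exp[y (lam . x - mu)] for x in {0,1}^n, y in {0,1}.

    Returns the partition function Z and a dict mapping (x, y) -> probability.
    """
    n = len(lam)
    weights = {}
    for x in itertools.product((0, 1), repeat=n):
        potential = sum(l * xi for l, xi in zip(lam, x)) - mu   # somatic potential minus threshold
        for y in (0, 1):
            weights[(x, y)] = math.exp(y * potential)
    Z = sum(weights.values())
    return Z, {state: w / Z for state, w in weights.items()}

lam, mu = [0.8, -0.5, 1.2], 0.6          # hypothetical gains and threshold
Z, p = maxent_neuron(lam, mu)

# Conditional firing probability for one input pattern
x = (1, 0, 1)
p_fire = p[(x, 1)] / (p[(x, 0)] + p[(x, 1)])
logistic = 1 / (1 + math.exp(-(0.8 + 1.2 - 0.6)))   # logistic of lam . x - mu
```

The conditional p(y = 1 | x) depends on x only through λᵀx − μ, which is why the neuron’s decision reduces to comparing its somatic potential with a threshold.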
23. Dual-Matching Adaptation
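[Figure: inputs $X_1, X_2, \ldots, X_n$ with gains $\lambda_1, \ldots, \lambda_n$ and delays $\tau_1, \ldots, \tau_n$ feed a soma with output Y; Hebbian gating by Y applies to adaptations (1) to (3).]
Three Hebbian Learning Equilibria Result that Can Be Realized Using Simple Biologically Plausible Algorithms
1) Threshold Adaptation ($\mu$)
The optimal decision threshold is the average somatic potential:
$$\mu(t+1) = (1-\alpha)\,\mu(t) + \alpha\,\boldsymbol{\lambda}^T(t)\,\mathbf{x}(t)$$
2) Gain Adaptation ($\boldsymbol{\lambda}$)
The optimal gain vector is the eigenvector of the input covariance matrix R with the largest eigenvalue:
$$\boldsymbol{\lambda}(t+1) = \boldsymbol{\lambda}(t) + \alpha\,[\boldsymbol{\lambda}^T(t)\mathbf{x}(t)]\,\big[\mathbf{x}(t) - \boldsymbol{\lambda}^T(t)\mathbf{x}(t)\,\boldsymbol{\lambda}(t)\big]$$
$$R = E\big\{(\mathbf{x} - E\{\mathbf{x}\,|\,y=1\})(\mathbf{x} - E\{\mathbf{x}\,|\,y=1\})^T \,\big|\, y=1\big\}$$
3) Delay Equalization ($\boldsymbol{\tau}$)
Elements of the optimal time-delay vector must satisfy “momentum” equalization:
$$\frac{d\tau_i}{dt} = \lambda_i\,y(t)\,\frac{dx_i(t-\tau_i)}{dt} = 0$$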
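The gain-adaptation equilibrium, gains along the leading eigenvector of R, is the fixed point of Oja’s rule, a standard Hebbian learning rule. The sketch below uses plain Oja’s rule as a stand-in: it assumes zero-mean Gaussian inputs with a hypothetical covariance matrix and omits the firing-conditioned gating and the delay adaptation, so it illustrates only the eigenvector equilibrium, not the full three-part adaptation.

```python
import numpy as np

def oja_gains(samples, rate=0.002, seed=0):
    """Oja's Hebbian rule: w <- w + rate * u * (x - u * w), with u = w . x.

    For zero-mean inputs and a small rate, w tends to the unit-norm leading
    eigenvector of the input covariance matrix (up to sign).
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=samples.shape[1])
    w /= np.linalg.norm(w)
    for x in samples:
        u = w @ x                      # somatic potential for this input vector
        w += rate * u * (x - u * w)    # Hebbian growth with built-in normalization
    return w

rng = np.random.default_rng(1)
R = np.array([[3.0, 1.0, 0.0],         # hypothetical input covariance matrix
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.5]])
inputs = rng.multivariate_normal(np.zeros(3), R, size=20000)

w = oja_gains(inputs)
eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
leading = eigvecs[:, -1]
alignment = abs(w @ leading) / np.linalg.norm(w)
```

After training, the learned gains point along the leading eigenvector of R (alignment close to 1) with norm close to 1; the sign of the eigenvector is arbitrary.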
24. Neural Carnot Cycle
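[Figure: temperature vs. entropy diagram of the neural cycle. 1. Acquire information through synapses; 2. Information stored in soma; 3. Decision made by soma (0.9 bit/decision); 4. Erase information during the refractory period. Temperatures marked include T = 1 and T = 0.2. Entropy levels are marked at $H(X) = b(X|A)$ with $Z = 2^n$ and at $I(X;Y) = b(X \vee Y|A)$ with $Z = 2^{n+1}$, the partition function going from $2^n$ to $2^{n+1}$.]
A single neuron operates as a Carnot refrigerator, as do all intelligent systems. It has about 90% Carnot efficiency.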
28. Simulation Output
[Figure, left panel: synaptic gains (−0.4 to 0.4) vs. training vector bit (1 to 20); gains on non-informative inputs are driven to zero. Right panel: vectors inducing firing (0 or 1) vs. training vector index (1 to 20); the neuron learns to fire on almost exactly half of the training vectors.]
29. A Geometric View of Neural Computation (7/18/2002)
[Figure: the input information space contains >$2^{10000}$ codes. The neuron defines a hyperplane decision surface $\boldsymbol{\lambda}^T\mathbf{x} - \mu = 0$ that separates the input space into two equally probable regions, Fire! (y = 1) and Do Not Fire! (y = 0), so that H(Y) = 1 bit.]
Theory provides a detailed
explanation of how
pyramidal neurons compute
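The claim that the adapted threshold splits the input space into two equally probable regions can be checked with the threshold-tracking rule μ ← (1 − α)μ + α λᵀx. This sketch assumes fixed hypothetical gains, 20 independent fair binary inputs, and a hypothetical rate α = 0.001. Under those assumptions the somatic potential is symmetric about its mean, so a threshold at the average potential is also the median; the neuron then fires on about half of the inputs, and H(Y) is close to 1 bit.

```python
import numpy as np

rng = np.random.default_rng(7)
n_inputs, n_steps, alpha = 20, 50_000, 0.001
lam = rng.normal(scale=0.3, size=n_inputs)             # hypothetical fixed gains
X = rng.integers(0, 2, size=(n_steps, n_inputs))       # independent fair binary inputs
potentials = X @ lam                                   # somatic potentials lam . x

mu = 0.0
fired = np.empty(n_steps, dtype=bool)
for t, s in enumerate(potentials):
    fired[t] = s > mu                       # fire when the potential exceeds the threshold
    mu = (1 - alpha) * mu + alpha * s       # threshold tracks the average somatic potential

burn_in = 5_000                             # skip the initial transient of the threshold
p_fire = fired[burn_in:].mean()
h_y = -(p_fire * np.log2(p_fire) + (1 - p_fire) * np.log2(1 - p_fire))
```

The symmetry assumption matters: if the potential distribution were skewed, its mean and median would differ, and a threshold at the mean would not give exactly equal firing and non-firing probabilities.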
31. Weapon Systems and Dual-Matching
• Efficient system operation requires
continual matching of weapon system
information and control spaces
• Fire Control Loop operates as an engine
governor
– Minimize fuel consumption under
varying loading conditions
• All weapon systems acquire information
and make decisions with uncertainty
• The fire control loop of a missile system is
a good example:
Dual-Matching Process
X = Where can the target be? (Informational uncertainty)
Y = Where can the weapon go? (System control capacity)
32. Weapon System Applications
• Algorithms Proposed or Under Development*
– *Object correlation
– Sensor management
– *Discrimination and track fusion
– *Weapon-Target assignment
– *Guidance Laws¹
• Weapon System Architectures
– Distributed fire control for swarming intelligence
– C2BMC
– Networked weapon systems have unique Boolean
expressions
¹ An example of the derivation of a guidance law with uncertainties in tracking, discrimination, and guidance is given in the backup slides, and a paper is available on request.
33. Discussion and Next Steps
• Continue modeling biological computation, emphasizing the development of cortical systems
• Begin wider formulations of weapon systems problems with a focus on ballistic missile defense
– Decision uncertainty is a hallmark of the BMD problem
• Military and data networks have simple
formulations
– Networks have natural functional decompositions lending
themselves to global optimization
34. Main Issues Going Forward
• Documenting the work done so far, perhaps in book form
• Bridging the multidisciplinary boundaries of
thermodynamics, information theory, and
intelligence
– Only a modest understanding of each area
seems sufficient
35. Acknowledgements
• This work was supported by AFOSR Project
IONS - Information-Theoretic Optimization of
Networked Systems
• Project IONS is a 3-year joint SEG/Princeton project to develop an engineering framework for distributed intelligent systems
• My co-PI is Dr. Mung Chiang of Princeton, who has developed highly efficient information optimization methods based on Geometric Programming
38. A Neuron is an Intelligent System
• Synapses X and axon Y are elementary questions
• Laboratory confirmation of predicted energy allocations
• Significant numbers of biological results and predictions
• Partition function and critical temperature determined
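[Figure: ~$10^4$ dendritic inputs X and 1 output; the soma decides and the axon Y is the output, with H(Y) and I(X;Y) marked.]
[Equation: the neuron’s objective J, combining the mutual information $I(X;Y) = \sum_{\mathbf{x}\in X}\sum_{y\in Y} p(\mathbf{x},y)\log\frac{p(y|\mathbf{x})}{p(y)}$ with the moments $E\{x_i y\}$ ($i = 1, \ldots, n$) and $E\{y\}$, weighted by the gains $\boldsymbol{\lambda}$.]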
39. Midcourse BMD Example
• Perform a distributed computation to achieve global optimization
• Dynamically match actionable information X and decision space Y
[Figure: distributed C2BMC dual-matching algorithms. Sensing elements (radars, seekers): sensor measurements eliminate possible target locations; question $X \equiv$ “Where is the target?”. Kinematic elements (boosters, KVs): guidance decisions eliminate possible places the missile can go; question $Y \equiv$ “Where should I go?”.]
40. BMD System Architectures
• Weapon architectures can be quantified
• Basis for comparing architectures and making trades
• Decomposition allows representation of entire network
• Formulation applies to any kind of weapon or warfare:
Architectural Class | Basic Objective Function | BMD Architecture
I. 1-on-1 | $I(X;Y)$ | Single KV vs. single RV (EKV)
II. M-on-1 | $I(X;Y_1,Y_2,\ldots,Y_M)$ | Multiple KVs vs. single RV (MKV)
III. 1-on-N | $I(X_1,X_2,\ldots,X_N;Y)$ | Single KV vs. MIRVed threat
IV. M-on-N | $I(X_1,X_2,\ldots,X_N;Y_1,Y_2,\ldots,Y_M)$ | Multiple KVs vs. MIRVed threat
Bullets = KVs = Infantry = Missiles
41. Guidance with Targeting and
Guidance Uncertainties
We will now determine the exact analytical solution to a simple one-dimensional problem to demonstrate some of the described concepts.
Tracks will never be perfect, but this does
not change how the problem is solved.
[Figure: target localization space X with objects at $x_0$ and $x_1$.]
Target localization with perfect tracks at $x_0$ and $x_1$ and discrimination probabilities $p_0$ and $p_1$, respectively ($p_0 + p_1 = 1$):
$$p(x) = p_0\,\delta(x - x_0) + p_1\,\delta(x - x_1)$$
This is where the target can be.
42. MKV Architectures
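[Figure: three MKV architectures, each with inputs $X_1, X_2, \ldots, X_m$ and outputs $Y_1, Y_2, \ldots, Y_m$: a) federated architecture; b) swarming architecture (only nearest-neighbor communication, via links $Z_1, Z_2, \ldots, Z_m$); c) centralized control architecture (a central node $X_C$ linked via $Z_1, Z_2, \ldots, Z_m$). A companion plot of weapon system capacity vs. cost ($) places architectures A1, A2, and A3, with capacities $I_1(X;Y)$, $I_2(X;Y)$, and $I_3(X;Y)$.]
Potential to trade system architecture vs. weapon system capacity vs. cost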
43. Guidance with Targeting and
Guidance Uncertainties
Can determine optimal guidance solutions in the
presence of tracking, discrimination, and guidance
uncertainties
44. Guidance with Targeting and
Guidance Uncertainties
The kinematic space of the system is the reachable space of the interceptor after it has selected some point as its next guide point: the missile always has to be going somewhere!
[Figure: missile kinematic space Y (after maneuver) relative to the objects at $x_0$ and $x_1$.]
The guidance algorithm objective is to determine the optimal $\mu$ as its next guide point, given the guidance and informational uncertainties $p(y|x)$ and $p(x)$.
Varying $\mu$ changes the kinematic space $p(y|x)$ once the command is executed.
45. Guidance with Targeting and
Guidance Uncertainties
The idea is to minimize the mutual information over $\mu$ subject to a probability-of-miss constraint (a “miss” is like a transmission bit error):
This guidance law statistically minimizes fuel expenditure over the ensemble of such engagements with these uncertainties.
where
If there is no constraint, then J0 and the missile guides halfway between the objects if both will remain in its kinematic space:
Otherwise, the missile should navigate to the probability centroid between the objects:
46. Guidance with Targeting and
Guidance Uncertainties
Initially, assume that the missile is guiding halfway between the objects, at $(x_0+x_1)/2$, implying that discrimination information is not used.
[Figure: kinematic space $p(y|x)$, the probability that the missile can get to y when the target is at x, for a guide point at $(x_0+x_1)/2$.]
47. Guidance with Targeting and
Guidance Uncertainties
If there is no constraint, then J0 and the missile continues guiding halfway between the objects:
Example: $x_0 = 0$, $x_1 = 1$, $p_0 = 0.25$, $p_1 = 0.75$, $\sigma^2 = 1$
[Figure: control rate vs. guide point: transacted control (bits, 0 to 0.03) vs. guide point $\mu$ (0 to 1). Annotations: “Targeting information ignored”; “Guide to most probable target. This minimizes probability of error but not fuel.”; PIP.]
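The three guide-point choices discussed for this example are simple to compute. The sketch below assumes the probability centroid is the probability-weighted mean p0·x0 + p1·x1; the function and its labels are hypothetical, and it does not model the kinematic space, the miss constraint, or the fuel cost.

```python
def guide_points(x0, x1, p0, p1):
    """Candidate guide points for two objects with discrimination probabilities p0, p1."""
    if abs(p0 + p1 - 1) > 1e-12:
        raise ValueError("discrimination probabilities must sum to 1")
    return {
        "midpoint (discrimination ignored)": (x0 + x1) / 2,
        "probability centroid": p0 * x0 + p1 * x1,
        "most probable object": x0 if p0 >= p1 else x1,
    }

points = guide_points(x0=0.0, x1=1.0, p0=0.25, p1=0.75)
```

For x0 = 0, x1 = 1, p0 = 0.25, p1 = 0.75 this gives 0.5 for the midpoint, 0.75 for the centroid, and 1.0 for the most probable object.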
48. Evolution of PIP During Homing for Minimized Probability of Error
Example: $x_0 = 0$, $x_1 = 1$, $p_0 = p_1 = 0.5$
[Figure: six panels of probability of hit vs. guide point $\mu$ (0 to 1), for $\sigma$ = 2.0, 1.0, 0.5, 0.4, 0.2, and 0.1. Panel annotations trace the PIP during homing: centered PIP, broadened PIP, begin object commit, object committed to.]
49. Cortical Model
[Figure: cortical model: inputs $X_1, X_2, \ldots, X_n$ feed pyramidal neuron (Gibbs sampler) elements with outputs $Y_1, Y_2, Y_3, \ldots, Y_m$.]
Cortices consist of a collection of pyramidal neurons whose properties are now well established.
• Contains m pyramidal neurons and has n inputs
• Full connectivity is unnecessary; in general, connectivity is sparse
50. Cortical Hamiltonian
Since energy is an extensive property of a system, the cortical Hamiltonian is just the sum of the constituent single-neuron Hamiltonians:
$$H = \sum_{i=1}^{m} H_i, \qquad H_i = -\Big(\sum_{j=1}^{n}\lambda_{ij}x_j + \sum_{k=1}^{m}\nu_{ik}y_k - \mu_i\Big)\,y_i$$
The network must then have a well-defined Boltzmann distribution:
$$p(\mathbf{x},\mathbf{y}) = \frac{\exp[-H(\mathbf{x},\mathbf{y})]}{Z}$$
By analogy with statistical physics, all macroscopic network properties are defined, including free energies, entropies such as H(X,Y), and, most importantly, mutual information.
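Because each neuron’s firing probability, given all other units, is a logistic function of an energy difference, the network can be simulated by Gibbs sampling, with each neuron acting as a Gibbs sampling element. This sketch assumes the Hamiltonian H = −Σᵢ yᵢ(Σⱼ λᵢⱼ xⱼ + Σₖ νᵢₖ yₖ − μᵢ) with the input pattern x clamped, and uses small hypothetical random couplings so that the exact Boltzmann distribution p(y | x) ∝ exp[−H] can be enumerated and compared with the sampler’s empirical distribution.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
lam = rng.normal(size=(m, n))              # hypothetical input couplings lambda_ij
nu = rng.normal(scale=0.5, size=(m, m))    # hypothetical lateral couplings nu_ik
np.fill_diagonal(nu, 0.0)                  # no self-coupling
mu = rng.normal(size=m)                    # hypothetical thresholds mu_i
x = np.array([1, 0, 1, 1])                 # clamped input pattern

def energy(y):
    """H = -sum_i y_i (sum_j lam_ij x_j + sum_k nu_ik y_k - mu_i)."""
    return -float(y @ (lam @ x + nu @ y - mu))

# Exact conditional Boltzmann distribution p(y | x), states in lexicographic order
states = [np.array(s) for s in itertools.product((0, 1), repeat=m)]
weights = np.array([np.exp(-energy(s)) for s in states])
exact = weights / weights.sum()

# Gibbs sampling: each neuron resamples its output given all the others
y = np.zeros(m, dtype=int)
counts = np.zeros(len(states))
n_sweeps = 20_000
for _ in range(n_sweeps):
    for i in range(m):
        y1, y0 = y.copy(), y.copy()
        y1[i], y0[i] = 1, 0
        p_on = 1.0 / (1.0 + np.exp(energy(y1) - energy(y0)))   # p(y_i = 1 | rest)
        y[i] = int(rng.random() < p_on)
    counts[int("".join(map(str, y)), 2)] += 1   # y[0] is the most significant bit
empirical = counts / n_sweeps
tv_distance = 0.5 * np.abs(empirical - exact).sum()
```

Computing each update from the energy difference, rather than from a hand-derived local field, keeps the sampler correct even when the lateral couplings are not symmetric.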
51. Cortical Circuit WTA Computation
[Figure: network of neurons or Gibbs sampling elements with inputs $X_1, X_2, \ldots, X_K$ and outputs $Y_1, Y_2, Y_3, \ldots, Y_M$.]
• K credible targets with N = 1 true but unknown target
• M bullets (KVs)
• Neurons or Gibbs sampling elements
• The network determines the optimal $p(y|x)$ to minimize $I(X;Y)$ subject to constraints
• Connectivity requirements are driven by the need to stabilize to the asymptotic distribution
• The output is a probabilistic assignment, which is desirable from a guidance perspective
• Local nodes are identical IT-neurons for which local and global objective-function optimization coincide
• The network has a partition function and likely critical temperatures
• Gibbs and Helmholtz free energies (saddle points)
• Network Hamiltonian: [Equation: H, a double sum over targets k = 1, …, K and bullets i = 1, …, M of terms in $\boldsymbol{\lambda}_{ik}^T\mathbf{x}$, $\boldsymbol{\nu}_{ik}^T\mathbf{y}$, and $y_{ik}$, driven by the discrimination information $p(x_k)$.]
53. Example of the Implication of
Assertions
Define the subjective inquiry $B \equiv$ “Is it a Boy?”
Then let $b \equiv$ “It is a Boy!” and $s \equiv$ “It Is My Son!”, so that $s \rightarrow b$, i.e., $s \wedge b = s$ and $s \vee b = b$.
If Asserted … Then Known …
• If “It is My Son!” is asserted, then this additional information is erased by B
• Implication means that if b answers the question B, then so does s
(Holds only relative to question B)
54. Theory Summary
• Distinguishability gives rise to
Boolean Algebras
– Exhaustion and Mutual
Exclusion Together Yield
Complementarity
• Logical implication is the
natural relational operator for
questions and assertions
• Probability and Entropy are
the corresponding natural
subjective measures of
degrees of implication
[Diagram: starting from “Nothing”, the system distinguishes assertions and questions. Boolean algebra and logical implication induce computational rules and the natural measures of probability and entropy, leading to probability theory (assertions), information theory (questions), and intelligent systems.]