Optimal Control
Perspectives from the Variational Principles of Mechanics




                Ismail Hameduddin




                     Purdue University
Abstract
    Optimal control is a tremendously important (and popular) area of
research in modern control engineering. The extraordinary elegance of
optimal control results, the significance of their implications and the un-
resolved nature of their practical implementation have excited the minds
of generations of engineers and mathematicians. The sheer amount of
recent research dedicated to the topic, even after more than five decades
since the first publication of results, is a testament to this. Despite this
widespread interest, an appreciation of the philosophical origins of opti-
mal control, rooted in analytical mechanics, is still lacking. By weaving in
analogies from the variational principles of mechanics in the wider context
of an overview of optimal control theory, this work attempts to expose the
deeper connections between optimal control and the early, philosophically
oriented results in analytical mechanics. Rather than proceeding as a dry,
rigorous exercise, this is often done through more intellectually satisfying heuristic
discussions and insights. Although the two-point boundary value problem
is given due importance (with its parallel in analytical mechanics), special
emphasis is placed on the feedback form of optimal control (Hamilton-
Jacobi-Bellman equation) since this ties in closely with the exceedingly
beautiful Hamilton-Jacobi theory. Numerical solutions to the optimal con-
trol problem and in particular, the generalized Hamilton-Jacobi-Bellman
equation with successive Galerkin approximations, are also discussed to
highlight recent trends and motivations behind optimal control research.
1    Introduction
Optimal control is the area of study that deals with choosing free parameters in
a set of differential equations such that a cost function is minimized over an evo-
lution in time. Optimal control is an extremely important field with applications
ranging from engineering and operations research to finance and economics [19, 25].
For instance, the same tool used to study dynamical systems in economic theory
was used to design the controllers on the Apollo spacecraft [15].
    Much of the development of optimal control mirrors that of analytical me-
chanics. From a philosophical point of view, optimal control is a mimicry of
nature. By the principle of least action, nature chooses the motion of a system
(or particle) so as to minimize a certain form of “energy”. From the
point of view of nature, then, it “uses optimal control” to minimize the energy used
by systems in their motion. Optimal control is simply the turning of the tables
so that this tool is available in controlling the behavior of dynamical systems in
an optimal manner (with respect to a cost) subject to the (dynamic) constraints
already imposed by nature.
    This report introduces the ideas of optimal control to an audience familiar
with analytical mechanics and variational principles. The intent is to provide a
basic understanding of the fundamental results and then delve into some more
advanced/recent results. The report can also be seen broadly in a chronological
manner: It starts with a short review of some basic results from the calculus of varia-
tions (1700-1900), then proceeds to optimal control theory (1950-1970), which
is followed by a discussion of the generalized Hamilton-Jacobi-Bellman equation
(1979) and finally, the paper is capped off by a discussion of a numerical scheme
developed in the 1990’s.
    An effort has been made in the presentation to make the material relevant
and intellectually stimulating by establishing connections between classical an-
alytical mechanics and optimal control.


2    History
Optimal control is an outgrowth of the variational principles of mechanics and
it is difficult to pinpoint exactly when a transition was made from examining
systems moving freely under their own influence to determining a reference
control for a system to achieve a certain objective while minimizing a cost
function. A popular choice is the formulation of the brachistochrone problem:
     Given two fixed points in a vertical plane, let a particle start from
     rest at the higher point and travel to the lower point under its own
     weight in a uniform gravity field. What path or curve must the
     particle follow in order to reach the second point in the shortest
     amount of time?
The obvious solution to the minimum-length problem is the straight line between
the two points. However, the straight line does not minimize the amount of time.


The correct solution is a cycloid between the two points A and B. This problem
was first proposed by Galileo in 1638 in his book Two New Sciences. Galileo
accompanied the problem with an incorrect solution based on the geometry of
the problem. Instead of a cycloid, he suggested a circle through the two points,
with its center located a certain distance away (on an axis) [26, 28].
   Nearly sixty years later, oblivious to Galileo’s introduction of the problem,
Johann Bernoulli proposed the following “challenge” in the June 1696 issue of
Acta Eruditorum [28]:
      If in a vertical plane two points A and B are given, then it is required
      to specify the orbit AMB of the moveable point M, along which it,
      starting from A, and under the influence of its own weight, arrives at
      B in the shortest possible time. So that those who are keen on such
      matters will be tempted to solve this problem, it is good to know that
      it is not, as it may seem, purely speculative and without practical
      use. Rather it even appears, and this may be hard to believe, that
      it is very useful also for other branches of science than mechanics.
      In order to avoid a hasty conclusion, it should be remarked that
      the straight line is certainly the line of shortest distance between A
      and B, but it is not the one which is travelled in the shortest time.
      However, the curve AMB - which I shall divulge if by the end of this
      year nobody else has found it - is very well known among geometers.

This problem is precisely a minimum-time optimal control problem. Five math-
ematicians solved the brachistochrone problem: Johann Bernoulli him-
self, Leibniz, de l'Hôpital, Jakob Bernoulli (Johann's brother) and Isaac Newton.
Jakob Bernoulli formulated a more difficult version of the brachistochrone prob-
lem and solved it using a different type of proof. He was mocked
by his brother [28, 26] for using a sloppy proof, but that proof formed the foun-
dation of the future calculus of variations and the work of Lagrange, Hamilton
and Jacobi.
    From the brachistochrone problem to the development of control, the history
of optimal control closely parallels that of analytical mechanics (variational prin-
ciples of mechanics). Kalman's work introducing the state-space architecture
to control revolutionized the field and reopened the door for significant
developments in optimal control [18].
    Two schools of optimal control developed during the 1950’s and 1960’s. The
first was led by Richard E. Bellman and was centered in the USA. Bellman was
a mathematician and worked as a research scientist at The RAND Corporation
in Santa Monica, California [7]. His research was focused on optimizing the
control of multistage (discrete) systems [4, 6]. Two years after joining RAND
from Princeton, Bellman published his first book “The Theory of Dynamic Pro-
gramming” [5]. His development led to the Bellman equation which provides
sufficient conditions for optimality. Later this was generalized to continuous-time
systems where it bore a striking similarity to the Hamilton-Jacobi equation of
analytical mechanics. In fact, both equations derive from the same principle


of minimizing an (integral) performance index subject to nonholonomic con-
straints. Thus, the continuous-time version of the Bellman equation is known as
the Hamilton-Jacobi-Bellman equation [8]. The derivations in this paper will
focus on the Hamilton-Jacobi-Bellman formulation.
    The other school of optimal control was centered in the USSR and led by the
acclaimed Soviet mathematician Lev Semenovich Pontryagin. Pontryagin devel-
oped his famous maximum principle at roughly the same time as Bell-
man [22], but his work was, until later, available only in Russian [23]. Pontryagin
approached the problem of optimal control from the more classical standpoint
of the calculus of variations. The famous principle (often stated as a minimum
principle, depending on the sign convention) generalized the necessary conditions
for optimality, and it was shown that the standard Euler-Lagrange equations are
simply a special case of this principle [8].
    Ever since these theoretical foundations were laid in optimal control, much of
the development has been focused on applications and numerical techniques [18].
Even half a century after the solution of the optimal control problem was first
formulated, efficient numerical methods for the computation of these solutions
are still an active area of research. In general, the problem remains unresolved
since there is no efficient numerical scheme applicable in all cases even with the
exponentially larger computational resources available today versus five decades
ago.


3    The Optimal Control Problem
Consider a nonlinear time-varying dynamical system described by the equations

                ẋ(t) = f(x(t), u(t), t);        x(t0);        t0 ≤ t ≤ tf        (1)

where x(t) ∈ Rn is the vector of internal states and u(t) ∈ Rm is the vector
of control inputs. Suppose we are given an objective to drive the dynamical
system from some initial state x(t0 ) at initial time t = t0 to some specified final
state x(tf ) at final time t = tf given freedom over the assigned control input
u(t). In general, there are an infinite number of u(t) that satisfy this objective.
The goal of optimal control is to determine a u(t) that not only achieves the
objective but is also optimal with respect to a specified performance index or
cost. The performance index is chosen by the designer and therefore, the optimal
control u∗ (t) is not optimal in the universal sense but only with respect to the
performance index.
    A general performance index is given by
                J = φ(x(tf), tf) + ∫_{t0}^{tf} L(x(t), u(t), t) dt        (2)

where L(x(t), u(t), t) is the weighting function and φ(x(tf ), tf ) is the final-
state weighting function. The final-state weighting function is a function that
we desire to minimize at the final state. An example of this might be the final
energy. The weighting function, on the other hand, is a function that we desire


to minimize throughout the time interval [t0 , tf ]. The weighting function is
commonly a function of the control input u(t). This is because we often want to
minimize the control “effort” expended to achieve the control objective. During
the reorientation of a spacecraft, for example, minimizing the control input u(t)
over the entire interval reduces the amount of valuable fuel consumed.
    The control objective may be stated not only directly in terms of the final
state x(tf ) but also as a function of the final state and time. This function is
called the final state constraint and is given by

                                 Ψ(x(tf ), tf ) = 0                              (3)

where Ψ ∈ Rp. Henceforth, Ψ(x(tf ), tf ) will be treated as the control ob-
jective. Since this is a control objective, it differs from the final-state weighting
function φ(x(tf ), tf ) in that φ(x(tf ), tf ) only needs to be minimized at the final
time while Ψ(x(tf ), tf ) = 0 is a strict condition that must be met at the final
time.
    The optimal control problem may be pictured as the problem of finding an
optimal path from an initial point to a final surface described by Ψ(x(tf ), tf ) =
0. Consider the case where we have x ∈ R2 . The optimal control problem is
then to find an optimal path from a point in R3 , i.e. (x(t0 ), t0 ), to the family of
points satisfying Ψ(x(tf ), tf ) = 0. Now if we have a fixed final time and a fixed
end state, this family of points is restricted to a single point. Otherwise, if the
final time is fixed but the final states are related by a function, we have a line.
If the final time is free (as in a minimum-time problem) and the final states are
related by a function, we have a surface. This type of visualization is a handy
tool when dealing with optimal control problems.
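    To fix ideas, the following minimal sketch (illustrative only; the double-
integrator system, the weights, and all four functions are assumptions, not
taken from the report) encodes the ingredients of the problem statement: the
dynamics f in (1), the weighting functions L and φ in (2), and the final state
constraint Ψ in (3).

        import numpy as np

        # Hypothetical double-integrator example: drive position and velocity
        # to the origin at the final time while penalizing control effort.

        def f(x, u, t):
            # Dynamics (1): x_dot = f(x, u, t), with x = [position, velocity].
            return np.array([x[1], u[0]])

        def L(x, u, t):
            # Weighting function in (2): control effort expended at time t.
            return u[0] ** 2

        def phi(xf, tf):
            # Final-state weighting function in (2): residual "energy" at tf.
            return xf[0] ** 2 + xf[1] ** 2

        def Psi(xf, tf):
            # Final state constraint (3): Psi = 0 means stop at the origin.
            return np.array([xf[0], xf[1]])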
    The next section begins a discussion of a basic result from the calculus of
variations. This is then used to develop a solution to the optimal control problem
presented here.


4    Variation with Auxiliary Conditions
It is instructive to first consider the problem of minimizing an integral
                I = ∫_{t0}^{tf} F(q, q̇, t) dt        (4)

where q ∈ Rn , subject to the constraints

                                    φ(q, t) = 0.                                 (5)

where φ ∈ Rm. What follows is a derivation from the calculus of variations.
The parallels with optimal control will become clear in the next section.
    For an unconstrained problem, a minimum requires that the integral (4) be
stationary, i.e., that the variation of I vanish; whether a stationary point is in
fact a minimum is settled by the second variation (a check not required for
problems of dynamics).
Thus, we require
                δI = δ ∫_{t0}^{tf} F(q, q̇, t) dt = 0.        (6)

This is not correct for integrals with constraints as above since, although we are
taking variations of all n generalized coordinates, we only have n − m degrees of
freedom. Thus, in essence, we are only allowed to take free variations of n − m
generalized coordinates.
    We use what is known as the “Lagrange Multiplier Method” to deal with
such a problem. Taking a variation of the constraint vector, we have
                δφ = (∂φ/∂q) δq = 0.        (7)
Multiplying the variation of the constraint vector by a time-dependent vector
function λ^T(t) and integrating with respect to time (between t0 and tf) gives
a scalar term
                ∫_{t0}^{tf} λ^T(t) δφ dt = ∫_{t0}^{tf} λ^T(t) (∂φ/∂q) δq dt = 0
which can be augmented to (6) without changing the result since we are simply
adding zero
                δI = ∫_{t0}^{tf} [ δF(q, q̇, t) + λ^T(t) δφ ] dt = 0.        (9)

We can collect terms in δq in the first term of (9) to give
                δ ∫_{t0}^{tf} F dt = ∫_{t0}^{tf} E^T δq dt        (10)

where E = ∂F/∂q − (d/dt)(∂F/∂q̇) is the Euler-Lagrange expression that results
after the usual integration by parts (with fixed endpoints).

Thus from (9) and (10), we can write δI entirely in terms of the integrals of
terms affine in the δq. The original problem of eliminating m generalized co-
ordinates from the system now becomes straightforward. We choose suitable
λi such that the coefficients of m generalized coordinates vanish. The station-
arity condition still holds on the remaining independent δq and hence, by the
Euler-Lagrange equations, we need
                ∂F/∂q − (d/dt)(∂F/∂q̇) + λ^T(t) (∂φ/∂q) = 0.        (11)
   Alternatively, we can achieve the same result by defining an augmented
function F̄ as
                F̄ = F + λ^T(t) φ        (12)
and thus, similar to previously, we have
                Ī = ∫_{t0}^{tf} F̄ dt = ∫_{t0}^{tf} [ F + λ^T(t) φ ] dt.        (13)



Setting δĪ = 0 with an appropriate λ(t) recovers the result (11).
   For nonholonomic constraints
                dφ = a^T dq = 0        (14)
the result (11) still holds except that the partial derivatives ∂φ/∂q are replaced
by the coefficients a of the nonholonomic constraint (14). We thus have
                ∂F/∂q − (d/dt)(∂F/∂q̇) + λ^T(t) a = 0.        (15)
A similar result for the optimal control problem using the same methods for
derivation is shown in the next section.


5     Optimal Control by the Euler-Lagrange Method
The approach of optimal control is to treat the problem of finding the optimal
control u(t) as one of finding the stationary value of the performance index
subject to nonholonomic constraints which are precisely the system dynamics.
In this philosophy, we are, in effect, turning the problem upside down. Rather
than approaching the system dynamics first and then finding a control that
would minimize a performance index, we approach the performance index first
and treat the system dynamics as auxiliary constraints on the system. It is this
simple, yet groundbreaking, change of perspective that spurred on the decades
of research and produced some of the most significant results of the past half
century. After this perspective change, the problem can be solved almost iden-
tically as in the previous section.
    Consider first the case when there is no final state constraint but we have
fixed initial and final time. Begin by rearranging the system dynamics (1),
multiplying by an undetermined time-dependent vector λT (t) and integrating
between the limits to give
                ∫_{t0}^{tf} λ^T(t) [ f(x(t), u(t), t) − ẋ(t) ] dt = 0.        (16)

We can then augment the performance index (2) with (16) without any impact
since we are simply adding zero, similar to what we did in the general Lagrange
multiplier method
  J = φ(x(tf), tf) + ∫_{t0}^{tf} { L(x(t), u(t), t) + λ^T(t) [ f(x(t), u(t), t) − ẋ(t) ] } dt.        (17)
    As in analytical mechanics, define the Hamiltonian function as
            H(x(t), u(t), λ(t), t) = L(x(t), u(t), t) + λ^T(t) f(x(t), u(t), t)        (18)
which, substituted into (17), yields
          J = φ(x(tf), tf) + ∫_{t0}^{tf} [ H(x(t), u(t), λ(t), t) − λ^T(t) ẋ(t) ] dt.        (19)


Integrating the last term of (19) by parts gives
          ∫_{t0}^{tf} λ^T(t) ẋ(t) dt = λ^T(t) x(t) |_{t0}^{tf} − ∫_{t0}^{tf} λ̇^T(t) x(t) dt.        (20)

Substituting (20) into (19) and evaluating the limits gives us

  J = φ(x(tf), tf) − λ^T(tf) x(tf) + λ^T(t0) x(t0)
                + ∫_{t0}^{tf} [ H(x(t), u(t), λ(t), t) + λ̇^T(t) x(t) ] dt.        (21)

   We now consider variations in J due to variations in the control vector
u(t) while holding the initial time t0 and final time tf fixed. After collecting
terms in the variation, we have

δJ = [ ∂φ/∂x − λ^T ]_{t=tf} δx + [ λ^T δx ]_{t=t0} + ∫_{t0}^{tf} { [ ∂H/∂x + λ̇^T ] δx + (∂H/∂u) δu } dt.
                                                                              (22)
To achieve a stationary point δJ = 0, we choose the arbitrary multiplier func-
tions λ(t) such that the coefficients of the δx(t) vanish. This reduces the number
of free variables in our problem and we avoid the need to determine the varia-
tions δx(t) produced by a given δu(t). Hence, we first define the dynamics of
the multiplier functions as

                λ̇^T(t) = −∂H/∂x = −∂L/∂x − λ^T(t) (∂f/∂x)        (23)
which eliminates the coefficient of δx inside the integral in (22). We also define
the boundary conditions on these dynamics as
                λ^T(tf) = ∂φ/∂x(tf)        (24)

which eliminates the first term in (22). We then have
                δJ = λ^T(t0) δx(t0) + ∫_{t0}^{tf} (∂H/∂u) δu dt.        (25)

For J to be stationary, i.e., δJ = 0, we must have
                ∂H/∂u = 0,        t0 ≤ t ≤ tf        (26)
    The above equations (23), (24) and (26) are precisely the conditions needed
for the performance index to be stationary, i.e., for u(t) to be an optimal control.
We are thus left to solve the following differential equations to determine the
optimal control:
                ẋ = f(x, u, t)        (27)

                λ̇ = − (∂f/∂x)^T λ − (∂L/∂x)^T        (28)

where u(t) is determined by

                (∂f/∂u)^T λ + (∂L/∂u)^T = 0        (29)

and the boundary conditions are

                x(t0) given        (30)

                λ(tf) = (∂φ/∂x)^T        (31)
    The equations (27) through (31) parallel the Euler-Lagrange equations from
standard variational calculus and are referred to as the stationarity conditions.
Notice the similarity between (11) and (28)-(29); the correspondence is made
explicit below.
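    In detail (a restatement of (11) and (28)-(29) under the stated identifications,
not a new result): treating (x, u) as an extended set of generalized coordinates
and applying the Euler-Lagrange expression (11) to the augmented integrand
F̄ = L + λ^T(f − ẋ), as in (12)-(13), yields

        % x-component: note d/dt [\partial \bar{F} / \partial \dot{x}]
        %              = d/dt (-\lambda^T) = -\dot{\lambda}^T, so
        \begin{align}
          \frac{\partial}{\partial x}\bigl(L + \lambda^T f\bigr)
            + \dot{\lambda}^T &= 0 && \text{(the costate equation (28)),} \\
          \frac{\partial}{\partial u}\bigl(L + \lambda^T f\bigr)
            &= 0 && \text{(the stationarity condition (29)),}
        \end{align}
        % where the u-equation carries no time-derivative term because
        % \dot{u} does not appear in the augmented integrand.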
    The elements of the multiplier vector λ are known as the “costates” because
the optimal control is determined by solving the state dynamics ẋ together with
the multiplier dynamics λ̇.
    Since the boundary conditions are specified at both initial and final time, the
problem itself is often called the two-point boundary-value problem (2PBVP).
We are required to specify both the initial and final time for such a problem.
This restriction (of specifying both initial and final time) is overcome later by
using another method of solution of the optimal control problem that utilizes
elements from Hamilton-Jacobi theory.
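    To make the two-point structure concrete, the following is a minimal single-
shooting sketch for a scalar linear-quadratic instance of (27) through (31). The
system, weights, and bracketing interval are assumptions for illustration, not
from the report: guess λ(t0), integrate states and costates forward, and adjust
the guess until the boundary condition λ(tf) = 0 (here φ = 0) is met.

        import numpy as np
        from scipy.integrate import solve_ivp
        from scipy.optimize import brentq

        # Illustrative scalar problem: x_dot = a*x + b*u,
        # L = (q*x**2 + r*u**2)/2, phi = 0, so (31) gives lambda(tf) = 0.
        a, b, q, r = 1.0, 1.0, 1.0, 1.0
        x0, t0, tf = 1.0, 0.0, 2.0

        def rhs(t, z):
            x, lam = z
            u = -b * lam / r             # stationarity condition (29)
            return [a * x + b * u,       # state dynamics (27)
                    -q * x - a * lam]    # costate dynamics (28)

        def terminal_costate(lam0):
            # Shooting residual: lambda(tf) reached from a guessed lambda(t0).
            sol = solve_ivp(rhs, (t0, tf), [x0, lam0], rtol=1e-8)
            return sol.y[1, -1]          # must vanish by (31)

        lam0 = brentq(terminal_costate, 0.0, 10.0)  # enforce lambda(tf) = 0
        print(f"optimal initial costate: {lam0:.4f}")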
    No final state constraint was assumed in the derivation of the previous
stationarity conditions. This is not the case in many problems. The
problem where a final state vector

                                    Ψ(x(tf ), tf ) = 0                               (32)

is specified is dealt with below.
    Analogous to the previous treatment, we form a performance index that is
augmented by a multiple of the final state constraint vector, with the effect of
adding a multiple of zero:

  J = φ(x(tf), tf) + ν^T Ψ(x(tf), tf)
                + ∫_{t0}^{tf} { L(x(t), u(t), t) + λ^T(t) [ f(x(t), u(t), t) − ẋ(t) ] } dt        (33)

where ν^T is a vector of undetermined multipliers. The previous derivation may
be repeated if we define
                Φ = φ + ν^T Ψ        (34)




and substitute into the performance index, except that the ν^T will not be spec-
ified. This can be resolved with some incremental effort, and the previous
stationarity conditions can be shown to hold with a minor modification to (31):

                λ(tf) = ( ∂φ/∂x + ν^T ∂Ψ/∂x )^T |_{t=tf}.        (35)

This completes our discussion of optimal control by the Euler-Lagrange method.
Although many further extensions to these results exist, they are not
treated in this report.
    Another approach to solving the optimal control problem is to use paral-
lels with the Hamilton-Jacobi theory of analytical mechanics. Thus, a
short review of Hamilton-Jacobi theory is given in the next section, with an
emphasis on the parts of the theory that prove useful in optimal control.


6    Hamilton-Jacobi Theory
Hamilton's problem deals with solving for the motion of a dynamic system such
that its generalized coordinates are reduced to quadratures. According to the
principle of least action, the motion of a dynamic system, i.e., the solution of
Hamilton's problem, is such that it minimizes the total energy or “action”. By
Hamilton's principle, this “action” is the canonical integral. Thus achieving a
stationary point of the canonical integral implies that a minimum energy motion
has been achieved and Hamilton's problem has been solved. The stationary
point is not verified via a second variation because in general, for problems
in dynamics, a stationary point cannot imply a maximum (since the feasible
generalized coordinates are theoretically unbounded). Only a basic discussion
of this problem and its solution will be presented in this section as a complete
derivation is beyond the scope of the report. The reader is referred to references
[16, 13, 21] for more details.
    The canonical integral in analytical mechanics is given by
                I = ∫_{t0}^{tf} L(q, q̇, t) dt = I(q0, q̇0, t0, tf)        (36)

where L is the Lagrangian, q is the generalized coordinate vector, q̇ is the
generalized velocity vector and q0, q̇0 are the vectors of initial conditions. For
a stationary point, the first variation of the canonical integral must be zero

                                        δI = 0.                              (37)

A motion that satisfies such a condition is achieved in the Hamilton-Jacobi the-
ory via a canonical transformation, i.e., a transformation that does not violate
Hamilton’s principle in the dynamics of the system.
   The statement of (36) is that the canonical integral, including integration
constants, is fully determined once we have the initial generalized coordinates


and velocities. Hamilton-Jacobi theory (which will not be derived here) intro-
duces a generating function S called “Hamilton’s Principal Function” based on
the canonical integral formulation in (36)
                S(q0, qf, t0, tf) = ∫_{t0}^{tf} L dt        (38)

where qf are the generalized coordinates at the final time t = tf . The key
difference between (36) and (38) is that we do not require the initial generalized
velocities but we instead replace these, via a canonical transformation, by the
generalized coordinates at the final time. In analytical mechanics finding such
a transformation implies that we have found a complete solution of Hamilton’s
problem. This is because we transform the system from a moving point in
configuration space to a fixed point. It is natural, therefore, that Hamilton's
Principal Function holds a special importance in analytical mechanics (and by
extension, in Hamilton-Jacobi theory and optimal control theory).
    By the theory of Hamilton-Jacobi, the principal function is the solution of the
following partial differential equation known as the Hamilton-Jacobi equation

                ∂S/∂t + H(q, ∂S/∂q, t) = 0        (39)

where H is the Hamiltonian (defined in terms of analytical mechanics). Once
the solution to the Hamilton-Jacobi equation is found (S), we can generate
a canonical transformation that transforms the moving point in configuration
space representing the motion of the system to a fixed point in configuration space.
    In the special case where the Hamiltonian does not depend on time (conser-
vative systems), we have
                H(q, ∂S/∂q) = 0.        (40)
The results of this section will be exploited later, at the end of the next section,
to find an elegant solution to the optimal control problem. First, however, a
basic derivation of this result for the optimal control problem, not drawing on
the analogy from analytical mechanics, is presented in the next section.


7    Optimal Feedback Control via the Hamilton-
     Jacobi-Bellman formulation
The problem of finding an optimal control u∗ (t) to proceed from a specified
initial state x(t0 ) to a terminal surface described by Ψ(x(tf ), tf ) = 0 has been
considered so far. A result was derived (Euler-Lagrange optimal control) to
determine the optimal control that minimizes the performance index
                J = φ(x(tf), tf) + ∫_{t0}^{tf} L(x(t), u(t), t) dt        (41)



and satisfies the final-state constraint (or terminal surface)

                                  Ψ(x(tf ), tf ) = 0                                  (42)

where the system dynamics are given by

                ẋ(t) = f(x(t), u(t), t);        x(t0);        t0 ≤ t ≤ tf        (43)

Implicit in this discussion was that if the initial state x(t0 ) was changed and
selected on the path from the initial point to the terminal surface determined
by optimal control, then the resulting (new) optimal path would lie on the same
path as previously except for beginning at the new initial state. In a significant
omission, the possibility of other completely arbitrary initial states that do not
lie on the original optimal path was not considered. Indeed, according to the
previous discussion, if another initial state that does not lie on the original
path is specified, then the optimal problem must be considered anew and the
optimal control Euler-Lagrange equations must be solved anew. Since in reality
an infinite number of initial conditions exist, if an efficient method for solving
the optimal control Euler-Lagrange equations is not available (and often it is
not), the previous optimal control results do not prove very useful. The optimal
control Euler-Lagrange equations provide an open-loop or feedforward control
that does not require the system state information at any time other than the
initial and final time (hence the name: two-point boundary-value problem).
    It is preferred to have a family of paths that reach the terminal surface
Ψ(x(tf ), tf ) = 0 from a family of arbitrary initial states x(t0 ). Each of these
paths is the optimal path, with respect to the performance index, from the initial
state to the terminal surface. Thus, the family of paths is a family of optimal
paths or extremals which, in a continuous setting, should be representable by an
initial-state-dependent function. This allows the formation of a feedback control
law rather than the feedforward-type control provided by the Euler-Lagrange
formulation.
    The most obvious strategy for forming this initial state dependent function
is to use the only two properties possessed by all the optimal paths: each path
is optimal with respect to the performance index and each path ends at the
terminal surface Ψ(x(tf ), tf ) = 0. Consider then, the cost of an optimal path
starting from an arbitrary initial state (initial state x at time t) and ending
at the terminal surface. This function is called the value function or optimal
return function and is given by
           V(x, t) = min_{u(t)} { φ(x(tf), tf) + ∫_{t}^{tf} L(x(τ), u(τ), τ) dτ }        (44)

with boundary condition
                                V (x, t) = φ(x(t), t).                                (45)
on the terminal surface Ψ(x(t), t) = 0. For the considerations here, we assume
that the value function V(x, t) ∈ C² over the interval of interest. The qualifier
min_{u(t)} indicates that the value function is evaluated along the optimal trajectory.


A complete derivation of the Hamilton-Jacobi-Bellman equation is shown below,
after which a heuristic derivation is given using parallels with the
Hamilton-Jacobi theory of analytical mechanics.
    Suppose that the system starts at an arbitrary initial condition (x, t) and
proceeds using a non-optimal control u(t) for a short period of time ∆t to reach
the point (by first-order approximation assuming ∆t is sufficiently small)

                (x + ẋ∆t, t + ∆t) = (x + f(x, u, t)∆t, t + ∆t).        (46)

Correspondingly, by another first-order approximation, the value function for
this small non-optimal path is given by

                                      dV (x, t)
                        V∆ (x, t) =             ∆t = L(x, u, t)∆t               (47)
                                         dt
where the subscript on V signifies a first-order approximation of a small-path
and the tilde represents the non-optimal nature of the path.
    Now suppose optimal control is used for the remainder of the path, i.e., from
(x + f(x, u, t)∆t, t + ∆t) to the terminal surface Ψ(x(tf), tf) = 0. The (subopti-
mal) total value function Ṽ(x, t) is then the sum of the (optimal) value function
beginning at the state (x + f(x, u, t)∆t, t + ∆t) and the first-order approx-
imation Ṽ∆(x, t) to the value function of the small non-optimal path at the
beginning:

        Ṽ(x, t) = V(x + f(x, u, t)∆t, t + ∆t) + Ṽ∆(x, t)        (48)
                = V(x + f(x, u, t)∆t, t + ∆t) + L(x, u, t)∆t.        (49)

Obviously, since Ṽ(x, t) is suboptimal (due to the small suboptimal path at the
beginning), it can never be smaller than the actual (optimal) return function
V(x, t):
                V(x, t) ≤ Ṽ(x, t).        (50)
Equality holds in (50) only when the optimal control is chosen for the
interval ∆t, i.e., when Ṽ(x, t) is minimized, from which we have

        V(x, t) = min_u { V(x + f(x, u, t)∆t, t + ∆t) + L(x, u, t)∆t }.        (51)

   Due to the assumption V(x, t) ∈ C², the right-hand side of (51) can be
expanded in a Taylor series about (x, t):

   V(x, t) = min_u { V(x, t) + (∂V/∂x) f(x, u, t)∆t + (∂V/∂t)∆t + L(x, u, t)∆t }.        (52)
Since V and ∂V/∂t do not explicitly depend on u, setting ∆t → dt in (52) gives

                −∂V/∂t = min_u { L(x, u, t) + (∂V/∂x) f(x, u, t) }.        (53)


Now consider the differential (with respect to time) of the value function
written in terms of the Hamiltonian, analogous to (19):

                dV = λ^T dx − H dt        (54)

where
                    H(x, λ, u, t) = L(x, u, t) + λT f (x, u, t).             (55)
From (54), we have on the optimal trajectory

                λ^T = ∂V/∂x        (56)

and

                H = −∂V/∂t.        (57)
   Substituting (56) into (55) gives

                H(x, λ, u, t) = L(x, u, t) + (∂V/∂x) f(x, u, t)        (58)
which, when substituted into (53), gives the Hamilton-Jacobi-Bellman equation

                −∂V/∂t = min_u H(x, ∂V/∂x, u, t)        (59)

which is solved with the boundary condition

                               V (x, t) = φ(x(t), t)                         (60)

on the terminal surface Ψ(x, t) = 0. Solving the Hamilton-Jacobi-Bellman
(HJB) equation gives V(x, t), which we can use along with the speci-
fied performance index and the stationarity condition to determine the optimal
control u(x, t) independently of the initial state. Since the HJB equation is a suf-
ficient condition for optimality, we thus have a function that provides the optimal
control in feedback form.

7.1     The Hamilton-Jacobi-Bellman equation from the stand-
        point of analytical mechanics
We can perform a heuristic derivation of the HJB equation by appealing to
the Hamilton-Jacobi theory of analytical mechanics, which exposes the parallels
between optimal control theory and the variational principles of mechanics.
    Recall that we defined Hamilton’s principal function (38) as the canonical
integral transformed such that it is a function of the generalized coordinates at
the final time rather than the generalized velocities, i.e.,

                              S = S(q0 , qf , t0 , tf ).                     (61)



    Substitute ẋ = f(x, u, t) into the constrained performance index (19) and
let the initial states and control be arbitrarily assigned:

        J = φ(xf, tf) + ∫_{t0}^{tf} [ H(x(t), u(t), λ(t), t) − λ^T(t) ẋ(t) ] dt        (62)
          = J(x0, xf, u0, uf, t0, tf)        (63)

where the subscript f indicates evaluation at the final time.
    Now, since J = J(x0, xf, u0, uf, t0, tf) is not a function of the velocities
ẋ, and because φ(xf, tf) is simply a function evaluated at a single point, i.e., a
constant, defining x and u as an extended system of generalized coordinates
allows us to set

                S = J(x0, x, u0, u, t0, tf).        (64)
    Then the new S function is stationary with respect to the first variation if
it satisfies the Hamilton-Jacobi equation (39). Rearranging (39) and changing
the arguments we have
                ∂S/∂t = −H(x, ∂S/∂x, u, t)        (65)

which is simply another statement of the HJB equation (59), since, by Hamilton-
Jacobi theory, an S satisfying this partial differential equation immediately
implies that the first variation of the canonical integral (in this case, the per-
formance index) vanishes.

7.2     A Special Case
A special case is discussed here that utilizes the previous results to show an
example of deriving a feedback optimal control u∗ based on the HJB equation.
Specifically, consider a nonlinear system of the form

                ẋ = f(x) + g(x)u        (66)

where x ∈ Rn (as before), f : Rn → Rn , g : Rn → Rn×m , f (0) = 0 and u is a
control to be determined.
   Let the value function (from the corresponding performance index) be given
by
                V(x, u) = ∫_{t}^{∞} ( x^T Q x + u^T R u ) dt        (67)
                        = ∫_{t}^{∞} L(x, u) dt        (68)

where Q ∈ Rn×n and R ∈ Rm×m are symmetric weighting matrices whose
choice is left as a design consideration. The expression in (67) evaluates the
total cost up to tf = ∞. It represents the weighted (by Q and R) squared
sum of the total control effort and state “effort” expended, which is commonly


a quantity that needs to be minimized. There are no final state constraints
specified and therefore, the problem is simply one of regulation, i.e., the system
must be driven to its equilibrium x = 0. Furthermore, there is no final-state
weighting function. Also, notice that the value function (67) does not depend
on time because the original system does not depend on time. This property
will play an important role in the following discussion.
    Similar to the development in (16) through (19), augmenting (67)
with the system dynamics multiplied by the costates yields

                V(x, u) = ∫_{t}^{∞} [ H(x, u, λ) − λ^T ẋ ] dt        (69)

where
                   H = xT Qx + uT Ru + λT [f (x) + g(x)u] .                 (70)
Rewriting the stationarity condition (29) in terms of the new system equations
gives
                ∂H/∂u = ∂/∂u { λ^T (f(x) + g(x)u) + L } = 0        (71)
and hence from (70)
                ∂H/∂u = 2u^T R + λ^T g(x) = 0        (72)
where it must be noted that the costate λ is not arbitrary: satisfying (72)
implies that λ is a costate of the optimal control u∗. We denote this special
costate λ∗. For purposes of clarity, the expression (72) is transposed and then
rewritten to reflect this:
                ∂H/∂u∗ = 2Ru∗ + g^T(x)λ∗ = 0.        (73)
Rearranging (73) gives an expression for the optimal control
                u∗ = −(1/2) R⁻¹ g^T(x) λ∗        (74)
where everything on the right-hand side is known except the “optimal costate”
λ∗. This is precisely where the HJB equation enters the picture. Since by (56)
we have, on the optimal trajectory,

                λ∗ = (∂V/∂x)^T        (75)

the expression for the optimal control (74) can be written as

                u∗ = −(1/2) R⁻¹ g^T(x) (∂V/∂x)^T        (76)

and hence finding the solution to the HJB equation (which gives V ) allows an
explicit analytic expression for the optimal control u∗.


Notice that since the system under consideration is conservative, i.e., f =
f (x) and g = g(x), the Hamiltonian (70) is not dependent on time

                                 H = H(x, u, λ)                              (77)

and furthermore, the value function (69) is also not dependent on time

                                  V = V (x, u).                              (78)

Therefore, we have
                ∂V/∂t = 0        (79)

which implies that the HJB equation (59) reduces to

                min_u H(x, ∂V/∂x, u) = 0        (80)

over the optimal trajectory. From the expression (70) for the Hamiltonian H,

        min_u { x^T Q x + u^T R u + λ^T [f(x) + g(x)u] }        (81)
            = x^T Q x + u∗^T R u∗ + (∂V/∂x) [f(x) + g(x)u∗] = 0        (82)

which was obtained by using the relationship (75).
   Substituting the optimal control (74) into the modified HJB equation (82)
yields the partial differential equation

x^T Q x + (1/4) (R⁻¹ g^T(x) λ∗)^T R (R⁻¹ g^T(x) λ∗) + (∂V/∂x) [ f(x) − (1/2) g(x) R⁻¹ g^T(x) λ∗ ] = 0
                                                                          (83)
or, simplifying with the help of (75),

        (∂V/∂x) f(x) + x^T Q x − (1/4) (∂V/∂x) g(x) R⁻¹ g^T(x) (∂V/∂x)^T = 0.        (84)
The only unknown in (84) is ∂V/∂x, the partial derivative (with respect to the
states) of the optimal return function/value function. Therefore solving (84) is
sufficient to determine the optimal control (74).
    Unfortunately, solving the partial differential equation (84) is extremely dif-
ficult and frequently impossible. Thus, even though a feedback optimal control
based on the HJB equation, as in (74), is very attractive, especially compared
to the feedforward Euler-Lagrange optimal control solution, the complexity of
solving a partial differential equation such as (84) strictly limits its direct ap-
plication [8].
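    The linear-quadratic special case indicates what a solution of (84) looks like
when one is available in closed form. If f(x) = Ax and g(x) = B (an illustrative
assumption, not a case treated above), then V = x^T P x satisfies (84) exactly
when P solves the algebraic Riccati equation A^T P + P A − P B R⁻¹ B^T P + Q = 0,
and (76) reduces to the familiar feedback u∗ = −R⁻¹ B^T P x. A minimal sketch:

        import numpy as np
        from scipy.linalg import solve_continuous_are

        # Hypothetical linear system: double integrator, quadratic weights.
        A = np.array([[0.0, 1.0],
                      [0.0, 0.0]])
        B = np.array([[0.0],
                      [1.0]])
        Q = np.eye(2)
        R = np.array([[1.0]])

        # P solves A'P + PA - P B inv(R) B' P + Q = 0,
        # i.e., equation (84) with V = x'Px.
        P = solve_continuous_are(A, B, Q, R)
        K = np.linalg.solve(R, B.T @ P)   # feedback gain: u* = -K x

        x = np.array([1.0, 0.0])
        print("gain K =", K, "u*(x) =", -K @ x)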
    Although several techniques have been proposed to provide a solution to
the HJB equation under special conditions, the problem is still, even after five
decades, an active area of research. One such technique is presented in the
next section in significant detail.


8    Generalized Hamilton-Jacobi-Bellman Equa-
     tion
Traditionally, the challenge of solving a partial differential equation like (84)
was tackled using what is known as the “method of characteristics” [8]. The
basic idea behind this method is to reduce the partial differential equation into
a family of ordinary differential equations which are then integrated over differ-
ent initial conditions to the terminal surface to obtain solutions to the partial
differential equation. Such a scheme is very useful in studying the qualitative
behavior of partial differential equations and has extensive applications in (com-
putational) fluid mechanics where it is used to study phenomena such as turbu-
lence and shockwaves via the Navier-Stokes equations. However, its application
in optimal control is not particularly beneficial. Firstly, the computation and
storage of solutions of (infinitely) large sets of ordinary differential equations
and initial conditions is prohibitive. In fact, this eliminates one of main reasons
of using the HJB solution to the optimal control problem; to avoid computa-
tion of arbitarily large numbers of solutions to the two-point boundary value
problem. Secondly, the solutions via the characteristic equations are not always
well-defined. Specifically, under certain conditions, multivalued solutions might
appear. Thirdly, in many cases, the method of characteristics does not cover the
entire domain of the partial differential equation and the solution only exists in
a weak sense. Despite these apparently critical shortcomings, during the early
years of optimal control, the method of characteristics was often considered the
only route to achieve a practical solution to the optimal control problem via the
HJB equation.
    During the 1970’s, other more efficient techniques hinging on system linearity
were developed to solve the HJB equation to obtain a feedback optimal control.
If the system nonlinearities are small, perturbation methods can be used to
achieve second-order approximations to the optimal control as was shown in
[12, 20, 10, 11]. An explicit assumption in these is that the optimal control has
a sufficiently accurate second-order Taylor series expansion about the origin.
This type of assumption severely limits the class of systems to which the method
is applicable. The stability region of the resulting control is also almost always
impossible to determine. Perturbation methods, therefore, did not gain much
momentum as viable schemes for numerical feedback optimal control.
    As feedback linearization (or dynamic inversion) and geometric control gained
popularity during the late 1980’s and 1990’s, several new attempts were made at
attacking the numerical feedback optimal control problem. All of these involved
canceling system nonlinearities via feedback (dynamic inversion) and then ap-
plying optimal control theory to the subsequent linearized system [14, 9, 27].
This method has several drawbacks: significant control effort is expended in
forcing the nonlinear system to behave linearly, useful nonlinearities that may
help in control are eliminated, the dynamic inversion of the control matrix is not
always a global transformation, the dynamic inversion itself is computationally
expensive and finally, the dynamic inversion is fragile to modeling uncertainties


and disturbances.
    Another approach to utilizing the HJB equation for optimal feedback control
tackles the problem not by determining an optimal control u∗ directly but rather
by successively optimizing an existing stabilizing suboptimal control u^(0). The method
utilizes an alternative formulation of the Hamilton-Jacobi equation known as the
generalized Hamilton-Jacobi-Bellman equation and was first proposed by Saridis
and Lee in [24]. The design methodology was further refined in [2, 17, 3] by
introducing the use of Galerkin’s spectral method to approximate partial dif-
ferential equations. The following is a detailed mathematical treatment of this
methodology using previously derived results in this report.
    Consider a suboptimal stabilizing feedback control u(x) for the (conserva-
tive) nonlinear system (66). Analogous to (67), let the suboptimal value function
Ṽ for this particular control be given by

                Ṽ(x) = ∫_{t}^{∞} ( x^T Q x + u^T(x) R u(x) ) dt.        (85)

We say that a feedback control u ∈ Ω_u is admissible if u is continuous and renders
(66) asymptotically stable.
    Assuming an admissible but suboptimal u is given, can the HJB equation
be exploited to optimize this control successively over time? This question was
first addressed by Saridis and Lee in [24] where they introduced the concept
of the generalized Hamilton-Jacobi-Bellman equation. The equation was thus
named because it applies to any admissible u and not just an optimal control. It
is introduced here, based on previous results, in a nonrigorous fashion.
    Differentiating the suboptimal value function (85) along the trajectories of
the system yields the differential form of the (suboptimal) value function

    GHJB:    (∂Ṽ/∂x)^T [ f(x) + g(x)u(x) ] + x^T Q x + u^T(x) R u(x) = 0.        (86)

This differential form of the (suboptimal) value function is known as the gener-
alized Hamilton-Jacobi-Bellman (GHJB) equation. The solution Ṽ of the GHJB
equation is a Lyapunov function for (66) under the suboptimal control u [1].
It represents the value function under a suboptimal control.
    The development below closely follows Saridis and Lee [24]. Key theorems
are reproduced (in a standardized form) and presented without proofs. The
first lemma relates the suboptimal value function Ṽ(x) to the true value function
V(x) under optimal control.

Lemma 1 Assume the optimal control u∗ and the optimal value function V(x)
exist. Then these satisfy the GHJB equation (86) and

                0 < V(x) ≤ Ṽ(x).        (87)


The next theorem presents an approach that ensures a successively (at each step
or iteration) smaller suboptimal value function.

Theorem 1 If a sequence of pairs {u^(i), Ṽ^(i)} satisfying the GHJB equation
(86) is generated by selecting the control u^(i) to minimize the GHJB equation
associated with the previous value function Ṽ^(i−1), e.g.,

                u^(i) = −(1/2) R⁻¹ g^T(x) ∂Ṽ^(i−1)/∂x        (88)

then the corresponding value function satisfies the inequality

                Ṽ^(i) ≤ Ṽ^(i−1).        (89)



Note the similarity between (88) and the general expression for the optimal control
(76). The corollary that follows is intuitively immediate from Lemma 1 and
Theorem 1. It deals with the convergence of a sequence of suboptimal value
functions to the optimal value function given a control such as (88).

Corollary 1 By selecting pairs {u^(i), Ṽ^(i)} with

                u^(i) = −(1/2) R⁻¹ g^T(x) ∂Ṽ^(i−1)/∂x        (90)

the resulting sequence {Ṽ^(i)} converges monotonically to the optimal value func-
tion V(x) associated with the optimal control, i.e.,

                Ṽ^(0) ≥ Ṽ^(1) ≥ Ṽ^(2) ≥ . . . ≥ V.        (91)



The final two theorems deal with the construction of upper and lower bounds for
the true value function V(x). This is accomplished by obtaining functions that
just fail to satisfy the GHJB equation on either side (< 0 and > 0).

Theorem 2 Suppose for a given us(x) and some

                s(x), |s(x)| < ∞        (92)

there exists a continuously differentiable positive definite function Ṽs = Ṽ(x, us)
satisfying

    (∂Ṽs/∂x)^T [ f(x) + g(x)us(x) ] + x^T Q x + us^T(x) R us(x) = ∆Ṽs ≤ s(x) < 0        (93)

Then Ṽs(x) is an upper bound of the optimal value function V(x):

                Ṽs(x) > V(x).        (94)

And similarly for the lower bound, we have the last theorem.
Theorem 3 Suppose for a given u_s(x) and some s(x) with
\[ |s(x)| < \infty \tag{95} \]
there exists a continuously differentiable positive definite function V_s = V(x, u_s)
satisfying
\[ \frac{\partial V_s^T}{\partial x}\left[f(x) + g(x)u_s(x)\right] + x^T Q x + u_s^T(x) R u_s(x) = \Delta V_s \ge s(x) > 0. \tag{96} \]
Then V_s(x) is a lower bound on the optimal value function V^*(x):
\[ V_s(x) < V^*(x). \tag{97} \]



    An exact design procedure for optimizing an initial admissible control u^(0) ∈
Ω_u can now be formed from the previous results.
  1. Select an initial admissible control u^(0) ∈ Ω_u for the system (66).

  2. Solve the GHJB partial differential equation for V^(0):
\[ \frac{\partial V^{(0)T}}{\partial x}\left[f(x) + g(x)u^{(0)}(x)\right] + x^T Q x + u^{(0)T}(x) R u^{(0)}(x) = 0. \tag{98} \]
     Then by Lemma 1, V^(0) ≥ V^*.
  3. Obtain an improved controller u^(1) using Corollary 1:
\[ u^{(1)} = -\frac{1}{2} R^{-1} g^T(x) \frac{\partial V^{(0)}}{\partial x}. \tag{99} \]

  4. Solve the GHJB partial differential equation for V^(1):
\[ \frac{\partial V^{(1)T}}{\partial x}\left[f(x) + g(x)u^{(1)}(x)\right] + x^T Q x + u^{(1)T}(x) R u^{(1)}(x) = 0. \tag{100} \]
     Then by Lemma 1 and Theorem 1, V^(0) ≥ V^(1) ≥ V^*.
  5. Determine a lower bound V_s on the optimal value function using Theorem
     3.
  6. Use V^(1) − V_s as a measure of how close the approximation u^(1) is to the
     optimal control u^*. If acceptable, stop at this iteration.
  7. Otherwise, if the approximation is not acceptable, repeat from step 2
     onwards with a new iteration (see the sketch below).
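A minimal sketch of this loop, in Python, is given below. It is illustrative
only: the GHJB solver interface solve_ghjb, the sampled stopping test, and all
names are hypothetical placeholders, and the bound of Theorem 3 in step 6 is
replaced by a simpler check on successive value functions.

```python
import numpy as np

def successive_ghjb(solve_ghjb, g, R, u0, x_samples, tol=1e-6, max_iter=50):
    """Iterate the GHJB design procedure: solve, improve, repeat.

    All interfaces here are hypothetical placeholders:
      solve_ghjb(u) -> (V, dV): callables giving the GHJB solution V(x)
                                and its gradient dV(x) for the control u
      g(x)                    : input matrix of the dynamics (66)
      R                       : control weighting matrix
      u0                      : initial admissible control, x -> u(x)
      x_samples               : points of Omega used for the stopping test
    """
    u, V_prev = u0, None
    for _ in range(max_iter):
        V, dV = solve_ghjb(u)                     # steps 2 and 4: GHJB solve
        # step 3: improved control, cf. (88) and (99)
        u = lambda x, dV=dV: -0.5 * np.linalg.solve(R, g(x).T @ dV(x))
        if V_prev is not None:
            # surrogate for step 6: stop when successive value functions agree
            if max(abs(V(x) - V_prev(x)) for x in x_samples) < tol:
                break
        V_prev = V
    return u, V
```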


The benefit of using the GHJB equation and the control design procedure
outlined above is that the HJB partial differential equation (84) need not be
solved directly. Rather, a much more amenable partial differential equation
needs to be solved in the form of the GHJB (86). Furthermore, the GHJB
allows for an iteratively improving solution that addresses several implementa-
tion challenges. Rather than solving the entire optimal control problem at
once, the solution is divided into successively improving iterations, each of
which is useful as a control action since each is at least as good as the
initially designed stabilizing controller.
    A method to solve the GHJB equation is considered below.


9     Successive Galerkin Approximation to the GHJB Equation
The solution to the GHJB equation (86) needs to be determined numerically in
order to utilize the design procedure outlined above. This problem was tackled
by Beard in his doctoral work [2] and in the subsequent journal publication [3].
An algorithm called Successive Galerkin Approximation (SGA) was developed
based on the spectral method of Galerkin. A numerically efficient version of the
algorithm was also developed in [17]. Most famously, a discussion of the method
by Beard, Saridis and Wen appeared in the IEEE Control Systems Magazine
[1]. This section provides an outline of the method with its key points.
    Let the system (66) be Lipschitz continuous on a set Ω ⊂ R^n containing the
origin. Furthermore, let there exist a continuous control on Ω that asymptotically
stabilizes the system, i.e., let the system be stabilizable on Ω. Now assume the
existence of a set of basis functions {φ_j}_{j=1}^∞, where the φ_j : Ω → R are
continuous, φ_j(0) = 0 and span{φ_j}_{j=1}^∞ ⊆ L^2(Ω). Then the solution V of the
GHJB equation (86) can be written as
\[ V(x) = \sum_{j=1}^{\infty} \hat{c}_j \phi_j(x) \tag{101} \]

where the ĉ_j are constants to be determined. It is not practical to have an
infinite summation as an approximation, and thus a sufficiently large number N
is chosen and the series is truncated. The truncated solution is referred to as
V_N and, from (101), it is given by
\[ V_N(x) = \hat{c}_N^T \Phi_N(x) \tag{102} \]

where
\[ \hat{c}_N^T = \begin{bmatrix} \hat{c}_1 & \cdots & \hat{c}_N \end{bmatrix} \tag{103} \]
and
\[ \Phi_N(x) = \begin{bmatrix} \phi_1(x) & \cdots & \phi_N(x) \end{bmatrix}^T. \tag{104} \]




    The vector of N constants ĉ_N is determined by enforcing orthogonality be-
tween the GHJB equation, expressed in terms of V_N(x), and the basis Φ_N(x), i.e.,
\[ \left\langle \mathrm{GHJB}\big(V_N(x)\big),\; \Phi_N(x) \right\rangle_\Omega = 0 \tag{105} \]
where ⟨·, ·⟩_Ω denotes the function inner product (an integral) over the set Ω. Note
that in (105), the truncated expansion (102) is used. It follows that (105) is a
system of N linear equations in N unknowns. The system can be easily inverted to
determine ĉ_N, as is shown in the following discussion.
    The GHJB equation from (105), written in terms of the truncated approximation
of the suboptimal value function, is
\[ \frac{\partial V_N^T}{\partial x}\left[f(x) + g(x)u(x)\right] + x^T Q x + u^T(x) R u(x) = \hat{c}_N^T \frac{\partial \Phi_N(x)}{\partial x}\left[f(x) + g(x)u(x)\right] + x^T Q x + u^T(x) R u(x) \tag{106} \]
where ∂Φ_N/∂x ∈ R^{N×n} is a matrix quantity. For convenience, denote it by
\[ \nabla\Phi_N(x) = \frac{\partial \Phi_N(x)}{\partial x} = \begin{bmatrix} \frac{\partial \phi_1(x)}{\partial x} & \cdots & \frac{\partial \phi_N(x)}{\partial x} \end{bmatrix}^T. \tag{107} \]
Then from (106), the left-hand side of the GHJB equation becomes
\[ \hat{c}_N^T \nabla\Phi_N(x)\left[f(x) + g(x)u(x)\right] + x^T Q x + u^T(x) R u(x). \tag{108} \]

Transposing (108) gives
\[ \left[f(x) + g(x)u(x)\right]^T \nabla\Phi_N^T(x)\,\hat{c}_N + x^T Q x + u^T(x) R u(x), \tag{109} \]
and substituting (109) into the orthogonality condition (105) yields
\[ \left\langle \left[f(x) + g(x)u(x)\right]^T \nabla\Phi_N^T(x),\; \Phi_N \right\rangle_\Omega \hat{c}_N + \left\langle x^T Q x,\; \Phi_N \right\rangle_\Omega + \left\langle u^T(x) R u(x),\; \Phi_N \right\rangle_\Omega = 0 \tag{110} \]
or
\[ \int_\Omega \Phi_N \left[f(x) + g(x)u(x)\right]^T \nabla\Phi_N^T(x)\, dx\; \hat{c}_N + \int_\Omega \left[x^T Q x + u^T(x) R u(x)\right] \Phi_N\, dx = A\hat{c}_N + b = 0 \tag{111} \]

where A ∈ R^{N×N} and ĉ_N, b ∈ R^N. Thus ĉ_N may be found by solving this
linear system:
\[ \hat{c}_N = -A^{-1} b. \tag{112} \]

Once ĉ_N is determined, (102) is used to form the truncated approximation of
the suboptimal value function. The convergence and validity proofs for this type
of approximation are dealt with in [2].
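For concreteness, the linear system (111)-(112) can be assembled by numerical
quadrature over Ω. The Python sketch below is a minimal illustration under
assumed interfaces: the basis callables phis and grad_phis, the dynamics f
and g, the current control u, and the quadrature nodes xs with weights w are
hypothetical placeholders, not code from [2] or [17].

```python
import numpy as np

def galerkin_coeffs(phis, grad_phis, f, g, u, Q, R, xs, w):
    """Assemble A and b of (111) by quadrature and solve (112).

    phis[i](x), grad_phis[i](x): basis functions and gradients (hypothetical)
    f(x), g(x), u(x)           : dynamics and current control of (66)
    Q, R                       : state and control weighting matrices
    xs, w                      : quadrature nodes covering Omega, and weights
    """
    N = len(phis)
    A, b = np.zeros((N, N)), np.zeros(N)
    for x, wk in zip(xs, w):
        xdot = f(x) + g(x) @ u(x)                # closed-loop dynamics
        cost = x @ Q @ x + u(x) @ R @ u(x)       # running-cost integrand
        phi_vals = np.array([p(x) for p in phis])
        dphi_dot = np.array([dp(x) @ xdot for dp in grad_phis])
        A += wk * np.outer(phi_vals, dphi_dot)   # A_ij = <dphi_j . xdot, phi_i>
        b += wk * cost * phi_vals                # b_i  = <x'Qx + u'Ru, phi_i>
    return np.linalg.solve(A, -b)                # c_N = -A^{-1} b, cf. (112)
```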
    The basis functions have not been discussed so far. Polynomials, in most
cases, are sufficient; moreover, if these are orthogonal, better results are ex-
pected. Increasing the number of basis functions, i.e., increasing N, has an
exponential effect on the computation required [17]. It is therefore important
to choose the basis functions carefully. Lawton and Beard showed in [17] that
choosing the basis functions such that they are separable, and assuming the
domain Ω to be rectangular, allows for the formulation of significantly compu-
tationally cheaper versions of the SGA algorithm. Polynomials are separable
functions and therefore play an important role in that work.
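As one concrete (and hypothetical) choice of separable basis, the snippet below
generates monomials of total degree two through max_deg, together with their
gradients; these satisfy φ_j(0) = 0 and are separable in the sense used by [17],
though orthogonal polynomials on a rectangular Ω would be expected to behave
better numerically. For n_vars = 2 and max_deg = 4 this yields twelve basis
functions.

```python
import numpy as np
from itertools import product

def monomial_basis(n_vars, max_deg):
    """Separable monomial basis phi(x) = x_1^{e_1} ... x_n^{e_n}, phi(0) = 0."""
    exps = [np.array(e) for e in product(range(max_deg + 1), repeat=n_vars)
            if 2 <= sum(e) <= max_deg]  # degree >= 2 suits a positive definite V
    def make(e):
        def phi(x):
            return float(np.prod(np.asarray(x, float) ** e))
        def grad(x):
            x = np.asarray(x, float)
            gvec = np.zeros(len(e))
            for k in range(len(e)):
                if e[k] > 0:
                    ek = e.copy(); ek[k] -= 1    # differentiate coordinate k
                    gvec[k] = e[k] * np.prod(x ** ek)
            return gvec
        return phi, grad
    pairs = [make(e) for e in exps]
    return [p for p, _ in pairs], [gr for _, gr in pairs]
```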
    Despite the attractiveness of the methods presented, they still fall short of
one of the prime reasons for utilizing the HJB equation in optimal control: a
closed-form solution to the optimal feedback problem that can be used efficiently
in realistic scenarios. In this respect, the GHJB/SGA algorithm is not unique
among numerical methodologies for optimal feedback control. As the system
order increases and computational resources become more restrictive, most
methodologies become infeasible. Thus, using such algorithms in embedded
systems or to efficiently control complex systems (such as aircraft) is often
impossible.


10     Conclusion
A broad discussion of optimal control was presented. The history and the basic
problem of optimal control were reviewed. This was followed by a derivation of
standard results in optimal control theory, along with discussions of the connec-
tions between classical mechanics and optimal control theory. The report ended
with a discussion of more recent results in optimal control theory, namely, results
aimed at making the theory more practically viable.
    Even half a century after the initial results published independently by Bell-
man and Pontryagin, optimal control remains a vibrant area of research with
much-sought-after results. Rather than receding into the background in light of
the latest developments, optimal control is becoming more and more relevant,
not least because of the huge strides achieved in computational power.
Mathematical developments and the race towards computationally viable
simulation schemes also indirectly benefit optimal control theory. With
its wide applications and promise for future research, optimal control remains
a high-value research area. Since the theoretical foundation of optimal control
theory has already been laid, this high-value research is geared towards
numerical schemes that make optimal control more practical.




References
 [1] R. Beard, G. Saridis, and J. Wen, “Improving the performance of stabilizing
     controls for nonlinear systems,” Control Systems Magazine, IEEE, vol. 16,
     no. 5, pp. 27–35, 1996.
 [2] R. Beard, “Improving the closed-loop performance of nonlinear systems,”
     Ph.D. dissertation, Rensselaer Polytechnic Institute, 1995.

 [3] R. Beard, G. Saridis, and J. Wen, “Galerkin approximations of the gener-
     alized Hamilton-Jacobi-Bellman equation,” Automatica, vol. 33, no. 12,
     pp. 2159–2177, 1997.
 [4] R. Bellman, “On the theory of dynamic programming,” Proceedings of the
     National Academy of Sciences of the United States of America, vol. 38,
     no. 8, p. 716, 1952.
 [5] ——, The theory of dynamic programming. Defense Technical Information
     Center, 1954.
 [6] ——, “An introduction to the theory of dynamic programming,” RAND
     Corporation, 1953.

 [7] ——, Eye of the Hurricane: an Autobiography.       World Scientific, 1984.
 [8] A. Bryson and Y. Ho, Applied optimal control.        American Institute of
     Aeronautics and Astronautics, 1979.
 [9] L. Gao, L. Chen, Y. Fan, and H. Ma, “A nonlinear control design for power
     systems,” Automatica, vol. 28, no. 5, pp. 975–979, 1992.
[10] W. Garrard, “Suboptimal feedback control for nonlinear systems,” Auto-
     matica, vol. 8, no. 2, pp. 219–221, 1972.
[11] W. Garrard and J. Jordan, “Design of nonlinear automatic flight control
     systems,” Automatica, vol. 13, no. 5, pp. 497–505, 1977.

[12] W. Garrard, N. McClamroch, and L. Clark, “An approach to sub-optimal
     feedback control of non-linear systems,” International Journal of Control,
     vol. 5, no. 5, pp. 425–435, 1967.
[13] H. Goldstein, C. Poole, and J. Safko, Classical Mechanics, 3rd ed.
     Addison-Wesley, 2002.
[14] A. Isidori, Nonlinear control systems.   Springer Verlag, 1995.
[15] A. Klumpp, “Apollo lunar descent guidance,” Automatica, vol. 10, no. 2,
     pp. 133–146, 1974.

[16] C. Lanczos, The variational principles of mechanics. Dover Publications,
     1970.



[17] J. Lawton and R. Beard, “Numerically efficient approximations to the
     Hamilton-Jacobi-Bellman equation,” in Proceedings of the 1998 American
     Control Conference, vol. 1. IEEE, 1998, pp. 195–199.
[18] F. Lewis, Applied optimal control and estimation.     Prentice Hall PTR,
     1992.

[19] F. Lewis and V. Syrmos, Optimal control.    Wiley-Interscience, 1995.
[20] Y. Nishikawa, N. Sannomiya, and H. Itakura, “A method for suboptimal
     design of nonlinear feedback systems,” Automatica, vol. 7, no. 6, pp. 703–
     712, 1971.

[21] J. Papastavridis, Analytical Mechanics. Oxford University Press, 2002.
[22] L. Pontryagin, “Optimal regulation processes,” Uspekhi Matematicheskikh
     Nauk, vol. 14, no. 1, pp. 3–20, 1959.

[23] L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko, The
     mathematical theory of optimal control processes. Interscience, New York,
     1962.
[24] G. Saridis and C. Lee, “An approximation theory of optimal control for
     trainable manipulators,” Systems, Man and Cybernetics, IEEE Transac-
     tions on, vol. 9, no. 3, pp. 152–159, 1979.
[25] S. Sethi and G. Thompson, Optimal control theory: applications to man-
     agement science and economics. Springer Verlag, 2005.
[26] H. Sussmann and J. Willems, “300 years of optimal control: from the
     brachystochrone to the maximum principle,” Control Systems Magazine,
     IEEE, vol. 17, no. 3, pp. 32–44, 1997.
[27] Y. Wang, D. Hill, R. Middleton, and L. Gao, “Transient stabilization of
     power systems with an adaptive control law,” Automatica, vol. 30, no. 9,
     pp. 1409–1413, 1994.

[28] J. Willems, “1696: the birth of optimal control,” in Proceedings of the
     35th IEEE Conference on Decision and Control, vol. 2. IEEE, 1996,
     pp. 1586–1587.




                                      27

More Related Content

Similar to Optimal Control: Perspectives from the Variational Principles of Mechanics

Geometric Control System and Fault-Diagnosis
Geometric Control System and Fault-Diagnosis Geometric Control System and Fault-Diagnosis
Geometric Control System and Fault-Diagnosis
M Reza Rahmati
 
Hamiltonian formulation project Sk Serajuddin.pdf
Hamiltonian formulation project Sk Serajuddin.pdfHamiltonian formulation project Sk Serajuddin.pdf
Hamiltonian formulation project Sk Serajuddin.pdf
miteshmohanty03
 
ObservabilityForModernApplications-Oslo.pdf
ObservabilityForModernApplications-Oslo.pdfObservabilityForModernApplications-Oslo.pdf
ObservabilityForModernApplications-Oslo.pdf
Amazon Web Services
 
12 102-1-pb
12 102-1-pb12 102-1-pb
12 102-1-pb
John Varn II
 
Talk_MR_ver_b_2
Talk_MR_ver_b_2Talk_MR_ver_b_2
Talk_MR_ver_b_2
Michele Romeo
 
Observability for modern applications
Observability for modern applications  Observability for modern applications
Observability for modern applications
MoovingON
 
Approaches To The Solution Of Intertemporal Consumer Demand Models
Approaches To The Solution Of Intertemporal Consumer Demand ModelsApproaches To The Solution Of Intertemporal Consumer Demand Models
Approaches To The Solution Of Intertemporal Consumer Demand Models
Amy Isleb
 
computational pnnnnnnnnnnnnnnnnnnnnnnnnnnnnnpt.pptx
computational pnnnnnnnnnnnnnnnnnnnnnnnnnnnnnpt.pptxcomputational pnnnnnnnnnnnnnnnnnnnnnnnnnnnnnpt.pptx
computational pnnnnnnnnnnnnnnnnnnnnnnnnnnnnnpt.pptx
KeyredinWabela
 
kalman_maybeck_ch1.pdf
kalman_maybeck_ch1.pdfkalman_maybeck_ch1.pdf
kalman_maybeck_ch1.pdf
LeonardoMMarques
 
ObservabilityForModernApplications_Stockholm.pdf
ObservabilityForModernApplications_Stockholm.pdfObservabilityForModernApplications_Stockholm.pdf
ObservabilityForModernApplications_Stockholm.pdf
Amazon Web Services
 
2014 10 rotman mecnhanism and climate models
2014 10 rotman mecnhanism and climate models 2014 10 rotman mecnhanism and climate models
2014 10 rotman mecnhanism and climate models
Ioan Muntean
 
Advances in-the-theory-of-control-signals-and-systems-with-physical-modeling-...
Advances in-the-theory-of-control-signals-and-systems-with-physical-modeling-...Advances in-the-theory-of-control-signals-and-systems-with-physical-modeling-...
Advances in-the-theory-of-control-signals-and-systems-with-physical-modeling-...
Nick Carter
 
A guide to molecular mechanics and quantum chemical calculations
A guide to molecular mechanics and quantum chemical calculationsA guide to molecular mechanics and quantum chemical calculations
A guide to molecular mechanics and quantum chemical calculations
Sapna Jha
 
Research_paper
Research_paperResearch_paper
Research_paper
Sami D'Almeida
 
Introduction to mathematical modelling
Introduction to mathematical modellingIntroduction to mathematical modelling
Introduction to mathematical modelling
Arup Kumar Paria
 
Application of First Order Linear Equation Market Balance
Application of First Order Linear Equation Market BalanceApplication of First Order Linear Equation Market Balance
Application of First Order Linear Equation Market Balance
ijtsrd
 
scalar field inflation
scalar field inflationscalar field inflation
scalar field inflation
Ali Kokaz
 
Monoton-working version-1995.doc
Monoton-working version-1995.docMonoton-working version-1995.doc
Monoton-working version-1995.doc
butest
 
Monoton-working version-1995.doc
Monoton-working version-1995.docMonoton-working version-1995.doc
Monoton-working version-1995.doc
butest
 
Metaheuristic Optimization: Algorithm Analysis and Open Problems
Metaheuristic Optimization: Algorithm Analysis and Open ProblemsMetaheuristic Optimization: Algorithm Analysis and Open Problems
Metaheuristic Optimization: Algorithm Analysis and Open Problems
Xin-She Yang
 

Similar to Optimal Control: Perspectives from the Variational Principles of Mechanics (20)

Geometric Control System and Fault-Diagnosis
Geometric Control System and Fault-Diagnosis Geometric Control System and Fault-Diagnosis
Geometric Control System and Fault-Diagnosis
 
Hamiltonian formulation project Sk Serajuddin.pdf
Hamiltonian formulation project Sk Serajuddin.pdfHamiltonian formulation project Sk Serajuddin.pdf
Hamiltonian formulation project Sk Serajuddin.pdf
 
ObservabilityForModernApplications-Oslo.pdf
ObservabilityForModernApplications-Oslo.pdfObservabilityForModernApplications-Oslo.pdf
ObservabilityForModernApplications-Oslo.pdf
 
12 102-1-pb
12 102-1-pb12 102-1-pb
12 102-1-pb
 
Talk_MR_ver_b_2
Talk_MR_ver_b_2Talk_MR_ver_b_2
Talk_MR_ver_b_2
 
Observability for modern applications
Observability for modern applications  Observability for modern applications
Observability for modern applications
 
Approaches To The Solution Of Intertemporal Consumer Demand Models
Approaches To The Solution Of Intertemporal Consumer Demand ModelsApproaches To The Solution Of Intertemporal Consumer Demand Models
Approaches To The Solution Of Intertemporal Consumer Demand Models
 
computational pnnnnnnnnnnnnnnnnnnnnnnnnnnnnnpt.pptx
computational pnnnnnnnnnnnnnnnnnnnnnnnnnnnnnpt.pptxcomputational pnnnnnnnnnnnnnnnnnnnnnnnnnnnnnpt.pptx
computational pnnnnnnnnnnnnnnnnnnnnnnnnnnnnnpt.pptx
 
kalman_maybeck_ch1.pdf
kalman_maybeck_ch1.pdfkalman_maybeck_ch1.pdf
kalman_maybeck_ch1.pdf
 
ObservabilityForModernApplications_Stockholm.pdf
ObservabilityForModernApplications_Stockholm.pdfObservabilityForModernApplications_Stockholm.pdf
ObservabilityForModernApplications_Stockholm.pdf
 
2014 10 rotman mecnhanism and climate models
2014 10 rotman mecnhanism and climate models 2014 10 rotman mecnhanism and climate models
2014 10 rotman mecnhanism and climate models
 
Advances in-the-theory-of-control-signals-and-systems-with-physical-modeling-...
Advances in-the-theory-of-control-signals-and-systems-with-physical-modeling-...Advances in-the-theory-of-control-signals-and-systems-with-physical-modeling-...
Advances in-the-theory-of-control-signals-and-systems-with-physical-modeling-...
 
A guide to molecular mechanics and quantum chemical calculations
A guide to molecular mechanics and quantum chemical calculationsA guide to molecular mechanics and quantum chemical calculations
A guide to molecular mechanics and quantum chemical calculations
 
Research_paper
Research_paperResearch_paper
Research_paper
 
Introduction to mathematical modelling
Introduction to mathematical modellingIntroduction to mathematical modelling
Introduction to mathematical modelling
 
Application of First Order Linear Equation Market Balance
Application of First Order Linear Equation Market BalanceApplication of First Order Linear Equation Market Balance
Application of First Order Linear Equation Market Balance
 
scalar field inflation
scalar field inflationscalar field inflation
scalar field inflation
 
Monoton-working version-1995.doc
Monoton-working version-1995.docMonoton-working version-1995.doc
Monoton-working version-1995.doc
 
Monoton-working version-1995.doc
Monoton-working version-1995.docMonoton-working version-1995.doc
Monoton-working version-1995.doc
 
Metaheuristic Optimization: Algorithm Analysis and Open Problems
Metaheuristic Optimization: Algorithm Analysis and Open ProblemsMetaheuristic Optimization: Algorithm Analysis and Open Problems
Metaheuristic Optimization: Algorithm Analysis and Open Problems
 

Recently uploaded

[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
Edge AI and Vision Alliance
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 

Recently uploaded (20)

Artificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic WarfareArtificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic Warfare
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 

Optimal Control: Perspectives from the Variational Principles of Mechanics

  • 1. Optimal Control Perspectives from the Variational Principles of Mechanics Ismail Hameduddin Purdue University
  • 2. Abstract Optimal control is a tremendously important (and popular) area of research in modern control engineering. The extraordinary elegance of optimal control results, the significance of their implications and the un- resolved nature of their practical implementation have excited the minds of generations of engineers and mathematicians. The sheer amount of recent research dedicated to the topic, even after more than five decades of the first publication of results, are a testament to this. Despite this widespread interest, an appreciation of the philosophical origins of opti- mal control, rooted in analytical mechanics, is still lacking. By weaving-in analogies from the variational principles of mechanics in the wider context of an overview of optimal control theory, this work attempts to expose the deeper connections between optimal control and the early, philosophically oriented results in analytical mechanics. Rather than as a dry, rigorous exercise, this is often done through more intellectually satisfying heuristic discussions and insights. Although the two-point boundary value problem is given due importance (with its parallel in analytical mechanics), special emphasis is placed on the feedback form of optimal control (Hamilton- Jacobi-Bellman equation) since this ties in closely with the exceedingly beautiful Hamilton-Jacobi theory. Numerical solutions to the optimal con- trol problem and in particular, the generalized Hamilton-Jacobi-Bellman equation with successive Galerkin approximations, are also discussed to highlight recent trends and motivations behind optimal control research.
  • 3. 1 Introduction Optimal control is the area of study that deals with choosing free parameters in a set of differential equations such that a cost function is minimized over an evo- lution of time. Optimal control is an extremely important field with applications ranging from engineering, operations research to finance and economics [19, 25]. For instance, the same tool used to study dynamical systems in economic theory was used to design the controllers on the Apollo spacecraft [15]. Much of the development of optimal control mirrors that of analytical me- chanics. From a philosophical point of view, optimal control is a mimicry of nature. By the principle of least action, nature choses the motion of a system (or particle) such as to minimize a certain form of “energy”. Then from the point of view of nature, it “uses optimal control” to minimize the energy used by systems in their motion. Optimal control is simply the turning of the tables so that this tool is available in controlling the behavior of dynamical systems in an optimal manner (with respect to a cost) subject to the (dynamic) constraints already imposed by nature. This report introduces the ideas of optimal control to an audience familiar with analytical mechanics and variational principles. The intent is to provide a basic understanding of the fundamental results and then delve into some more advanced/recent results. The report can also be seen broadly in a chronological manner: It starts with a short review of some basic results calculus of varia- tions (1700-1900), then proceeds to optimal control theory (1950-1970), which is followed by a discussion of the generalized Hamilton-Jacobi-Bellman equation (1979) and finally, the paper is capped off by a discussion of a numerical scheme developed in the 1990’s. An effort has been made in the presentation to make the material relevant and intellectually stimulating by establishing connections between classical an- alytical mechanics and optimal control. 2 History Optimal control is an outgrowth of the variational principles of mechanics and it is difficult to pinpoint exactly when a transition was made from examining systems moving freely under their own influence to determining a reference control for a system to achieve a certain objective while minimizing a cost function. A popular choice is the formulation of the brachistochrone problem: Given two fixed points in a vertical plane, let a particle start from rest at the higher point and travel to the lower point under its own weight in a uniform gravity field. What path or curve must the particle follow in order to reach the second point in the shortest amount of time? An obvious solution to the minimum length problem is the straight line between both points. However, the straight line does not minimize the amount of time. 3
  • 4. The correct solution is a cycloid between the two points A and B. This problem was first proposed by Galileo in 1638 in his book Two New Science. Galileo accompanied the problem with an incorrect solution based on the geometry of the problem. Instead of a cycloid, he suggested a circle through the two points and center located a certain distance away (on an axis) [26, 28]. Nearly sixty years later, oblivious to Galileo’s introduction of the problem, Johann Bernoulli proposed the following “challenge” in the June 1696 issue of Acta Eruditorum [28]: If in a vertical plane two points A and B are given, then it is required to specify the orbit AM B of the moveable point M , along which it, starting from A, and under the influence of its own weight, arrives at B in the shortest possible time. So that those who are keen of such matters will be tempted to solve this problem, is it good to know that it is not, as it may seem, purely speculative and without practical use. Rather it even appears, and this may be hard to believe, that it is very useful also for other branches of science than mechanics In order to avoid a hasty conclusion, it should be remarked that the straight line is certainly the line of shortest distance between A and B, but it is not the one which is travelled in the shortest time. However, the curve AM B - which I shall divulge if by the end of this year nobody else has found it - is very well known among geometers. This problem is precisely a minimum-time optimal control problem. Five math- ematicians solved the brachistochrone problem including Johan Bernoulli him- self, Leibniz, de l’Hopital, Jakob Bernoulli (Johan’s brother) and Isaac Newton. Jakob Bernoulli formulated a more difficult version of the brachistochrone prob- lem and solved it using a different type of proof. Jakob Bernoulli was mocked by his brother [28, 26] for using a sloppy proof but that proof formed the foun- dation of the future calculus of variations and the work of Lagrange, Hamilton and Jacobi. From the brachisochrone problem to the development of control, the history of optimal control closely parallels that of analytical mechanics (variational prin- ciples of mechanics). Kalman’s work in introducing the state-space architecture to control revolutionized developments and reopened the door for significant developments in optimal control [18]. Two schools of optimal control developed during the 1950’s and 1960’s. The first was led by Richard E. Bellman and was centered in the USA. Bellman was a mathematician and worked as a research scientist at The RAND Corporation in Santa Monica, California [7]. His research was focused on optimizing the control of multistage (discrete) systems [4, 6]. Two years after joining RAND from Princeton, Bellman published his first book “The Theory of Dynamic Pro- gramming” [5]. His development led to the Bellman equation which provides sufficient conditions for optimality. Later this was generalized to continous-time systems where it bore a striking similarity to the Hamilton-Jacobi equation of analytical mechanics. In fact, both equations derive from the same principle 4
  • 5. of minimizing an (integral) performance index subject to nonholonomic con- straints. Thus, the continous-time version of the Bellman equation is known as the Hamilton-Jacobi-Bellman equation [8]. The derivations in this paper will focus on the Hamilton-Jacobi-Bellman formulation. The other school of optimal control was centered in the USSR and led by the acclaimed Soviet mathematician Lev Semenovich Pontryagin. Pontryagin devel- oped his famous maximum principle at around roughly the same time as Bell- man [22] but his work was, until later, available only in Russian [23]. Pontryagin approach the problem of optimal control from the more classical approach of the calculus of variations. The famous Pontryagin’s minimum principle gener- alized necessary conditions for optimality and it was shown that the standard Euler-Lagrange equations are simply a special case of this principle [8]. Ever since these theoretical foundations were laid in optimal control, much of the development has been focused on applications and numerical techniques [18]. Even half a century after the solution of the optimal control problem was first formulated, efficient numerical methods for the computation of these solutions are still an active area of research. In general, the problem remains unresolved since there is no efficient numerical scheme applicable in all cases even with the exponentially larger computational resources available today versus five decades ago. 3 The Optimal Control Problem Consider a nonlinear time-varying dynamical system described by the equations ˙ x(t) = f (x(t), u(t), t); x(t0 ); t0 ≤ t ≤ tf (1) where x(t) ∈ Rn is the vector of internal states and u(t) ∈ Rm is the vector of control input. Suppose we are given an objective to drive the dynamical system from some initial state x(t0 ) at initial time t = t0 to some specified final state x(tf ) at final time t = tf given freedom over the assigned control input u(t). In general, there are an infinite number of u(t) that satisfy this objective. The goal of optimal control is to determine a u(t) that not only achieves the objective but is also optimal with respect to a specified performance index or cost. The performance index is chosen by the designer and therefore, the optimal control u∗ (t) is not optimal in the universal sense but only with respect to the performance index. A general performance index is given by tf J = φ(x(tf ), tf ) + L(x(t), u(t), t) dt (2) t0 where the L(x(t), u(t), t) is the weighting function and φ(x(tf ), tf ) is the final- state weighting function. The final-state weighting function is a function that we desire to minimize at the final state. An example of this might be the final energy. The weighting function, on the other hand, is a function that we desire 5
  • 6. to minimize throughout the time interval [t0 , tf ]. The weighting function is commonly a function of the control input u(t). This is because we often want to minimize the control “effort” expended to achieve the control objective. During the reorientation of a spacecraft, for example, minimizing the control input u(t) over the entire interval reduces the amount of valuable fuel consumed. The control objective may be stated not only directly in terms of the final state x(tf ) but may be function of the final state and time. This function is called the final state constraint and is given by Ψ(x(tf ), tf ) = 0 (3) where Ψ ∈ Rp . From henceforth Ψ(x(tf ), tf ) will be treated as the control ob- jective. Since this is a control objective, it differs from the final-state weighting function φ(x(tf ), tf ) in that φ(x(tf ), tf ) only needs to be minimized at the final time while Ψ(x(tf ), tf ) = 0 is a strict condition that must be met by the control input u(tf ) at the final time. The optimal control problem maybe pictured to be the problem of finding an optimal path from an initial point to a final surface described by Ψ(x(tf ), tf ) = 0. Consider the case where we have x ∈ R2 . The optimal control problem is then to find an optimal path from a point in R3 , i.e. (x(t0 ), t0 ), to the family of points satisfying Ψ(x(tf ), tf ) = 0. Now if we have a fixed final time and fixed end state, this family points is restricted to a single point. Otherwise, if the final time is fixed but the final states are a function, we have a line. If we have a free final time (as in a minimum time problem) and final states as a function, we have a surface. This type of visualization is handy tool when dealing with optimal control problems. The next section begins a discussion of a basic result from the calculus of variations. This is then used to develop a solution to the optimal control problem presented here. 4 Variation with Auxiliary Conditions It is instructive to first consider the problem of minimizing an integral tf I= ˙ F (q, q, t) dt (4) t0 where q ∈ Rn , subject to the constraints φ(q, t) = 0. (5) where φ ∈ Rm . What will follow is a derivation from the calculus variations. The parallels with optimal control will become clear in the next section. For an unconstrained problem, it is sufficient that the integral (4) be station- ary, i.e., the variation of I vanish, for the minimum, assuming that the second 6
  • 7. variation ensures a minimum (this is not required for problems of dynamics). Thus, we require tf δI = δ ˙ F (q, q, t) dt = 0. (6) t0 This is not correct for integrals with constraints as above since, although we are taking variations of all n generalized coordinate, we only have n − m degrees of freedom. Thus in essence, we are only allowed to take free variations of n − m generalized coordinates. We use what is known as the “Lagrange Multiplier Method” to deal with such a problem. Taking a variation of the constraint vector, we have ∂φ1 δφ = δq = 0. (7) ∂q Multiplying the variation of the constraint vector by a time-dependent function vector λT (t) and integrating with respect to time (between t0 and tf ) gives a scalar term tf tf ∂φ λT (t)δφ dt = λT (t) δq dt = 0. (8) t0 t0 ∂q which can be augmented to (6) without changing the result since we are simply adding zero tf δI = δ F (q, q, t) + λT (t)δφ dt = 0. ˙ (9) t0 We can collect terms in δq in the first term of (9) to give tf tf δ F dt = δ E T δq dt. (10) t0 t0 Thus from (9) and (10), we can write δI entirely in terms of the integrals of terms affine in the δq. The original problem of eliminating m generalized co- ordinates from the system now becomes straightforward. We choose suitable λi such that the coefficients of m generalized coordinates vanish. The station- arity condition still holds on the remaining independent δq and hence, by the Euler-Lagrange equations, we need ∂F d ∂F ∂φ − + λT (t) = 0. (11) ∂q ˙ dt ∂ q ∂q Alternatively, we can achieve the same results by defining an augmented function F as F = F + λT (t)δφ (12) and thus, similar to previously, we have tf tf I = F dt = F + λT (t)δφ dt. (13) t0 t0 7
  • 8. Setting δI = 0 with an appropriate λ(t) recovers the results (11). For nonholonomic constraints dφ = aT dq = 0. (14) the result (11) still holds except that the partial derivatives ∂φ/∂q are replaced by the coefficient a of the nonholonomic constraint vector, (14). We thus have ∂F d ∂F − + λT (t)a. (15) ∂q ˙ dt ∂ q A similar result for the optimal control problem using the same methods for derivation is shown in the next section. 5 Optimal Control by the Euler-Lagrange Method The approach of optimal control is to treat the problem of finding the optimal control u(t) as one of finding the stationary value of the performance index subject to nonholonomic constraints which are precisely the system dynamics. In this philosophy, we are, in effect, turning the problem upside down. Rather than approaching the system dynamics first and then finding a control that would minimize a performance index, we approach the performance index first and treat the system dynamics as auxiliary constraints on the system. It is this simple, yet groundbreaking, change of perspective that spurred on the decades of research and produced some of the most significant results of the past half century. After this perspective change, the problem can be solved almost iden- tically as in the previous section. Consider first the case when there is no final state constraint but we have fixed initial and final time. Begin by rearranging the system dynamics (1), multiplying by an undetermined time-dependent vector λT (t) and integrating between the limits to give tf λT (t) [f (x(t), u(t), t) − x(t)] dt = 0. ˙ (16) t0 We can then augment the performance index (2) with (16) without any impact since we are simply adding zero, similar to what we did in the general Lagrange multiplier method tf J = φ(x(tf ), tf ) + L(x(t), u(t), t) + λT (t) [f (x(t), u(t), t) − x(t)] ˙ dt. t0 (17) As in analytical mechanics, define the Hamiltonian function as H(x(t), u(t), λ(t), t) = L(x, u(t), t) + λT (t)f (x(t), u(t), t) (18) which substituting in (17) yields tf J = φ(x(tf ), tf ) + H(x(t), u(t), λ(t), t) − λT (t)x(t) dt. ˙ (19) t0 8
  • 9. Integrating the last term of (19) by parts tf tf tf λT (t)x(t) dt = λT (t)x ˙ + ˙ λT (t)x(t) dt. (20) t0 t0 t0 Substituting (20) into (19) and evaluating the limits gives us J = φ(x(tf ), tf ) − λT (tf )x(tf ) + λT (t0 )x(t0 ) tf + ˙ H(x(t), u(t), λ(t), t) + λT (t)x(t) dt. (21) t0 We now consider a variations in J due to variations in the control vector u(t) while holding the initial time t0 and final time tf fixed. After collecting terms in the variation, we have tf ∂φ ∂H ˙ ∂H δJ = + λT δx + λT δx t=t0 + + λT δx + δu dt. ∂x t=tf t0 ∂x ∂u (22) To achieve a stationary point δJ = 0, we choose the arbitrary multiplier func- tions λ(t) such that the coefficients of the δx(t) vanish. This reduces the number of free variables in our problem and we avoid the need to determine the varia- tions δx(t) produced by a given δu(t). Hence, we first define the dynamics of the multiplier functions as ˙ ∂H ∂L ∂f λT (t) = − =− − λT (t) (23) ∂x ∂x ∂x which eliminates the coefficient of δx inside the integral in (22). We also define the boundary conditions on these dynamics as ∂φ λT (tf ) = (24) ∂x(tf ) which eliminates the first term in (22). We then have tf ∂H δJ = λT (t0 )δx(t0 ) + δu dt. (25) t0 ∂u For J to be stationary, i.e., δJ = 0, we must have ∂H =0 t0 ≤ t ≤ tf (26) ∂u The above equations (23), (24) and (26) are precisely the conditions needed for the performance index to be stationary, i.e., for u(t) to be an optimal control. We are thus left to solve the following differential equations to determine the optimal control: ˙ x = f (x, u, t) (27) 9
  • 10. T T ˙ ∂f ∂L λ=− λ− =0 (28) ∂x ∂x where u(t) is determined by T T ∂f ∂L λ+ =0 (29) ∂u ∂u and the boundary conditions are x(t0 ) (30) T ∂φ λ(tf ) = (31) ∂x The equations (27) through (31) parallel the Euler-Lagrange equations from standard variational calculus and are referred to as the stationarity conditions. Notice the similarity between (11) and (28),(29). The multiplier vector elements λ are known as the “costates” because the ˙ optimal control is determined by solving the state dynamics x together with the multiplier dynamics λ.˙ Since the boundary conditions are specified at both initial and final time, the problem itself is often called the two-point boundary-value problem (2PBVP). We are required to specify both the initial and final time for such a problem. This restriction (of specifying both initial and final time) is overcome later by using another method of solution of the optimal control problem that utilizes elements from Hamilton-Jacobi theory. An assumption of no final state constraint was assumed in the derivation of the previous stationarity conditions. This is not true in many cases. The problem where a final state vector Ψ(x(tf ), tf ) = 0 (32) is specified is dealt with below. Analagous to the previous treatment, we form a performance index that is augmented by a multiple of the final state constraint vector with the effect of adding a multiple of zero J = φ(x(tf ), tf ) + ν T Ψ(x(tf ), tf ) tf L(x(t), u(t), t) + λT (t) [f (x(t), u(t), t) − x(t)] ˙ dt. (33) t0 where ν T is a vector of undetermined multipliers. The previous derivation may be repeated if we define Φ = φ + νT Ψ (34) 10
  • 11. and substitute into the performance index except that the ν T will not be spec- ified. This can be resolved with some incremental effort, and the previous stationarity conditions can be shown to hold with a minor modification to (31) ∂φ ∂ψ λ(tf ) = + νT . (35) ∂x ∂x t=tf This completes our discussion of optimal control by the Euler-Lagrange method. Although, many further extensions to the current results exists, they are not treated in this report. Another approach to solving the optimal control problem is to use paral- lels from the theory of Hamilton-Jacobi from analytical mechanics. Thus, a short review of the Hamilton-Jacobi theory is given in the next section with an emphasis on parts of the theory that prove useful in optimal control. 6 Hamilton-Jacobi Theory Hamilton’s problem deals with solving for the motion of a dynamic system such that its generalized coordinates are reduced to quadratures. According to the principle of least action, the motion of a dynamic system or the solution of Hamilton;s problem is such that it minimizes the total energy or “action”. By Hamilton’s principle, this “action” is the canonical integral. Thus achieving a stationary point on the canonical integral implies that a minimum energy motion has been achieved and Hamilton’s problem has been solved. The stationary point is not verified via a second variation because in general, for problems in dynamics, a stationary point cannot imply a maximum (since the feasible generalized coordinates are theoretically unbounded). Only a basic discussion of this problem and its solution will be presented in this section as a complete derivation is beyond the scope of the report. The reader is referred to references [16, 13, 21] for more details. The canonical integral in analytical mechanics is given by tf I= ˙ ˙ L(q, q, t) dt = I(q0 , q0 , t0 , tf ) (36) t0 where L is the Lagrangian, q is the generalized coordinate vector, q is the˙ ˙ generalized velocity vector and q0 , q0 are the vectors of initial conditions. For a stationary point, the first variation of the canonical integral must be zero δI = 0. (37) A motion that satisfies such a condition is achieved in the Hamilton-Jacobi the- ory via a canonical transformation, i.e., a transformation that does not violate Hamilton’s principle in the dynamics of the system. The statement of (36) is that the canonical integral, including integration constants, is fully determined once we have the initial generalized coordinates 11
  • 12. and velocities. Hamilton-Jacobi theory (which will not be derived here) intro- duces a generating function S called “Hamilton’s Principal Function” based on the canonical integral formulation in (36) tf S(q0 , qf , t0 , tf ) = L dt (38) t0 where qf are the generalized coordinates at the final time t = tf . The key difference between (36) and (38) is that we do not require the initial generalized velocities but we instead replace these, via a canonical transformation, by the generalized coordinates at the final time. In analytical mechanics finding such a transformation implies that we have found a complete solution of Hamilton’s problem. This is because we transform the system from a moving point in configuration space to a fixed point. It is natural, therefore, that Hamilton’s Principle Function holds a special importance in analytical mechanics (and by extension, the Hamilton-Jacobi theory and optimal control theory). By the theory of Hamilton-Jacobi, the principal function is the solution of the following partial differential equation known as the Hamilton-Jacobi equation ∂S ∂S + H q, ,t =0 (39) ∂t ∂q where H is the Hamiltonian (defined in terms of analytical mechanics). Once the solution to the Hamilton-Jacobi equation is found (S), we can generate a canonical transformation that transforms the moving point in configuration space representing the motion of system to a fixed point in configuration space. In the special case where the Hamiltonian is not dependent on time (conser- vative systems), we have ∂S H q, , t = 0. (40) ∂q The results of this section will be exploited later, at the end of the next section, to find an elegant solution to the optimal control problem. First, however, a basic derivation of this result for the optimal control problem, not drawing on the analogy from analytical mechanics, is presented in the next section. 7 Optimal Feedback Control via the Hamilton- Jacobi-Bellman formulation The problem of finding an optimal control u∗ (t) to proceed from a specified initial state x(t0 ) to a terminal surface described by Ψ(x(tf ), tf ) = 0 has been considered so far. A result was derived (Euler-Lagrange optimal control) to determine the optimal control that minimizes the performance index tf J = φ(x(tf ), tf ) + L(x(t), u(t), t) dt (41) t0 12
The results of this section will be exploited later, at the end of the next section, to find an elegant solution to the optimal control problem. First, however, a basic derivation of this result for the optimal control problem, not drawing on the analogy from analytical mechanics, is presented in the next section.

7 Optimal Feedback Control via the Hamilton-Jacobi-Bellman formulation

The problem of finding an optimal control $u^*(t)$ to proceed from a specified initial state $x(t_0)$ to a terminal surface described by $\Psi(x(t_f), t_f) = 0$ has been considered so far. A result was derived (Euler-Lagrange optimal control) to determine the optimal control that minimizes the performance index

$$J = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x(t), u(t), t)\, dt \tag{41}$$

and satisfies the final-state constraint (or terminal surface)

$$\Psi(x(t_f), t_f) = 0 \tag{42}$$

where the system dynamics are given by

$$\dot{x}(t) = f(x(t), u(t), t); \quad x(t_0) \text{ given}; \quad t_0 \leq t \leq t_f. \tag{43}$$

Implicit in this discussion was that if the initial state $x(t_0)$ was changed and selected on the path from the initial point to the terminal surface determined by the optimal control, then the resulting (new) optimal path would lie on the same path as previously, except for beginning at the new initial state. In a significant omission, the possibility of other, completely arbitrary initial states that do not lie on the original optimal path was not considered. Indeed, according to the previous discussion, if another initial state that does not lie on the original path is specified, then the optimal control problem must be considered anew and the optimal control Euler-Lagrange equations must be solved anew. Since in reality an infinite number of initial conditions exist, if an efficient method for solving the optimal control Euler-Lagrange equations is not available (and often it is not), the previous optimal control results do not prove very useful. The optimal control Euler-Lagrange equations provide an open-loop or feedforward control that does not require the system state information at any time other than the initial and final times (hence the name: two-point boundary-value problem).

It is preferred to have a family of paths that reach the terminal surface $\Psi(x(t_f), t_f) = 0$ from a family of arbitrary initial states $x(t_0)$. Each of these paths is the optimal path, with respect to the performance index, from its initial state to the terminal surface. Thus, the family of paths is a family of optimal paths or extremals which, in a continuous setting, should be representable by an initial-state-dependent function. This allows the formation of a feedback control law rather than the feedforward type of control provided by the Euler-Lagrange formulation.

The most obvious strategy for forming this initial-state-dependent function is to use the only two properties possessed by all the optimal paths: each path is optimal with respect to the performance index, and each path ends at the terminal surface $\Psi(x(t_f), t_f) = 0$. Consider, then, the cost of an optimal path starting from an arbitrary initial state ($x$ at time $t$) and ending at the terminal surface. This function is called the value function or optimal return function and is given by

$$V(x, t) = \min_{u(t)} \left\{ \phi(x(t_f), t_f) + \int_{t}^{t_f} L(x(\tau), u(\tau), \tau)\, d\tau \right\} \tag{44}$$

with boundary condition

$$V(x, t) = \phi(x(t), t) \tag{45}$$

on the terminal surface $\Psi(x(t), t) = 0$. For the considerations here, we assume that the value function satisfies $V(x, t) \in C^2$ over the interval of interest. The qualifier $\min_{u(t)}$ implies that the value function is evaluated along the optimal trajectory.
A complete derivation of the Hamilton-Jacobi-Bellman equation is shown below, after which another, heuristic derivation will be given using parallels from the Hamilton-Jacobi theory of analytical mechanics.

Suppose that the system starts at an arbitrary initial condition $(x, t)$ and proceeds using a non-optimal control $u(t)$ for a short period of time $\Delta t$ to reach the point (by a first-order approximation, assuming $\Delta t$ is sufficiently small)

$$(x + \dot{x}\Delta t,\, t + \Delta t) = (x + f(x, u, t)\Delta t,\, t + \Delta t). \tag{46}$$

Correspondingly, by another first-order approximation, the value function accrued over this small non-optimal path is given by

$$\tilde{V}_{\Delta}(x, t) = \frac{dV(x, t)}{dt}\Delta t = L(x, u, t)\Delta t \tag{47}$$

where the subscript on $\tilde{V}$ signifies a first-order approximation over a small path and the tilde represents the non-optimal nature of the path. Now suppose optimal control is used for the remainder of the path, i.e., from $(x + f(x, u, t)\Delta t,\, t + \Delta t)$ to the terminal surface $\Psi(x(t_f), t_f) = 0$. The (suboptimal) total value function $\tilde{V}(x, t)$ is then the sum of the (optimal) value function beginning at the state $(x + f(x, u, t)\Delta t,\, t + \Delta t)$ and the first-order approximation to the value function of the small non-optimal path at the beginning, $\tilde{V}_{\Delta}(x, t)$:

$$\tilde{V}(x, t) = V(x + f(x, u, t)\Delta t,\, t + \Delta t) + \tilde{V}_{\Delta}(x, t) \tag{48}$$
$$= V(x + f(x, u, t)\Delta t,\, t + \Delta t) + L(x, u, t)\Delta t. \tag{49}$$

Obviously, since $\tilde{V}(x, t)$ is suboptimal (due to the small suboptimal path at the beginning), it can never be smaller than the actual (optimal) return function $V(x, t)$:

$$V(x, t) \leq \tilde{V}(x, t). \tag{50}$$

Equality holds in (50) only when the optimal control is chosen over the interval $\Delta t$, i.e., when $\tilde{V}(x, t)$ is minimized, from which we have

$$V(x, t) = \min_{u} \left\{ V(x + f(x, u, t)\Delta t,\, t + \Delta t) + L(x, u, t)\Delta t \right\}. \tag{51}$$

Due to the assumption $V(x, t) \in C^2$, the right-hand side of (51) can be expanded as a Taylor series about $(x, t)$:

$$V(x, t) = \min_{u} \left\{ V(x, t) + \frac{\partial V}{\partial x} f(x, u, t)\Delta t + \frac{\partial V}{\partial t}\Delta t + L(x, u, t)\Delta t \right\}. \tag{52}$$

Since $V$ and $\partial V / \partial t$ do not explicitly depend on $u$, they may be taken outside the minimization; cancelling $V(x, t)$ from both sides, dividing through by $\Delta t$ and letting $\Delta t \to 0$ gives

$$-\frac{\partial V}{\partial t} = \min_{u} \left\{ L(x, u, t) + \frac{\partial V}{\partial x} f(x, u, t) \right\}. \tag{53}$$
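The recursion (51) is exactly the principle on which numerical dynamic programming operates: a short (possibly non-optimal) step of cost $L\Delta t$, followed by the previously computed optimal cost-to-go. The following is a minimal sketch of a backward sweep on a state grid, under illustrative assumptions not taken from the report: scalar dynamics $\dot{x} = -x + u$, running cost $L = x^2 + u^2$, and zero terminal cost.

```python
import numpy as np

# Minimal dynamic-programming sketch of recursion (51), under assumed
# example data: xdot = -x + u, L = x^2 + u^2, phi = 0, horizon tf.
dt, tf = 0.01, 2.0
xs = np.linspace(-2.0, 2.0, 201)      # state grid
us = np.linspace(-3.0, 3.0, 121)      # candidate controls

V = np.zeros_like(xs)                 # V(x, tf) = phi = 0
for _ in range(int(tf / dt)):         # march backwards from tf toward t0
    # For every grid state, try every control: one short step of cost
    # L*dt, plus the previously computed value at the resulting state.
    x_next = xs[:, None] + (-xs[:, None] + us[None, :]) * dt
    cost = (xs[:, None]**2 + us[None, :]**2) * dt \
         + np.interp(x_next, xs, V)   # V(x + f dt, t + dt), interpolated
    V = cost.min(axis=1)              # minimization over u, as in (51)
```

Note that `np.interp` clamps states that leave the grid to the boundary values, a crude but common treatment in such grid sweeps; the curse of dimensionality that makes this approach infeasible for large state dimensions is precisely the motivation for the feedback methods discussed later.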
Now consider the differential (with respect to time) of the value function written in terms of the Hamiltonian, analogous to (19):

$$dV = \lambda^T dx - H\, dt \tag{54}$$

where

$$H(x, \lambda, u, t) = L(x, u, t) + \lambda^T f(x, u, t). \tag{55}$$

From (54), we have on the optimal trajectory

$$\lambda^T = \frac{\partial V}{\partial x} \tag{56}$$

and

$$H = -\frac{\partial V}{\partial t}. \tag{57}$$

Substituting (56) into (55) gives

$$H(x, \lambda, u, t) = L(x, u, t) + \frac{\partial V}{\partial x} f(x, u, t) \tag{58}$$

which, when substituted into (53), gives the Hamilton-Jacobi-Bellman equation

$$-\frac{\partial V}{\partial t} = \min_{u} H\left(x, \frac{\partial V}{\partial x}, u, t\right) \tag{59}$$

which is solved with the boundary condition

$$V(x, t) = \phi(x(t), t) \tag{60}$$

on the terminal surface $\Psi(x, t) = 0$. Solving the Hamilton-Jacobi-Bellman (HJB) equation gives $V(x, t)$, which we can use, along with the specified performance index and the stationarity condition, to determine the optimal control $u(x, t)$ independent of the initial state. Since the HJB equation is a sufficient condition for optimality, we thus have a function that provides optimal control in feedback form.

7.1 The Hamilton-Jacobi-Bellman equation from the standpoint of analytical mechanics

We can perform a heuristic derivation of the HJB equation by appealing to the Hamilton-Jacobi theory of analytical mechanics, which exposes the parallels between optimal control theory and the variational principles of mechanics.

Recall that we defined Hamilton's principal function (38) as the canonical integral transformed such that it is a function of the generalized coordinates at the final time rather than the generalized velocities, i.e.,

$$S = S(q_0, q_f, t_0, t_f). \tag{61}$$
Substituting $\dot{x} = f(x, u, t)$ into the constrained performance index (19) and letting the initial states and control be arbitrarily assigned gives

$$J = \phi(x_f, t_f) + \int_{t_0}^{t_f} \left[ H(x(t), u(t), \lambda(t), t) - \lambda^T(t)\dot{x}(t) \right] dt \tag{62}$$
$$= J(x_0, x_f, u_0, u_f, t_0, t_f) \tag{63}$$

where the subscript $f$ indicates evaluation at the final time. Note that since $J = J(x_0, x_f, u_0, u_f, t_0, t_f)$ is not a function of the velocities $\dot{x}$, and because $\phi(x_f, t_f)$ is simply a function evaluated at a single point, i.e., a constant, defining $x$ and $u$ as an extended system of generalized coordinates allows us to set

$$S = J(x_0, x, u_0, u, t_0, t_f). \tag{64}$$

The new $S$ function is then stationary with respect to the first variation if it satisfies the Hamilton-Jacobi equation (39). Rearranging (39) and changing the arguments, we have

$$\frac{\partial S}{\partial t} = -H\left(x, \frac{\partial S}{\partial x}, u, t\right) \tag{65}$$

which is simply another statement of the HJB equation (59), since by Hamilton-Jacobi theory, $S$ satisfying the previous partial differential equation immediately implies that the first variation of the canonical integral (in this case, the performance index) vanishes.

7.2 A Special Case

A special case is discussed here that utilizes the previous results to show an example of deriving a feedback optimal control $u^*$ based on the HJB equation. Specifically, consider a nonlinear system of the form

$$\dot{x} = f(x) + g(x)u \tag{66}$$

where $x \in \mathbb{R}^n$ (as before), $f : \mathbb{R}^n \to \mathbb{R}^n$, $g : \mathbb{R}^n \to \mathbb{R}^{n \times m}$, $f(0) = 0$ and $u$ is a control to be determined. Let the value function (from the corresponding performance index) be given by

$$V(x, u) = \int_{t}^{\infty} \left( x^T Q x + u^T R u \right) dt \tag{67}$$
$$= \int_{t}^{\infty} L(x, u)\, dt \tag{68}$$

where $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$ are symmetric weighting matrices whose choice is left as a design consideration. The expression in (67) evaluates the total cost up to $t_f = \infty$. It represents the weighted (by $Q$ and $R$) squared sum of the total control effort and state "effort" expended, which is commonly a quantity that needs to be minimized.
There are no final-state constraints specified and therefore the problem is simply one of regulation, i.e., the system must be driven to its equilibrium $x = 0$. Furthermore, there is no final-state weighting function. Also, notice that the value function (67) does not depend on time because the original system does not depend on time. This property will play an important role in the following discussion.

Similar to the development in (16) through (19), augmenting (67) with the system dynamics multiplied by the costates yields

$$V(x, u) = \int_{t}^{\infty} \left[ H(x, u, \lambda) - \lambda^T \dot{x} \right] dt \tag{69}$$

where

$$H = x^T Q x + u^T R u + \lambda^T \left[ f(x) + g(x)u \right]. \tag{70}$$

Rewriting the stationarity condition (29) in terms of the new system equations gives

$$\frac{\partial H}{\partial u} = \frac{\partial}{\partial u} \left\{ \lambda^T \left( f(x) + g(x)u \right) + L \right\} = 0 \tag{71}$$

and hence, from (70),

$$\frac{\partial H}{\partial u} = 2u^T R + \lambda^T g(x) = 0 \tag{72}$$

where it must be noted that the costate $\lambda$ is not arbitrary: satisfying (72) implies that $\lambda$ is a costate of the optimal control $u^*$. We denote this special costate $\lambda^*$. For purposes of clarity, the expression (72) is transposed and then rewritten to reflect this:

$$\frac{\partial H}{\partial u^*} = 2Ru^* + g^T(x)\lambda^* = 0. \tag{73}$$

Rearranging (73) gives an expression for the optimal control

$$u^* = -\frac{1}{2} R^{-1} g^T(x) \lambda^* \tag{74}$$

where everything on the right-hand side is known except the "optimal costate" $\lambda^*$. This is precisely where the HJB equation enters the picture. Since, by (56), we have on the optimal trajectory

$$\lambda^* = \left( \frac{\partial V}{\partial x} \right)^T \tag{75}$$

the expression for the optimal control (74) can be written as

$$u^* = -\frac{1}{2} R^{-1} g^T(x) \left( \frac{\partial V}{\partial x} \right)^T \tag{76}$$

and hence finding the solution to the HJB equation (which gives $V$) allows an explicit analytic expression for the optimal control $u^*$.
Notice that since the system under consideration is conservative, i.e., $f = f(x)$ and $g = g(x)$, the Hamiltonian (70) does not depend on time,

$$H = H(x, u, \lambda) \tag{77}$$

and furthermore, the value function (69) also does not depend on time,

$$V = V(x, u). \tag{78}$$

Therefore, we have

$$\frac{\partial V}{\partial t} = 0 \tag{79}$$

which implies that the HJB equation (59) reduces to

$$\min_{u} H\left(x, \frac{\partial V}{\partial x}, u, t\right) = 0 \tag{80}$$

over the optimal trajectory. From the expression for the Hamiltonian $H$ (70),

$$\min_{u} \left\{ x^T Q x + u^T R u + \lambda^T \left[ f(x) + g(x)u \right] \right\} \tag{81}$$
$$= x^T Q x + u^{*T} R u^* + \frac{\partial V}{\partial x} \left[ f(x) + g(x)u^* \right] = 0 \tag{82}$$

which was obtained by using the relationship (75). Substituting the optimal control (74) into (82) yields the partial differential equation

$$x^T Q x + \frac{1}{4} \left( R^{-1} g^T(x) \lambda^* \right)^T R \left( R^{-1} g^T(x) \lambda^* \right) + \frac{\partial V}{\partial x} \left[ f(x) - \frac{1}{2} g(x) R^{-1} g^T(x) \lambda^* \right] = 0 \tag{83}$$

or, by simplifying and using (75),

$$\frac{\partial V}{\partial x} f(x) + x^T Q x - \frac{1}{4} \frac{\partial V}{\partial x} g(x) R^{-1} g^T(x) \left( \frac{\partial V}{\partial x} \right)^T = 0. \tag{84}$$

The only unknown in (84) is $\partial V / \partial x$, the partial derivative (with respect to the states) of the optimal return function/value function. Therefore, solving (84) is sufficient to determine the optimal control (74).

Unfortunately, solving the partial differential equation (84) is extremely difficult and frequently impossible. Thus, even though a feedback optimal control based on the HJB equation, as in (74), is very attractive, especially compared with the feedforward Euler-Lagrange optimal control solution, the added complexity of solving a partial differential equation such as (84) strictly limits its direct application [8]. Although several techniques have been proposed to provide a solution to the HJB equation under special conditions, the problem is still, even after five decades, an active area of research.
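One case where (84) is tractable in closed form deserves mention as an aside: linear dynamics. Under the assumption $f(x) = Ax$, $g(x) = B$ (an assumption made here for illustration, not the general nonlinear setting of this section), the quadratic guess $V = x^T P x$ reduces the PDE (84) to the algebraic Riccati equation $A^T P + PA - PBR^{-1}B^T P + Q = 0$, and (76) becomes the familiar linear feedback $u^* = -R^{-1}B^T P x$. A minimal numerical check of this reduction:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative linear special case of (84): f(x) = A x, g(x) = B,
# V = x^T P x. All numerical values below are arbitrary examples.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)   # solves the Riccati form of (84)
K = np.linalg.inv(R) @ B.T @ P         # u* = -K x, cf. (76) with dV/dx = 2 x^T P

# Residual of A^T P + P A - P B R^{-1} B^T P + Q; should be ~ zero.
residual = A.T @ P + P @ A - P @ B @ np.linalg.inv(R) @ B.T @ P + Q
print(np.round(residual, 10))
```

For genuinely nonlinear $f$ and $g$, no such reduction is available in general, which motivates the techniques below. One such technique is presented in the next section in significant detail.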
8 Generalized Hamilton-Jacobi-Bellman Equation

Traditionally, the challenge of solving a partial differential equation like (84) was tackled using what is known as the "method of characteristics" [8]. The basic idea behind this method is to reduce the partial differential equation to a family of ordinary differential equations, which are then integrated from different initial conditions to the terminal surface to obtain solutions of the partial differential equation. Such a scheme is very useful for studying the qualitative behavior of partial differential equations and has extensive applications in (computational) fluid mechanics, where it is used to study phenomena such as turbulence and shockwaves via the Navier-Stokes equations. However, its application in optimal control is not particularly beneficial. Firstly, the computation and storage of solutions of (infinitely) large sets of ordinary differential equations and initial conditions is prohibitive. In fact, this eliminates one of the main reasons for using the HJB solution to the optimal control problem: to avoid computing arbitrarily large numbers of solutions to the two-point boundary value problem. Secondly, the solutions via the characteristic equations are not always well-defined. Specifically, under certain conditions, multivalued solutions might appear. Thirdly, in many cases the method of characteristics does not cover the entire domain of the partial differential equation and the solution only exists in a weak sense. Despite these apparently critical shortcomings, during the early years of optimal control the method of characteristics was often considered the only route to a practical solution of the optimal control problem via the HJB equation.

During the 1970s, other, more efficient techniques hinging on system linearity were developed to solve the HJB equation for a feedback optimal control. If the system nonlinearities are small, perturbation methods can be used to achieve second-order approximations to the optimal control, as was shown in [12, 20, 10, 11]. An explicit assumption in these works is that the optimal control has a sufficiently accurate second-order Taylor series expansion about the origin. This type of assumption severely limits the class of systems to which the method is applicable. The stability region of the resulting control is also almost always impossible to determine. Perturbation methods, therefore, did not gain much momentum as viable schemes for numerical feedback optimal control.

As feedback linearization (or dynamic inversion) and geometric control gained popularity during the late 1980s and 1990s, several new attempts were made at attacking the numerical feedback optimal control problem. All of these involved canceling system nonlinearities via feedback (dynamic inversion) and then applying optimal control theory to the subsequent linearized system [14, 9, 27]. This method has several drawbacks: significant control effort is expended in forcing the nonlinear system to behave linearly, useful nonlinearities that may help in control are eliminated, the dynamic inversion of the control matrix is not always a global transformation, the dynamic inversion itself is computationally expensive and, finally, the dynamic inversion is fragile to modeling uncertainties
and disturbances.

Another approach to utilizing the HJB equation for optimal feedback control tackles the problem not by determining an optimal control $u^*$ directly, but rather by successively optimizing an existing stabilizing suboptimal control $u^{(0)}$. The method utilizes an alternative formulation of the Hamilton-Jacobi equation known as the generalized Hamilton-Jacobi-Bellman equation and was first proposed by Saridis and Lee in [24]. The design methodology was further refined in [2, 17, 3] by introducing the use of Galerkin's spectral method for approximating partial differential equations. The following is a detailed mathematical treatment of this methodology using previously derived results in this report.

Consider a suboptimal stabilizing feedback control $u(x)$ for the (conservative) nonlinear system (66). Analogous to (67), let the suboptimal value function for this particular control be given by

$$\tilde{V}(x) = \int_{t}^{\infty} \left( x^T Q x + u^T(x) R u(x) \right) dt. \tag{85}$$

We say that a feedback control $u \in \Omega_u$ is admissible if $u$ is continuous and renders (66) asymptotically stable. Assuming an admissible but suboptimal $u$ is given, can the HJB equation be exploited to optimize this control successively over time? This question was first addressed by Saridis and Lee in [24], where they introduced the concept of the generalized Hamilton-Jacobi-Bellman equation. The equation was thus named because it applies to all admissible $u$ and not just an optimal control. It is introduced here, based on previous results, in a nonrigorous fashion.

Differentiating the suboptimal value function (85) along the trajectories of the system yields the differential form of the (suboptimal) value function

$$\text{GHJB:} \quad \frac{\partial \tilde{V}}{\partial x} \left[ f(x) + g(x)u(x) \right] + x^T Q x + u^T(x) R u(x) = 0. \tag{86}$$

This differential form of the (suboptimal) value function is known as the generalized Hamilton-Jacobi-Bellman (GHJB) equation. The solution $\tilde{V}$ of the GHJB equation is a Lyapunov function for (66) under the suboptimal control $u$ [1]. It represents the value function under a suboptimal control.

The development below closely follows Saridis and Lee [24]. Key theorems are reproduced (in a standardized form) and presented without proofs. The first lemma relates the suboptimal value function $\tilde{V}(x)$ to the true value function $V(x)$ under optimal control.

Lemma 1. Assume the optimal control $u^*$ and the optimal value function $V(x)$ exist. Then these satisfy the GHJB equation (86) and

$$0 < V(x) \leq \tilde{V}(x). \tag{87}$$

The next theorem presents an approach to ensuring a successively (at each step or iteration) smaller suboptimal value function.
Theorem 1. If a sequence of pairs $\{u^{(i)}, \tilde{V}^{(i)}\}$ satisfying the GHJB equation (86) is generated by selecting the control $u^{(i)}$ to minimize the GHJB equation associated with the previous value function $\tilde{V}^{(i-1)}$, i.e.,

$$u^{(i)} = -\frac{1}{2} R^{-1} g^T(x) \left( \frac{\partial \tilde{V}^{(i-1)}}{\partial x} \right)^T \tag{88}$$

then the corresponding value function satisfies the inequality

$$\tilde{V}^{(i)} \leq \tilde{V}^{(i-1)}. \tag{89}$$

Note the similarity between (88) and the general expression for the optimal control (76). The corollary that follows is intuitively immediate from Lemma 1 and Theorem 1. It deals with the convergence of a sequence of suboptimal value functions to the optimal value function given a control such as (88).

Corollary 1. By selecting pairs $\{u^{(i)}, \tilde{V}^{(i)}\}$ with

$$u^{(i)} = -\frac{1}{2} R^{-1} g^T(x) \left( \frac{\partial \tilde{V}^{(i-1)}}{\partial x} \right)^T \tag{90}$$

the resulting sequence $\{\tilde{V}^{(i)}\}$ converges monotonically to the optimal value function $V(x)$ associated with the optimal control, i.e.,

$$\tilde{V}^{(0)} \geq \tilde{V}^{(1)} \geq \tilde{V}^{(2)} \geq \ldots \geq V. \tag{91}$$

The final two theorems deal with the construction of upper and lower bounds for the true value function $V(x)$. This is accomplished by obtaining functions that marginally fail to satisfy the GHJB equation on either side ($< 0$ and $> 0$).

Theorem 2. Suppose that for a given $u_s(x)$ and some $s(x)$ with

$$|s(x)| < \infty \tag{92}$$

there exists a continuously differentiable positive definite function $V_s = V(x, u_s)$ satisfying

$$\frac{\partial V_s}{\partial x} \left[ f(x) + g(x)u_s(x) \right] + x^T Q x + u_s^T(x) R u_s(x) = \Delta V_s \leq s(x) < 0. \tag{93}$$

Then $V_s(x)$ is an upper bound of the optimal value function $V(x)$:

$$V_s(x) > V(x). \tag{94}$$
And similarly for the lower bound, we have the last theorem.

Theorem 3. Suppose that for a given $u_s(x)$ and some $s(x)$ with

$$|s(x)| < \infty \tag{95}$$

there exists a continuously differentiable positive definite function $V_s = V(x, u_s)$ satisfying

$$\frac{\partial V_s}{\partial x} \left[ f(x) + g(x)u_s(x) \right] + x^T Q x + u_s^T(x) R u_s(x) = \Delta V_s \geq s(x) > 0. \tag{96}$$

Then $V_s(x)$ is a lower bound of the optimal value function $V(x)$:

$$V_s(x) < V(x). \tag{97}$$

An exact design procedure for optimizing an initial admissible control $u^{(0)} \in \Omega_u$ can now be formed from the previous results (a minimal numerical sketch of the resulting iteration follows the list).

1. Select an initial admissible control $u^{(0)} \in \Omega_u$ for the system (66).

2. Solve the GHJB partial differential equation to find $\tilde{V}^{(0)}$:

$$\frac{\partial \tilde{V}^{(0)}}{\partial x} \left[ f(x) + g(x)u^{(0)}(x) \right] + x^T Q x + u^{(0)T}(x) R u^{(0)}(x) = 0. \tag{98}$$

Then, by Lemma 1, $\tilde{V}^{(0)} \geq V$.

3. Obtain an improved controller $u^{(1)}$ using Corollary 1:

$$u^{(1)} = -\frac{1}{2} R^{-1} g^T(x) \left( \frac{\partial \tilde{V}^{(0)}}{\partial x} \right)^T. \tag{99}$$

4. Solve the GHJB partial differential equation to find $\tilde{V}^{(1)}$:

$$\frac{\partial \tilde{V}^{(1)}}{\partial x} \left[ f(x) + g(x)u^{(1)}(x) \right] + x^T Q x + u^{(1)T}(x) R u^{(1)}(x) = 0. \tag{100}$$

Then, by Lemma 1, $\tilde{V}^{(0)} > \tilde{V}^{(1)} \geq V$.

5. Determine a lower bound $V_s$ to the optimal value function using Theorem 3.

6. Use $\tilde{V}^{(1)} - V_s$ as a measure of how close an approximation $u^{(1)}$ is to the optimal control $u^*$. If acceptable, stop at this iteration.

7. Otherwise, if the approximation is not acceptable, repeat from step 2 onwards with a new iteration.
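The following is a minimal sketch of this loop for a scalar system, where the GHJB equation (86) is algebraic in $d\tilde{V}/dx$ and can be solved pointwise rather than by a PDE solver. The dynamics $\dot{x} = -x^3 + u$, the weights $Q = R = 1$ and the initial control $u^{(0)}(x) = -x$ are illustrative assumptions, not data from the report.

```python
import numpy as np

# Sketch of the GHJB successive-improvement loop for an assumed scalar
# example: xdot = f(x) + g(x) u with f(x) = -x^3, g(x) = 1, Q = R = 1.
f = lambda x: -x**3
g = lambda x: np.ones_like(x)
Q, R = 1.0, 1.0

x = np.linspace(0.01, 2.0, 400)   # grid away from the origin (f + g u < 0 here)
u = -x                            # step 1: initial admissible control u^(0)

for i in range(10):
    # Steps 2/4: in one dimension the GHJB (86) is algebraic in dV/dx:
    #   V'(x) [f + g u] + Q x^2 + R u^2 = 0
    dVdx = -(Q * x**2 + R * u**2) / (f(x) + g(x) * u)
    # Step 3: improved control from Corollary 1, cf. (88)/(99)
    u = -0.5 * (1.0 / R) * g(x) * dVdx

# u now approximates the optimal feedback u*(x) = x^3 - sqrt(x^6 + x^2),
# the stabilizing root of the scalar HJB (84) for this example.
```

Each iterate remains stabilizing (the drift $f + gu$ stays negative on the grid), so the pointwise division is well-defined, and the sequence of value functions decreases monotonically as Theorem 1 guarantees.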
The benefit of using the GHJB equation and the control design procedure outlined above is that we do not need to solve the HJB partial differential equation (84) directly. Rather, a much more amenable partial differential equation needs to be solved, in the form of the GHJB (86). Furthermore, the GHJB allows for an iteratively improving solution that addresses several implementation challenges. Rather than having to solve the entire optimal control problem at once, the solution is divided into successively improving iterations, each of which is useful in the control action since each is always better than the initially designed stabilizing controller. A method to solve the GHJB equation is considered below.

9 Successive Galerkin Approximation to the GHJB Equation

The solution to the GHJB equation (86) needs to be determined numerically in order to utilize the design procedure outlined above. This problem was tackled by Beard in his doctoral work [2] and in the subsequent journal publication [3]. An algorithm called Successive Galerkin Approximation (SGA) was developed, based on the spectral method of Galerkin. A numerically efficient version of the algorithm was also developed in [17]. Most famously, a discussion of the method by Beard, Saridis and Wen appeared in the IEEE Control Systems Magazine [1]. This section provides an outline of the method with its key points.

Let the system (66) be Lipschitz continuous on a set $\Omega \subset \mathbb{R}^n$ containing the origin. Furthermore, let there exist a continuous control on $\Omega$ that asymptotically stabilizes the system, i.e., the system is controllable over $\Omega$. Now assume the existence of a set of basis functions $\{\phi_j\}_1^\infty$, where the $\phi_j : \Omega \to \mathbb{R}$ are continuous, $\phi_j(0) = 0$ and $\text{span}\{\phi_j\}_1^\infty \subseteq L^2(\Omega)$. Then the solution $\tilde{V}$ of the GHJB equation (86) can be written as

$$\tilde{V}(x) = \sum_{j=1}^{\infty} \hat{c}_j \phi_j(x) \tag{101}$$

where the $\hat{c}_j$ are constants to be determined. It is not practical to have an infinite summation as an approximation, and thus a large enough number $N$ is chosen at which to truncate the solution. This truncated solution is referred to as $V_N$ and, from (101), it is given by

$$V_N(x) = \hat{c}_N^T \Phi_N(x) \tag{102}$$

where

$$\hat{c}_N^T = \begin{bmatrix} \hat{c}_1 & \ldots & \hat{c}_N \end{bmatrix} \tag{103}$$

and

$$\Phi_N(x) = \begin{bmatrix} \phi_1(x) & \ldots & \phi_N(x) \end{bmatrix}^T. \tag{104}$$
The vector of $N$ constants $\hat{c}_N$ is determined by enforcing orthogonality between the GHJB residual, expressed in terms of $V_N(x)$, and the basis $\Phi_N(x)$, i.e.,

$$\left\langle \text{GHJB}\left( V_N(x) \right), \Phi_N(x) \right\rangle_\Omega = 0 \tag{105}$$

where $\langle \cdot, \cdot \rangle_\Omega$ denotes the function inner product (integral) over the set $\Omega$. Note that in (105) the truncated expansion (102) is used. It follows that (105) is a system of $N$ linear equations in $N$ unknowns, which can be inverted to determine $\hat{c}_N$, as shown in the following discussion.

The left-hand side of the GHJB equation in (105) (in terms of the truncated approximation of the suboptimal value function) is written as

$$\frac{\partial V_N}{\partial x} \left[ f(x) + g(x)u(x) \right] + x^T Q x + u^T(x) R u(x) = \hat{c}_N^T \frac{\partial \Phi_N(x)}{\partial x} \left[ f(x) + g(x)u(x) \right] + x^T Q x + u^T(x) R u(x) \tag{106}$$

where $\partial \Phi_N / \partial x \in \mathbb{R}^{N \times n}$ is a matrix quantity. For convenience, denote this as

$$\frac{\partial \Phi_N(x)}{\partial x} = \nabla \Phi_N(x) = \begin{bmatrix} \frac{\partial \phi_1(x)}{\partial x} & \ldots & \frac{\partial \phi_N(x)}{\partial x} \end{bmatrix}^T. \tag{107}$$

Then, from (106), it follows that the GHJB residual is

$$\hat{c}_N^T \nabla \Phi_N(x) \left[ f(x) + g(x)u(x) \right] + x^T Q x + u^T(x) R u(x). \tag{108}$$

Transposing (108),

$$\left[ f(x) + g(x)u(x) \right]^T \nabla \Phi_N^T(x)\, \hat{c}_N + x^T Q x + u^T(x) R u(x) \tag{109}$$

and then substituting into (105) yields

$$\left\langle \left[ f(x) + g(x)u(x) \right]^T \nabla \Phi_N^T(x), \Phi_N \right\rangle_\Omega \hat{c}_N + \left\langle x^T Q x + u^T(x) R u(x), \Phi_N \right\rangle_\Omega = 0 \tag{110}$$

or, more compactly,

$$A \hat{c}_N + b = 0 \tag{111}$$

where $A \in \mathbb{R}^{N \times N}$, $\hat{c}_N \in \mathbb{R}^N$ and $b \in \mathbb{R}^N$. Thus $\hat{c}_N$ may be found by solving the linear system

$$\hat{c}_N = -A^{-1} b. \tag{112}$$

Once $\hat{c}_N$ is determined, (102) is used to form the truncated approximation of the suboptimal value function; a small worked sketch of this assembly is given below.
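The following sketch assembles (110)-(112) by simple quadrature for a one-dimensional example. The dynamics ($\dot{x} = -x^3 + u$), the fixed admissible control $u(x) = -x$, the domain $\Omega = [-1, 1]$ and the even polynomial basis $\phi_j(x) = x^{2j}$ are all illustrative assumptions, not prescriptions from [2, 3].

```python
import numpy as np

# One-dimensional Galerkin assembly of (110)-(112) for the GHJB equation,
# under assumed data: xdot = -x^3 + u, u(x) = -x, Q = R = 1,
# Omega = [-1, 1], basis phi_j(x) = x^(2j) for j = 1..N.
N = 4
xs = np.linspace(-1.0, 1.0, 2001)
w = xs[1] - xs[0]                      # simple quadrature weight

Q = R = 1.0
u = -xs                                # fixed admissible control u(x) = -x
drift = -xs**3 + u                     # f(x) + g(x) u(x) with f = -x^3, g = 1

Phi = np.stack([xs**(2 * j) for j in range(1, N + 1)])              # basis on grid
dPhi = np.stack([2 * j * xs**(2 * j - 1) for j in range(1, N + 1)]) # derivatives

A = (Phi * w) @ (dPhi * drift).T       # A_ij = <phi_j' (f + g u), phi_i>_Omega
b = (Phi * w) @ (Q * xs**2 + R * u**2) # b_i  = <x^T Q x + u^T R u, phi_i>_Omega
c = -np.linalg.solve(A, b)             # c_N = -A^{-1} b, cf. (112)

V_N = c @ Phi                          # truncated value function (102) on the grid
# For this particular data the GHJB has the closed-form solution
# V(x) = ln(1 + x^2); compare V_N against np.log(1 + xs**2).
```

In higher dimensions the inner products become multidimensional integrals over $\Omega$, which is exactly where the separability tricks of [17], discussed next, pay off.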
The convergence and validity proofs for this type of approximation are dealt with in [2].

The basis functions have not been discussed so far. Polynomials are, in most cases, sufficient. Moreover, if they are orthogonal, better results can be expected. Increasing the number of basis functions, i.e., increasing $N$, has an exponential effect on the computation required [17]. It is, therefore, important to choose the basis functions carefully. Lawton and Beard showed in [17] that choosing the basis functions such that they are separable, and assuming the domain $\Omega$ to be rectangular, allows for the formulation of significantly computationally cheaper versions of the SGA algorithm. Polynomials are separable functions and therefore play an important role in that work.

Despite the attractiveness of the methods presented, they still pose challenges when it comes to addressing one of the prime reasons for utilizing the HJB equation in optimal control: to allow for a closed-form solution to the optimal feedback problem that can be used efficiently in realistic scenarios. In this respect, the GHJB/SGA algorithm is not unique among the other methodologies in numerical optimal feedback control. As the system order increases and computational resources become more restrictive, most methodologies become infeasible. Thus, using such algorithms in embedded systems, or to efficiently control complex systems (like aircraft), is often impossible.

10 Conclusion

A broad discussion of optimal control was presented. A history and the basic problem of optimal control were given. This was followed by a derivation of standard results in optimal control theory, along with discussions of the connections between classical mechanics and optimal control theory. The report ended with a discussion of more recent results in optimal control theory, namely, results intended to make optimal control theory more practically viable.

Even half a century after the initial results published independently by Bellman and Pontryagin, optimal control remains a vibrant area of research with much-sought-after results. Rather than recede into the background in light of the latest developments, optimal control is becoming more and more relevant. This is not least because of the huge strides achieved in computational power. Mathematical developments and the race towards achieving computationally viable schemes for simulation also indirectly benefit optimal control theory. With its wide applications and promise for future research, optimal control remains a high-value research area. Since the theoretical foundation of optimal control theory has already been laid, this high-value research is geared towards achieving numerical schemes that make optimal control more practical.
References

[1] R. Beard, G. Saridis, and J. Wen, "Improving the performance of stabilizing controls for nonlinear systems," IEEE Control Systems Magazine, vol. 16, no. 5, pp. 27-35, 1996.

[2] R. Beard, "Improving the closed-loop performance of nonlinear systems," Ph.D. dissertation, Rensselaer Polytechnic Institute, 1995.

[3] R. Beard, G. Saridis, and J. Wen, "Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation," Automatica, vol. 33, no. 12, pp. 2159-2177, 1997.

[4] R. Bellman, "On the theory of dynamic programming," Proceedings of the National Academy of Sciences of the United States of America, vol. 38, no. 8, p. 716, 1952.

[5] R. Bellman, The Theory of Dynamic Programming. Defense Technical Information Center, 1954.

[6] R. Bellman, "An introduction to the theory of dynamic programming," 1953.

[7] R. Bellman, Eye of the Hurricane: An Autobiography. World Scientific, 1984.

[8] A. Bryson and Y. Ho, Applied Optimal Control. American Institute of Aeronautics and Astronautics, 1979.

[9] L. Gao, L. Chen, Y. Fan, and H. Ma, "A nonlinear control design for power systems," Automatica, vol. 28, no. 5, pp. 975-979, 1992.

[10] W. Garrard, "Suboptimal feedback control for nonlinear systems," Automatica, vol. 8, no. 2, pp. 219-221, 1972.

[11] W. Garrard and J. Jordan, "Design of nonlinear automatic flight control systems," Automatica, vol. 13, no. 5, pp. 497-505, 1977.

[12] W. Garrard, N. McClamroch, and L. Clark, "An approach to sub-optimal feedback control of non-linear systems," International Journal of Control, vol. 5, no. 5, pp. 425-435, 1967.

[13] H. Goldstein, C. Poole, J. Safko, and S. Addison, "Classical mechanics," American Journal of Physics, vol. 70, p. 782, 2002.

[14] A. Isidori, Nonlinear Control Systems. Springer Verlag, 1995.

[15] A. Klumpp, "Apollo lunar descent guidance," Automatica, vol. 10, no. 2, pp. 133-146, 1974.

[16] C. Lanczos, The Variational Principles of Mechanics. Dover Publications, 1970.
[17] J. Lawton and R. Beard, "Numerically efficient approximations to the Hamilton-Jacobi-Bellman equation," in Proceedings of the 1998 American Control Conference, vol. 1. IEEE, 1998, pp. 195-199.

[18] F. Lewis, Applied Optimal Control and Estimation. Prentice Hall PTR, 1992.

[19] F. Lewis and V. Syrmos, Optimal Control. Wiley-Interscience, 1995.

[20] Y. Nishikawa, N. Sannomiya, and H. Itakura, "A method for suboptimal design of nonlinear feedback systems," Automatica, vol. 7, no. 6, pp. 703-712, 1971.

[21] J. Papastavridis, Analytical Mechanics. Oxford University Press, 2002.

[22] L. Pontryagin, "Optimal regulation processes," Uspekhi Matematicheskikh Nauk, vol. 14, no. 1, pp. 3-20, 1959.

[23] L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko, The Mathematical Theory of Optimal Control Processes. Interscience, New York, 1962.

[24] G. Saridis and C. Lee, "An approximation theory of optimal control for trainable manipulators," IEEE Transactions on Systems, Man and Cybernetics, vol. 9, no. 3, pp. 152-159, 1979.

[25] S. Sethi and G. Thompson, Optimal Control Theory: Applications to Management Science and Economics. Springer Verlag, 2005.

[26] H. Sussmann and J. Willems, "300 years of optimal control: from the brachystochrone to the maximum principle," IEEE Control Systems Magazine, vol. 17, no. 3, pp. 32-44, 1997.

[27] Y. Wang, D. Hill, R. Middleton, and L. Gao, "Transient stabilization of power systems with an adaptive control law," Automatica, vol. 30, no. 9, pp. 1409-1413, 1994.

[28] J. Willems, "1696: the birth of optimal control," in Proceedings of the 35th IEEE Conference on Decision and Control, vol. 2. IEEE, 1996, pp. 1586-1587.