Optimization

3 The Gauge Integral

3.1 Introduction
Much of calculus deals with the interplay between differentiation and
integration. The antiquated term “antidifferentiation” emphasizes the fact
that differentiation and integration are inverses of one another. We will
take it for granted that readers are acquainted with the mechanics of in-
tegration. The current chapter develops just enough integration theory to
make our development of differentiation in Chap. 4 and the calculus of
variations in Chap. 17 respectable. It is only fair to warn readers that in
other chapters a few applications to probability and statistics will assume
familiarity with properties of the expectation operator not covered here.
The first successful effort to put integration on a rigorous basis was un-
dertaken by Riemann. In the early twentieth century, Lebesgue defined
a more sophisticated integral that addresses many of the limitations of
the Riemann integral. However, even Lebesgue’s integral has its defects.
In the past few decades, mathematicians such as Henstock and Kurzweil
have expanded the definition of integration on the real line to include
a wider variety of functions. The new integral emerging from these in-
vestigations is called the gauge integral or generalized Riemann integral
[7, 68, 108, 193, 250, 255, 278]. The gauge integral subsumes the Riemann
integral, the Lebesgue integral, and the improper integrals met in tradi-
tional advanced calculus courses. In contrast to the Lebesgue integral, the
integrands of the gauge integral are not necessarily absolutely integrable.

K. Lange, Optimization, Springer Texts in Statistics 95,
DOI 10.1007/978-1-4614-5838-8_3,
© Springer Science+Business Media New York 2013

It would take us too far afield to develop the gauge integral in full
generality. Here we will rest content with proving some of its elementary
properties. One of the advantages of the gauge integral is that many theo-
rems hold with fewer qualifications. The fundamental theorem of calculus is
a case in point. The commonly stated version of the fundamental theorem
concerns a differentiable function f (x) on an interval [a, b]. As all students
of calculus know,
$$\int_a^b f'(x)\,dx = f(b) - f(a).$$

Although this version is true for the gauge integral, it does not hold for the
Lebesgue integral because the mere fact that $f'(x)$ exists throughout [a, b]
does not guarantee that it is Lebesgue integrable.
This quick description of the gauge integral is not intended to imply that
the gauge integral is uniformly superior to the Lebesgue integral and its
extensions. Certainly, probability theory would be severely handicapped
without the full flexibility of modern measure theory. Furthermore, the ad-
vanced theory of the gauge integral is every bit as difficult as the advanced
theory of the Lebesgue integral. For pedagogical purposes, however, one can
argue that a student’s first exposure to the theory of integration should fea-
ture the gauge integral. As we shall see, many of the basic properties of
the gauge integral flow directly from its definition. As an added dividend,
gauge functions provide an alternative approach to some of the material of
Chap. 2.

3.2 Gauge Functions and δ-Fine Partitions
The gauge integral is defined through gauge functions. A gauge function
is nothing more than a positive function δ(t) defined on a finite interval
[a, b]. In approximating the integral of a function f (t) over [a, b] by a finite
Riemann sum, it is important to sample the function most heavily in those
regions where it changes most rapidly. Now by a Riemann sum we mean a
sum
$$S(f, \pi) = \sum_{i=0}^{n-1} f(t_i)(s_{i+1} - s_i),$$

where the mesh points $a = s_0 < s_1 < \cdots < s_n = b$ form a partition π of
[a, b], and the tags $t_i$ are chosen so that $t_i \in [s_i, s_{i+1}]$. If
$\delta(t_i)$ measures the rapidity of change of f(t) near $t_i$, then it makes
sense to take δ(t) small in regions of rapid change and to force $s_i$ and
$s_{i+1}$ to belong to the interval $(t_i - \delta(t_i), t_i + \delta(t_i))$.
A tagged partition with this property is called a δ-fine partition. Our first
proposition relieves any worry that δ-fine partitions might fail to exist.

Proposition 3.2.1 (Cousin’s Lemma) For every gauge δ(t) on a finite
interval [a, b] there is a δ-fine partition.

Proof: Assume that [a, b] lacks a δ-fine partition. Since we can construct a
δ-fine partition of [a, b] by appending a δ-fine partition of the half-interval
[(a + b)/2, b] to a δ-fine partition of the half-interval [a, (a + b)/2], it fol-
lows that either [a, (a + b)/2] or [(a + b)/2, b] lacks a δ-fine partition. As in
Example 2.3.1, we choose one of the half-intervals based on this failure and
continue bisecting. This creates a nested sequence of intervals [ai , bi ] con-
verging to a point x. If i is large enough, then [ai , bi ] ⊂ (x − δ(x), x + δ(x)),
and the interval [ai , bi ] with tag x is a δ-fine partition of itself. This con-
tradicts the choice of [ai , bi ] and the assumption that the original interval
[a, b] lacks a δ-fine partition.
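
The bisection idea in this proof can be turned into a small constructive search. The following Python sketch is illustrative only: the names `is_delta_fine` and `delta_fine_partition`, the choice of candidate tags (left endpoint, midpoint, right endpoint), and the depth cap are assumptions that do not come from the text. Cousin’s lemma guarantees that a δ-fine partition exists, but this particular search can fail for a gauge whose required tags are never endpoints or midpoints of the bisected intervals.

```python
def is_delta_fine(tagged, delta):
    """Check that each (left, tag, right) has left <= tag <= right and
    [left, right] inside (tag - delta(tag), tag + delta(tag))."""
    return all(
        left <= tag <= right
        and tag - delta(tag) < left
        and right < tag + delta(tag)
        for left, tag, right in tagged
    )


def delta_fine_partition(a, b, delta, max_depth=60):
    """Search for a delta-fine tagged partition of [a, b] by repeated bisection."""
    mid = 0.5 * (a + b)
    # Accept [a, b] as a single piece if one of the candidate tags covers it.
    for tag in (a, mid, b):
        if tag - delta(tag) < a and b < tag + delta(tag):
            return [(a, tag, b)]
    if max_depth == 0:
        raise RuntimeError("no delta-fine partition found with these candidate tags")
    # Otherwise bisect and concatenate the partitions of the two halves.
    left_part = delta_fine_partition(a, mid, delta, max_depth - 1)
    right_part = delta_fine_partition(mid, b, delta, max_depth - 1)
    return left_part + right_part


def gauge(t):
    # A gauge on [0, 1] that shrinks toward 0 but stays positive at 0 itself.
    return 0.05 if t == 0 else abs(t) / 2


partition = delta_fine_partition(0.0, 1.0, gauge)
```

With this gauge the interval containing 0 must carry the tag 0, since no positive tag t satisfies t − δ(t) < 0.
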
Before launching into our treatment of the gauge integral, we pause to
gain some facility with gauge functions [108]. Here are three examples that
illustrate their value.
Example 3.2.1 A Gauge Proof of Weierstrass’ Theorem
Consider a real-valued continuous function f (t) with domain [a, b]. Suppose
that f (t) does not attain its supremum on [a, b]. Then for each t there exists
a point x ∈ [a, b] with f (t) < f (x). By continuity there exists δ(t) > 0 such
that f (y) < f (x) for all y ∈ [a, b] with |y − t| < δ(t). Using δ(t) as a
gauge, select a δ-fine partition a = s0 < s1 < · · · < sn = b with tags
ti ∈ [si , si+1 ] and designated points xi satisfying f (ti ) < f (xi ). Let xmax
be the point xi having the largest value f (xi ). Because xmax lies in some
interval [si , si+1 ], we have f (xmax ) < f (xi ). This contradiction discredits
our assumption that f (x) does not attain its supremum. A similar argument
applies to the infimum.

Example 3.2.2 A Gauge Proof of the Heine-Borel Theorem

One can use Cousin’s lemma to prove the Heine-Borel Theorem on the real
line [278]. This theorem states that if C is a compact set contained in the
union $\bigcup_\alpha O_\alpha$ of a collection of open sets $O_\alpha$, then C
is actually contained in the union of a finite number of the $O_\alpha$.
Suppose C ⊂ [a, b]. Define a gauge δ(t) so that the interval
(t − δ(t), t + δ(t)) does not intersect C when t ∉ C and (t − δ(t), t + δ(t))
is contained in some $O_\alpha$ when t ∈ C. Such a gauge exists because the
complement of the closed set C is open. Based on δ(t), select a δ-fine
partition $a = s_0 < s_1 < \cdots < s_n = b$ with tags $t_i \in [s_i, s_{i+1}]$.
Intervals whose tags lie outside C do not meet C, so C is contained in the
union $\bigcup_{t_i \in C} U_i$, where $U_i$ is the set $O_\alpha$ chosen to
cover the interval around $t_i$. The Heine-Borel theorem extends to compact
sets in $R^n$.

Example 3.2.3 A Gauge Proof of the Intermediate Value Theorem

Under the assumptions of Example 3.2.1, let c be a number strictly between
f(a) and f(b). If we assume that there is no t ∈ [a, b] with f(t) = c, then
by continuity each t ∈ [a, b] has a positive number δ(t) such that either
f(x) < c for all x ∈ [a, b] with |x − t| < δ(t) or f(x) > c for all
x ∈ [a, b] with |x − t| < δ(t). We now select a δ-fine partition
$a = s_0 < s_1 < \cdots < s_n = b$ and observe that throughout each interval
$[s_i, s_{i+1}]$ either f(t) < c or f(t) > c. If we start with
$f(s_0) = f(a) < c$, then $f(s_1) < c$, which implies $f(s_2) < c$, and so
forth until we reach $f(s_n) = f(b) < c$. This contradicts the assumption
that c lies strictly between f(a) and f(b). With minor differences, the same
proof works when f(a) > c.
In preparation for our next example and for the fundamental theorem
of calculus later in this chapter, we must define derivatives. A real-valued
function f(t) defined on an interval [a, b] possesses a derivative $f'(c)$ at
c ∈ [a, b] provided the limit
$$\lim_{t \to c} \frac{f(t) - f(c)}{t - c} = f'(c) \tag{3.1}$$
exists. At the endpoints a and b, the limit is necessarily one sided. Taking
a sequential view of convergence, definition (3.1) means that for every
sequence $t_m \ne c$ converging to c we must have
$$\lim_{m \to \infty} \frac{f(t_m) - f(c)}{t_m - c} = f'(c).$$
In calculus, we learn the following rules for computing derivatives:
Proposition 3.2.2 If f(t) and g(t) are differentiable functions on (a, b),
then
$$\begin{aligned}
[\alpha f(t) + \beta g(t)]' &= \alpha f'(t) + \beta g'(t) \\
[f(t) g(t)]' &= f'(t) g(t) + f(t) g'(t) \\
\left[\frac{1}{f(t)}\right]' &= -\frac{f'(t)}{f(t)^2}.
\end{aligned}$$
In the third formula we must assume $f(t) \ne 0$. Finally, if g(t) maps into
the domain of f(t), then the functional composition $f \circ g(t)$ has derivative
$[f \circ g(t)]' = f' \circ g(t)\, g'(t)$.
Proof: We will prove the above sum, product, quotient, and chain rules in
a broader context in Chap. 4. Our proofs will not rely on integration.

Example 3.2.4 Strictly Increasing Functions
Let f (t) be a differentiable function on [c, d] with strictly positive derivative.
We now show that f (t) is strictly increasing. For each t ∈ [c, d] there exists
δ(t) > 0 such that
$$\frac{f(x) - f(t)}{x - t} > 0 \tag{3.2}$$
for all x ∈ [c, d] with 0 < |x − t| < δ(t). According to Proposition 3.2.1,
for any two points a < b from [c, d], there exists a δ-fine partition
$$a = s_0 < s_1 < \cdots < s_n = b$$
of [a, b] with tags $t_i \in [s_i, s_{i+1}]$. In view of inequality (3.2), at
least one of the two inequalities $f(s_i) \le f(t_i) \le f(s_{i+1})$ must be
strict. Thus, the telescoping sum
$$f(b) - f(a) = \sum_{i=0}^{n-1} [f(s_{i+1}) - f(s_i)]$$
must be positive.

3.3 Definition and Basic Properties of the Integral
With later applications in mind, it will be convenient to define the gauge
integral for vector-valued functions $f(x) : [a, b] \to R^n$. In this context,
f(x) is said to have integral I if for every ε > 0 there exists a gauge δ(x)
on [a, b] such that
$$\|S(f, \pi) - I\| < \epsilon \tag{3.3}$$
for all δ-fine partitions π. Our first order of business is to check that the
integral is unique whenever it exists. Thus, suppose that the vector J is a
second possible value of the integral. Given ε > 0 choose gauges $\delta_I(x)$
and $\delta_J(x)$ leading to inequality (3.3). The minimum
$\delta(x) = \min\{\delta_I(x), \delta_J(x)\}$ is also a gauge, and any
partition π that is δ-fine is also $\delta_I$-fine and $\delta_J$-fine. Hence,
$$\|I - J\| \le \|I - S(f, \pi)\| + \|S(f, \pi) - J\| < 2\epsilon.$$
Since ε is arbitrary, J = I.
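
To make the definition concrete, here is a hedged numerical sketch. It builds a δ-fine tagged partition by bisection and evaluates the Riemann sum S(f, π) for f(x) = 1/√x on [0, 1], with f(0) set to 0. The integrand is unbounded, yet a gauge that shrinks near 0 keeps the sum close to the value 2. The helper names, the gauge `make_gauge`, and the parameter `eta` are illustrative assumptions, not constructions from the text.

```python
import math


def delta_fine_partition(a, b, delta, max_depth=60):
    """Bisect [a, b] until each piece [u, v] has a tag t (endpoint or midpoint)
    with [u, v] inside (t - delta(t), t + delta(t))."""
    mid = 0.5 * (a + b)
    for tag in (a, mid, b):
        if tag - delta(tag) < a and b < tag + delta(tag):
            return [(a, tag, b)]
    if max_depth == 0:
        raise RuntimeError("search failed for this gauge")
    return (delta_fine_partition(a, mid, delta, max_depth - 1)
            + delta_fine_partition(mid, b, delta, max_depth - 1))


def riemann_sum(f, tagged):
    """S(f, pi): sum of f(tag) * (right - left) over the tagged partition."""
    return sum(f(tag) * (right - left) for left, tag, right in tagged)


def f(x):
    # Unbounded near 0; the value at 0 is set to 0 by convention.
    return 0.0 if x == 0 else 1.0 / math.sqrt(x)


def make_gauge(eta):
    # Hypothetical gauge: proportional to t away from 0, tiny at 0 itself.
    return lambda t: eta * eta if t == 0 else eta * t


approx = riemann_sum(f, delta_fine_partition(0.0, 1.0, make_gauge(0.01)))
```

A constant gauge cannot control the sum here, because a tag chosen very close to 0 makes f(tag) arbitrarily large; letting δ(t) shrink with t is what tames the singularity.
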
One can also define f (x) to be integrable if its Riemann sums are Cauchy
in an appropriate sense.
Proposition 3.3.1 (Cauchy criterion) A function $f(x) : [a, b] \to R^n$ is
integrable if and only if for every ε > 0 there exists a gauge δ(x) > 0 such
that
$$\|S(f, \pi_1) - S(f, \pi_2)\| < \epsilon \tag{3.4}$$
for any two δ-fine partitions $\pi_1$ and $\pi_2$.
Proof: It is obvious that the Cauchy criterion is necessary for integrability.
To show that it is sufficient, consider the sequence $\epsilon_m = m^{-1}$ and
a compatible sequence of gauges $\delta_m(x)$ determined by condition (3.4).
We can force the constraints $\delta_m(x) \le \delta_{m-1}(x)$ to hold by
inductively replacing $\delta_m(x)$ by $\min\{\delta_{m-1}(x), \delta_m(x)\}$
whenever needed. Now select a $\delta_m$-fine partition $\pi_m$ for each m.
Because the gauge sequence $\delta_m(x)$ is decreasing, every partition π that
is $\delta_m$-fine is also $\delta_{m-1}$-fine. Hence, the sequence of Riemann
sums $S(f, \pi_m)$ is Cauchy and has a limit I satisfying
$\|S(f, \pi_m) - I\| \le m^{-1}$. Finally, given the potential integral I, we
take an arbitrary ε > 0 and choose m so that $m^{-1} < \epsilon$. If π is
$\delta_m$-fine, then the inequality
$$\|S(f, \pi) - I\| \le \|S(f, \pi) - S(f, \pi_m)\| + \|S(f, \pi_m) - I\| < 2\epsilon$$
completes the proof.
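For two integrable functions f(x) and g(x), the gauge integral inherits
the linearity property
$$\int_a^b [\alpha f(x) + \beta g(x)]\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx$$
from its approximating Riemann sums. To prove this fact, take ε > 0 and
choose gauges $\delta_f(x)$ and $\delta_g(x)$ so that
$$\left\| S(f, \pi_f) - \int_a^b f(x)\,dx \right\| < \epsilon, \qquad
\left\| S(g, \pi_g) - \int_a^b g(x)\,dx \right\| < \epsilon$$
whenever $\pi_f$ is $\delta_f$-fine and $\pi_g$ is $\delta_g$-fine. If the
tagged partition π is δ-fine for the gauge
$\delta(x) = \min\{\delta_f(x), \delta_g(x)\}$, then
$$\begin{aligned}
& \left\| S(\alpha f + \beta g, \pi) - \alpha \int_a^b f(x)\,dx - \beta \int_a^b g(x)\,dx \right\| \\
&\quad \le |\alpha| \left\| S(f, \pi) - \int_a^b f(x)\,dx \right\| + |\beta| \left\| S(g, \pi) - \int_a^b g(x)\,dx \right\| \\
&\quad \le (|\alpha| + |\beta|)\epsilon.
\end{aligned}$$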

The gauge integral also inherits obvious order properties. For example,
$\int_a^b f(x)\,dx \ge 0$ whenever the integrand f(x) ≥ 0 for all x ∈ [a, b].
In this case, the inequality $|S(f, \pi) - \int_a^b f(x)\,dx| < \epsilon$ implies
$$0 \le S(f, \pi) \le \int_a^b f(x)\,dx + \epsilon.$$
Since ε can be made arbitrarily small for f(x) integrable, it follows that
$\int_a^b f(x)\,dx \ge 0$. This nonnegativity property translates into the
order property
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$$
for two integrable functions f(x) ≤ g(x). In particular, when f(x) and
|f(x)| are both integrable, we have
$$\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx.$$
For vector-valued functions, the analogous rule
$$\left\| \int_a^b f(x)\,dx \right\| \le \int_a^b \|f(x)\|\,dx \tag{3.5}$$

is also inherited from the approximating Riemann sums. The reader can
easily supply the proof using the triangle inequality of the Euclidean norm.
It does not take much imagination to extend the definition of the gauge
integral to matrix-valued functions, and inequality (3.5) applies in this
setting as well.
One of the nicest features of the gauge integral is that one can perturb
an integrable function at a countable number of points without changing
the value of its integral. This property fails for the Riemann integral but is
exhibited by the Lebesgue integral. To validate the property, it suffices to
prove that a function that equals 0 except at a countable number of points
has integral 0. Suppose f(x) is such a function with exceptional points
$x_1, x_2, \ldots$ and corresponding exceptional values $f_1, f_2, \ldots$.
Given ε > 0, we now define a gauge δ(x) with value 1 on the nonexceptional
points and values
$$\delta(x_j) = \frac{\epsilon}{2^{j+2}[\|f_j\| + 1]}$$
at the exceptional points. If π is a δ-fine partition, then $x_j$ can serve as
a tag for at most two intervals $[s_i, s_{i+1}]$ of π and each such interval has
length less than $2\delta(x_j)$. It follows that
$$\|S(f, \pi)\| \le \sum_{j} 2\|f(x_j)\| \frac{2\epsilon}{2^{j+2}[\|f_j\| + 1]}
\le \sum_{j=1}^{\infty} \frac{\epsilon}{2^j} = \epsilon$$
and therefore that $\int_a^b f(x)\,dx = 0$.
In practice, the interval additivity rule
$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx \tag{3.6}$$
is obviously desirable. There are three separate issues in proving it. First,
given the existence of the integral over [a, c], do the integrals over [a, b]
and [b, c] exist? Second, if the integrals over [a, b] and [b, c] exist, does
the integral over [a, c] exist? Third, if the integrals over [a, b] and [b, c]
exist, are they additive? The first question is best approached through
Proposition 3.3.1. For ε > 0 there exists a gauge δ(x) such that
$$\|S(f, \pi_1) - S(f, \pi_2)\| < \epsilon$$
for any two δ-fine partitions $\pi_1$ and $\pi_2$ of [a, c]. Given δ(x), take
any two δ-fine partitions $\gamma_1$ and $\gamma_2$ of [a, b] and a single
δ-fine partition ω of [b, c]. The concatenated partitions $\gamma_1 \cup \omega$
and $\gamma_2 \cup \omega$ are δ-fine throughout [a, c] and satisfy
$$\|S(f, \gamma_1) - S(f, \gamma_2)\| = \|S(f, \gamma_1 \cup \omega) - S(f, \gamma_2 \cup \omega)\| < \epsilon.$$
According to the Cauchy criterion, the integral over [a, b] therefore exists.
A similar argument implies that the integral over [b, c] also exists. Finally,
the combination of these results shows that the integral exists over any
interval [u, v] contained within [a, b].
For the converse, choose gauges $\delta_1(x)$ on [a, b] and $\delta_2(x)$ on
[b, c] so that
$$\left\| S(f, \gamma) - \int_a^b f(x)\,dx \right\| < \epsilon, \qquad
\left\| S(f, \omega) - \int_b^c f(x)\,dx \right\| < \epsilon$$
for any $\delta_1$-fine partition γ of [a, b] and any $\delta_2$-fine partition
ω of [b, c]. The concatenated partition π = γ ∪ ω satisfies
$$\begin{aligned}
& \left\| S(f, \pi) - \int_a^b f(x)\,dx - \int_b^c f(x)\,dx \right\| \\
&\quad \le \left\| S(f, \gamma) - \int_a^b f(x)\,dx \right\| + \left\| S(f, \omega) - \int_b^c f(x)\,dx \right\| \\
&\quad < 2\epsilon
\end{aligned} \tag{3.7}$$
because the Riemann sums satisfy S(f, π) = S(f, γ) + S(f, ω). This suggests
defining a gauge δ(x) equal to $\delta_1(x)$ on [a, b] and equal to
$\delta_2(x)$ on [b, c]. The problem with this tactic is that some partitions
of [a, c] do not split at b. However, we can ensure a split by redefining δ(x) by
$$\tilde{\delta}(x) = \begin{cases}
\min\{\delta_1(b), \delta_2(b)\} & x = b \\
\min\{\delta(x), \tfrac{1}{2}|x - b|\} & x \ne b.
\end{cases}$$
This forces b to be the tag of its assigned interval, and we can if needed
split this interval at b and retain b as tag of both subintervals. With δ(x)
amended in this fashion, any $\tilde{\delta}$-fine partition π can be viewed as
a concatenated partition γ ∪ ω splitting at b. As such π obeys inequality (3.7).
This argument simultaneously proves that the integral over [a, c] exists and
satisfies the additivity property (3.6).
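
The reason the amended gauge forces a split at b can be checked numerically: because the amended gauge is at most |x − b|/2 for x ≠ b, the open interval around any tag other than b cannot contain b. The sketch below is a sanity check under assumed inputs; the constant gauges standing in for δ1 and δ2, the split point 0.3, and the name `make_split_gauge` are hypothetical.

```python
import random


def make_split_gauge(delta1, delta2, b):
    """Return the amended gauge: min(delta1(b), delta2(b)) at b, and
    min(delta(x), |x - b| / 2) elsewhere, where delta is delta1 to the left
    of b and delta2 to the right of b."""
    def gauge(x):
        if x == b:
            return min(delta1(b), delta2(b))
        base = delta1(x) if x < b else delta2(x)
        return min(base, 0.5 * abs(x - b))
    return gauge


a, b, c = 0.0, 0.3, 1.0
gauge = make_split_gauge(lambda x: 0.2, lambda x: 0.1, b)

random.seed(0)
tags = [random.uniform(a, c) for _ in range(1000)]
# Tags other than b whose interval (t - gauge(t), t + gauge(t)) contains b.
covers_b = [t for t in tags if t != b and abs(t - b) < gauge(t)]
```

The list `covers_b` comes out empty, so in any partition that is fine for this gauge the interval containing b must be tagged by b itself.
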
If the function f(x) is vector-valued with n components, then the
integrability of f(x) should imply the integrability of each of its components
$f_i(x)$. Furthermore, we should be able to write
$$\int_a^b f(x)\,dx = \begin{pmatrix} \int_a^b f_1(x)\,dx \\ \vdots \\ \int_a^b f_n(x)\,dx \end{pmatrix}.$$
Conversely, if its components are integrable, then f(x) should be integrable
as well. The inequalities
$$\|S(f, \pi) - I\| \le \sum_{i=1}^{n} |S(f_i, \pi) - I_i| \le \sqrt{n}\, \|S(f, \pi) - I\|$$
based on Example 2.5.6 and Problem 3 of Chap. 2 are instrumental in
proving this logical equivalence. Given that we can integrate component
by component, for the remainder of this chapter we will deal exclusively
with real-valued functions.
We have not actually shown that any function is integrable. The most
obvious possibility is a constant. Fortunately, it is trivial to demonstrate
that
$$\int_a^b c\,dx = c(b - a).$$

Step functions are one rung up the hierarchy of functions. If
$$f(x) = \sum_{i=0}^{n-1} c_i 1_{(s_i, s_{i+1}]}(x)$$
for $a = s_0 < s_1 < \cdots < s_n = b$, then our nascent theory allows us to
evaluate
$$\int_a^b f(x)\,dx = \sum_{i=0}^{n-1} c_i \int_{s_i}^{s_{i+1}} dx = \sum_{i=0}^{n-1} c_i (s_{i+1} - s_i).$$

This fact and the next technical proposition turn out to be the key to
showing that continuous functions are integrable.

Proposition 3.3.2 Let f(x) be a function with domain [a, b]. Suppose for
every ε > 0 there exist two integrable functions g(x) and h(x) satisfying
g(x) ≤ f(x) ≤ h(x) for all x and
$$\int_a^b h(x)\,dx \le \int_a^b g(x)\,dx + \epsilon.$$
Then f(x) is integrable.

Proof: For ε > 0, choose gauges $\delta_g(x)$ and $\delta_h(x)$ on [a, b] so that
$$\left| S(g, \pi_g) - \int_a^b g(x)\,dx \right| < \epsilon, \qquad
\left| S(h, \pi_h) - \int_a^b h(x)\,dx \right| < \epsilon$$
for any $\delta_g$-fine partition $\pi_g$ and any $\delta_h$-fine partition
$\pi_h$. If π is a δ-fine partition for
$\delta(x) = \min\{\delta_g(x), \delta_h(x)\}$, then the inequalities
$$\begin{aligned}
\int_a^b g(x)\,dx - \epsilon &< S(g, \pi) \\
&\le S(f, \pi) \\
&\le S(h, \pi) \\
&< \int_a^b h(x)\,dx + \epsilon \\
&\le \int_a^b g(x)\,dx + 2\epsilon
\end{aligned}$$
trap S(f, π) in an interval of length 3ε. Because the Riemann sum S(f, γ)
for any other δ-fine partition γ is trapped in the same interval, the integral
of f(x) exists by the Cauchy criterion.

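Proposition 3.3.3 Every continuous function f(x) on [a, b] is integrable.
Proof: In view of the uniform continuity of f(x) on [a, b], for every ε > 0
there exists a δ > 0 with |f(x) − f(y)| < ε when |x − y| < 2δ. For the
constant gauge δ(x) = δ and a corresponding δ-fine partition π with mesh
points $s_0, \ldots, s_n$, let $m_i$ be the minimum and $M_i$ be the maximum of
f(x) on $[s_i, s_{i+1}]$. Because each interval $[s_i, s_{i+1}]$ has length less
than 2δ, we have $M_i - m_i < \epsilon$. The step functions
$$g(x) = \sum_{i=0}^{n-1} m_i 1_{(s_i, s_{i+1}]}(x), \qquad
h(x) = \sum_{i=0}^{n-1} M_i 1_{(s_i, s_{i+1}]}(x)$$
then satisfy g(x) ≤ f(x) ≤ h(x) except at the single point a. A single
exceptional point is harmless because altering a function at one point changes
neither its integrability nor its integral. Furthermore,
$$\int_a^b h(x)\,dx - \int_a^b g(x)\,dx \le \sum_{i=0}^{n-1} \epsilon (s_{i+1} - s_i) = \epsilon(b - a).$$
Application of Proposition 3.3.2 now completes the proof.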
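
The squeeze between step functions is easy to see numerically. The sketch below assumes an increasing integrand, so that the minimum and maximum on each subinterval sit at its endpoints, and uses a uniform mesh; the function name `step_sums` and the choice of exp on [0, 1] are illustrative, not taken from the text.

```python
import math


def step_sums(f, a, b, n):
    """Integrals of the lower and upper step functions of an increasing f on a
    uniform mesh: the minimum on [s_i, s_{i+1}] is f(s_i), the maximum is
    f(s_{i+1})."""
    h = (b - a) / n
    mesh = [a + i * h for i in range(n + 1)]
    lower = sum(f(mesh[i]) * h for i in range(n))
    upper = sum(f(mesh[i + 1]) * h for i in range(n))
    return lower, upper


lower, upper = step_sums(math.exp, 0.0, 1.0, 1000)
exact = math.e - 1.0  # integral of exp over [0, 1]
```

For an increasing function the gap `upper - lower` telescopes to (f(b) − f(a))(b − a)/n, which can be made smaller than any ε by refining the mesh, exactly the hypothesis of Proposition 3.3.2.
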

3.4 The Fundamental Theorem of Calculus
The fundamental theorem of calculus divides naturally into two parts. For
the gauge integral, the first and more difficult part is easily proved by
invoking what is called the straddle inequality. Let f(x) be differentiable
at the point t ∈ [a, b]. Then for every ε > 0 there exists δ(t) > 0 such that
$$\left| \frac{f(x) - f(t)}{x - t} - f'(t) \right| < \epsilon$$
for all x ∈ [a, b] with 0 < |x − t| < δ(t). If u < t < v are two points
straddling t and located in [a, b] ∩ (t − δ(t), t + δ(t)), then
$$\begin{aligned}
|f(v) - f(u) - f'(t)(v - u)| &\le |f(v) - f(t) - f'(t)(v - t)| \\
&\quad + |f(t) - f(u) - f'(t)(t - u)| \\
&\le \epsilon(v - t) + \epsilon(t - u) \\
&= \epsilon(v - u).
\end{aligned} \tag{3.8}$$
Inequality (3.8) also clearly holds when either u = t or v = t.
Proposition 3.4.1 (Fundamental Theorem I) If f(x) is differentiable
throughout [a, b], then
$$\int_a^b f'(x)\,dx = f(b) - f(a).$$
Proof: Using the gauge δ(t) figuring in the straddle inequality (3.8), select
a δ-fine partition π with mesh points $a = s_0 < s_1 < \cdots < s_n = b$ and
tags $t_i \in [s_i, s_{i+1}]$. Application of the inequality and telescoping yield
$$\begin{aligned}
|f(b) - f(a) - S(f', \pi)| &= \left| \sum_{i=0}^{n-1} [f(s_{i+1}) - f(s_i) - f'(t_i)(s_{i+1} - s_i)] \right| \\
&\le \sum_{i=0}^{n-1} |f(s_{i+1}) - f(s_i) - f'(t_i)(s_{i+1} - s_i)| \\
&\le \sum_{i=0}^{n-1} \epsilon (s_{i+1} - s_i) \\
&= \epsilon(b - a).
\end{aligned}$$
This demonstrates that $f'(x)$ has integral f(b) − f(a).
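
A standard example, not taken from the text, shows what this proposition buys. The function below is differentiable at every point of [0, 1], including 0, yet its derivative is unbounded near 0 and is not Lebesgue integrable on [0, 1]. The gauge integral of the derivative nevertheless exists and equals f(1) − f(0).

```latex
f(x) = \begin{cases} x^2 \sin(1/x^2) & x \ne 0 \\ 0 & x = 0, \end{cases}
\qquad
f'(x) = \begin{cases} 2x \sin(1/x^2) - \dfrac{2}{x} \cos(1/x^2) & x \ne 0 \\ 0 & x = 0, \end{cases}
\qquad
\int_0^1 f'(x)\,dx = f(1) - f(0) = \sin 1 .
```

The value of the derivative at 0 follows directly from definition (3.1), since the difference quotient there is x sin(1/x²), which tends to 0.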
The first half of the fundamental theorem remains valid for a continuous
function f(x) that is differentiable except on a countable set N [250]. Since
changing an integrand at a countable number of points does not alter its
integral, it suffices to prove that
$$f(b) - f(a) = \int_a^b g(t)\,dt, \quad \text{where} \quad
g(t) = \begin{cases} 0 & t \in N \\ f'(t) & t \notin N. \end{cases}$$
Suppose ε > 0 is given. For t ∉ N define the gauge value δ(t) to satisfy
the straddle inequality. Enumerate the points $t_j$ of N, and define
$\delta(t_j) > 0$ so that $|f(t_j) - f(t_j + s)| < \epsilon 2^{-j-2}$ whenever
$|s| < \delta(t_j)$. Now select a δ-fine partition π with mesh points
$a = s_0 < s_1 < \cdots < s_n = b$ and tags $r_i \in [s_i, s_{i+1}]$. Break the sum
$$f(b) - f(a) - S(g, \pi) = \sum_{i=0}^{n-1} [f(s_{i+1}) - f(s_i) - g(r_i)(s_{i+1} - s_i)]$$
into two parts. Let $S'$ denote the sum of the terms with tags $r_i \notin N$,
and let $S''$ denote the sum of the terms with tags $r_i \in N$. As noted
earlier, $|S'| \le \epsilon(b - a)$. Because a tag is attached to at most two
subintervals, the second sum satisfies
$$\begin{aligned}
|S''| &\le \sum_{r_i \in N} |f(s_{i+1}) - f(s_i)| \\
&\le \sum_{r_i \in N} \left[ |f(s_{i+1}) - f(r_i)| + |f(r_i) - f(s_i)| \right] \\
&\le \sum_{j=1}^{\infty} 2 \cdot 2 \epsilon 2^{-j-2} = \epsilon.
\end{aligned}$$
It follows that $|S' + S''| \le \epsilon(b - a + 1)$ and therefore that the
stated integral exists and equals f(b) − f(a).
In demonstrating the second half of the fundamental theorem, we will
implicitly use the standard convention
$$\int_d^c f(x)\,dx = -\int_c^d f(x)\,dx$$

for c < d. This convention will also be in force in proving the substitution
formula.
Proposition 3.4.2 (Fundamental Theorem II) If a function f(x) is
integrable on [a, b], then its indefinite integral
$$F(t) = \int_a^t f(x)\,dx$$
has derivative $F'(t) = f(t)$ at any point t where f(x) is continuous. The
derivative is taken as one sided if t = a or t = b.
Proof: In deriving the interval additivity rule (3.6), we showed that the
integral F(t) exists. At a point t where f(x) is continuous, for any ε > 0
there is a δ > 0 such that −ε < f(x) − f(t) < ε when |x − t| < δ and
x ∈ [a, b]. Hence, the difference
$$\frac{F(t + s) - F(t)}{s} - f(t) = \frac{1}{s} \int_t^{t+s} [f(x) - f(t)]\,dx$$
lies between −ε and ε for 0 < |s| < δ. In the limit as s tends to 0,
we recover $F'(t) = f(t)$.
The fundamental theorem of calculus has several important corollaries.
These are covered in the next three propositions on the substitution rule,
integration by parts, and finite Taylor expansions.

Proposition 3.4.3 (Substitution Rule) Suppose f(x) is differentiable
on [a, b], g(x) is differentiable on [c, d], and the image of [c, d] under g(x)
is contained within [a, b]. Then
$$\int_{g(c)}^{g(d)} f'(y)\,dy = \int_c^d f'[g(x)]\, g'(x)\,dx.$$

Proof: Part I of the fundamental theorem and the chain rule identity
$$\{f[g(x)]\}' = f'[g(x)]\, g'(x)$$
imply that both integrals have value f[g(d)] − f[g(c)].

Proposition 3.4.4 (Integration by Parts) Suppose f(x) and g(x) are
differentiable on [a, b]. Then $f'(x)g(x)$ is integrable on [a, b] if and only
if $f(x)g'(x)$ is integrable on [a, b]. Furthermore, the two integrals are
related by the identity
$$\int_a^b f'(x)g(x)\,dx + \int_a^b f(x)g'(x)\,dx = f(b)g(b) - f(a)g(a).$$

Proof: The product rule for derivatives is
$$[f(x)g(x)]' = f'(x)g(x) + f(x)g'(x).$$
If two of three members of this identity are integrable, then the third is as
well. Since part I of the fundamental theorem entails
$$\int_a^b [f(x)g(x)]'\,dx = f(b)g(b) - f(a)g(a),$$
the proposition follows.
The derivative of a function may itself be differentiable. Indeed, it makes
sense to speak of the kth-order derivative of a function f(x) if f(x) is
sufficiently smooth. Traditionally, the second-order derivative is denoted
$f''(x)$ and an arbitrary kth-order derivative by $f^{(k)}(x)$. We can use these
extra derivatives to good effect in approximating f(x) locally. The next
proposition makes this clear and offers an explicit estimate of the error in
a finite Taylor expansion of f(x).
Proposition 3.4.5 (Taylor Expansion) Suppose f(x) has a derivative
of order k + 1 on an open interval around the point y. Then for all x in the
interval, we have
$$f(x) = f(y) + \sum_{j=1}^{k} \frac{1}{j!} f^{(j)}(y)(x - y)^j + R_k(x), \tag{3.9}$$
where the remainder
$$R_k(x) = \frac{(x - y)^{k+1}}{k!} \int_0^1 f^{(k+1)}[y + t(x - y)](1 - t)^k\,dt.$$
If $|f^{(k+1)}(z)| \le b$ for all z between x and y, then
$$|R_k(x)| \le \frac{b|x - y|^{k+1}}{(k + 1)!}. \tag{3.10}$$

Proof: When k = 0, the Taylor expansion (3.9) reads
$$f(x) = f(y) + (x - y) \int_0^1 f'[y + t(x - y)]\,dt$$
and follows from the fundamental theorem of calculus and the chain rule.
Induction and the integration-by-parts formula
$$\begin{aligned}
& \int_0^1 f^{(k)}[y + t(x - y)](1 - t)^{k-1}\,dt \\
&\quad = -\frac{1}{k} f^{(k)}[y + t(x - y)](1 - t)^k \Big|_0^1
 + \frac{x - y}{k} \int_0^1 f^{(k+1)}[y + t(x - y)](1 - t)^k\,dt \\
&\quad = \frac{1}{k} f^{(k)}(y) + \frac{x - y}{k} \int_0^1 f^{(k+1)}[y + t(x - y)](1 - t)^k\,dt
\end{aligned}$$
now validate the general expansion (3.9). The error estimate follows directly
from the bound $|f^{(k+1)}(z)| \le b$ and the integral
$$\int_0^1 (1 - t)^k\,dt = \frac{1}{k + 1}.$$
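
As a numerical sanity check of the expansion (3.9) and the bound (3.10), the sketch below takes f = exp, whose derivatives of every order are again exp. The choice of function, the expansion point, the composite Simpson rule, and all names are assumptions made for illustration.

```python
import math


def simpson(g, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def taylor_check(x, y, k):
    """For f = exp, return (actual remainder, integral-form remainder, bound)."""
    # Taylor polynomial of degree k about y; every derivative of exp is exp.
    poly = sum(math.exp(y) * (x - y) ** j / math.factorial(j) for j in range(k + 1))
    actual = math.exp(x) - poly
    integral = simpson(lambda t: math.exp(y + t * (x - y)) * (1 - t) ** k, 0.0, 1.0)
    integral_form = (x - y) ** (k + 1) / math.factorial(k) * integral
    b = math.exp(max(x, y))  # bound on |f^(k+1)| between x and y
    bound = b * abs(x - y) ** (k + 1) / math.factorial(k + 1)
    return actual, integral_form, bound


actual, integral_form, bound = taylor_check(x=0.5, y=0.0, k=4)
```

The first two returned values should agree up to quadrature error, and both should be no larger in magnitude than the third.
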

3.5 More Advanced Topics in Integration
Within the confines of a single chapter, it is impossible to develop rigorously
all of the properties of the gauge integral. In this section we will discuss
briefly four topics: (a) integrals over unbounded intervals, (b) improper
integrals and Hake’s theorem, (c) the interchange of limits and integrals,
and (d) multidimensional integrals and Fubini’s theorem.
Defining the integral of a function over an unbounded interval requires
several minor adjustments. First, the real line is extended to include the
points ±∞. Second, a gauge function δ(x) is now viewed as mapping x
to an open interval containing x. The associated interval may be infinite;
indeed, it must be infinite if x equals ±∞. In a δ-fine partition π, the
interval Ij containing the tag xj is contained in δ(xj ). The length of an
infinite interval Ij is defined to be 0 in an approximating Riemann sum
S(f, π) to avoid infinite contributions to the sum. Likewise, the integrand
f (x) is assigned the value 0 at x = ±∞.
This extended definition carries with it all the properties we expect. Its
most remarkable consequence is that it obliterates the distinction between
proper and improper integrals. Hake’s theorem provides the link. If we
allow a and b to be infinite as well as finite, then Hake’s theorem says a
function f(x) is integrable over (a, b) if and only if either of the two limits
$$\lim_{c \to a} \int_c^b f(x)\,dx \quad \text{or} \quad \lim_{c \to b} \int_a^c f(x)\,dx$$
exists. If either limit exists, then $\int_a^b f(x)\,dx$ equals that limit.
For instance, the integral
$$\int_1^\infty \frac{1}{x^2}\,dx = \lim_{c \to \infty} \int_1^c \frac{1}{x^2}\,dx
= \lim_{c \to \infty} \left( -\frac{1}{x} \right) \Big|_1^c = 1$$
exists and has the indicated limit by this reasoning.
Example 3.5.1 Existence of $\int_0^\infty \mathrm{sinc}(x)\,dx$
Consider the integral of sinc(x) = sin(x)/x over the interval (0, ∞). Because
sinc(x) is continuous throughout [0, 1] with limit 1 as x approaches 0, the
integral over [0, 1] is defined. Hake’s theorem and integration by parts show
that the integral
$$\begin{aligned}
\int_1^\infty \frac{\sin x}{x}\,dx &= \lim_{c \to \infty} \int_1^c \frac{\sin x}{x}\,dx \\
&= \lim_{c \to \infty} \left[ -\frac{\cos x}{x} \Big|_1^c - \int_1^c \frac{\cos x}{x^2}\,dx \right] \\
&= \cos 1 - \int_1^\infty \frac{\cos x}{x^2}\,dx
\end{aligned}$$
exists provided the integral of $x^{-2} \cos x$ exists over (1, ∞). We will
demonstrate this fact in a moment. If we accept it, then it is clear that the
integral of sinc(x) over (0, ∞) exists as well. As we shall find in
Example 3.5.4, this integral equals π/2. In contrast to these positive results,
sinc(x) is not absolutely integrable over (0, ∞). Finally, we note in passing
that the substitution rule gives
$$\int_0^\infty \frac{\sin cx}{x}\,dx = \int_0^\infty \frac{\sin y}{c^{-1} y}\, c^{-1}\,dy
= \int_0^\infty \frac{\sin y}{y}\,dy = \frac{\pi}{2}$$
for any c > 0.
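
The contrast between conditional and absolute integrability of sinc(x) shows up clearly in numerical partial integrals. The sketch below is illustrative: it uses a composite Simpson rule with an assumed step density, and the helper names are hypothetical. The integral of sinc over [0, c] settles near π/2, while the integral of |sinc| keeps growing roughly like (2/π) ln c.

```python
import math


def simpson(g, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def sinc(x):
    # Continuous extension of sin(x)/x to x = 0.
    return 1.0 if x == 0 else math.sin(x) / x


def partial_integrals(c, steps_per_unit=200):
    """Approximate the integrals of sinc and |sinc| over [0, c]."""
    n = 2 * int(c * steps_per_unit / 2)  # even number of subintervals
    signed = simpson(sinc, 0.0, c, n)
    absolute = simpson(lambda x: abs(sinc(x)), 0.0, c, n)
    return signed, absolute


signed_100, abs_100 = partial_integrals(100.0)
signed_1000, abs_1000 = partial_integrals(1000.0)
```

Integration by parts, as in the example, bounds the tail of the signed integral beyond c by 2/c, which is consistent with the slow but steady convergence seen in the signed values.
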


We now ask under what circumstances the formula
$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b \lim_{n \to \infty} f_n(x)\,dx \tag{3.11}$$
is valid. The two relevant theorems permitting the interchange of limits
and integrals are the monotone convergence theorem and the dominated
convergence theorem. In the monotone convergence theorem, we are given
an increasing sequence $f_n(x)$ of integrable functions that converge to a
finite limit for each x. Formula (3.11) is true in this setting provided
$$\sup_n \int_a^b f_n(x)\,dx < \infty.$$
In the dominated convergence theorem, we assume the sequence $f_n(x)$ is
trapped between two integrable functions g(x) and h(x) in the sense that
$$g(x) \le f_n(x) \le h(x)$$
for all n and x. If $\lim_{n \to \infty} f_n(x)$ exists in this setting, then the
interchange (3.11) is allowed. The choices
$$f_n(x) = 1_{[1,n]}(x)\, x^{-2} \cos x, \qquad g(x) = -x^{-2}, \qquad h(x) = x^{-2}$$
in the dominated convergence theorem validate the existence of
$$\int_1^\infty x^{-2} \cos x\,dx = \lim_{n \to \infty} \int_1^n x^{-2} \cos x\,dx.$$
We now consider two more substantive applications of the monotone and
dominated convergence theorems.
Example 3.5.2 Johann Bernoulli’s Integral
As an example of delicate maneuvers in integration, consider the integral
$$\begin{aligned}
\int_0^1 \frac{1}{x^x}\,dx &= \int_0^1 e^{-x \ln x}\,dx \\
&= \int_0^1 \sum_{n=0}^{\infty} \frac{(-x \ln x)^n}{n!}\,dx \\
&= \sum_{n=0}^{\infty} \frac{1}{n!} \int_0^1 (-x \ln x)^n\,dx.
\end{aligned}$$
The reader will notice the application of the monotone convergence theorem
in passing from the second to the third line above. Further progress can be
made by applying the integration by parts result
$$\begin{aligned}
\int_0^1 x^m \ln^n x\,dx &= -\frac{n}{m + 1} \int_0^1 x^{m+1} \frac{\ln^{n-1} x}{x}\,dx \\
&= -\frac{n}{m + 1} \int_0^1 x^m \ln^{n-1} x\,dx
\end{aligned}$$
recursively to evaluate
$$\int_0^1 (-x \ln x)^n\,dx = \frac{n!}{(n + 1)^n} \int_0^1 x^n\,dx = \frac{n!}{(n + 1)^{n+1}}.$$
The pleasant surprise
$$\int_0^1 \frac{1}{x^x}\,dx = \sum_{n=0}^{\infty} \frac{1}{(n + 1)^{n+1}}$$
emerges.
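
The series identity can be checked numerically. The following sketch compares a midpoint-rule approximation of the integral with a partial sum of the series; the quadrature rule, the number of nodes, the number of series terms, and the function names are assumptions made for illustration.

```python
import math


def midpoint_rule(g, a, b, n):
    """Composite midpoint rule; it never evaluates g at the endpoints."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def bernoulli_series(terms=20):
    """Partial sum of the series with general term 1 / (n + 1)^(n + 1), n >= 0."""
    return sum(1.0 / (n + 1) ** (n + 1) for n in range(terms))


quadrature = midpoint_rule(lambda x: x ** (-x), 0.0, 1.0, 100_000)
series = bernoulli_series()
```

The series converges extremely fast, so twenty terms already exhaust double precision, while the quadrature needs many nodes because the derivative of the integrand is unbounded near 0.
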
Example 3.5.3 Competing Definitions of the Gamma Function
The dominated convergence theorem allows us to derive Gauss’s representation
$$\Gamma(z) = \lim_{n \to \infty} \frac{n!\, n^z}{z(z + 1) \cdots (z + n)}$$
of the gamma function from Euler’s representation
$$\Gamma(z) = \int_0^\infty x^{z-1} e^{-x}\,dx.$$
As students of statistics are apt to know from their exposure to the beta
distribution, repeated integration by parts and the fundamental theorem
of calculus show that
$$\int_0^1 x^{z-1}(1 - x)^n\,dx = \frac{n!}{z(z + 1) \cdots (z + n)}.$$
The substitution rule yields
$$n^z \int_0^1 x^{z-1}(1 - x)^n\,dx = \int_0^n y^{z-1} \left( 1 - \frac{y}{n} \right)^n dy.$$
Thus, it suffices to prove that
$$\int_0^\infty x^{z-1} e^{-x}\,dx = \lim_{n \to \infty} \int_0^n y^{z-1} \left( 1 - \frac{y}{n} \right)^n dy.$$
Given the limit
$$\lim_{n \to \infty} \left( 1 - \frac{y}{n} \right)^n = e^{-y},$$
we need an integrable function h(y) that dominates the nonnegative sequence
$$f_n(y) = 1_{[0,n]}(y)\, y^{z-1} \left( 1 - \frac{y}{n} \right)^n$$
from above in order to apply the dominated convergence theorem. In light
of the inequality
$$\left( 1 - \frac{y}{n} \right)^n \le e^{-y},$$
the function $h(y) = y^{z-1} e^{-y}$ will serve.
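
Gauss’s representation converges slowly, which a short computation makes visible. The sketch below evaluates the finite product in logarithms to avoid overflow and compares it with `math.gamma`; the function name `gauss_gamma` and the test value z = 2.5 are illustrative assumptions.

```python
import math


def gauss_gamma(z, n):
    """Gauss's approximation n! n^z / (z (z+1) ... (z+n)), computed in logs
    to avoid overflow for large n."""
    log_value = math.lgamma(n + 1) + z * math.log(n)
    log_value -= math.fsum(math.log(z + k) for k in range(n + 1))
    return math.exp(log_value)


z = 2.5
approximations = {n: gauss_gamma(z, n) for n in (10, 1000, 100_000)}
relative_errors = {
    n: abs(value / math.gamma(z) - 1.0) for n, value in approximations.items()
}
```

The relative error shrinks roughly in proportion to 1/n, so the limit is of theoretical rather than computational interest.
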
Finally, the gauge integral extends to multiple dimensions, where a version
of Fubini’s theorem holds for evaluating multidimensional integrals
via iterated integrals [278]. Consider a function f(x, y) defined over the
Cartesian product H × K of two multidimensional intervals H and K. The
intervals in question can be bounded or unbounded. If f(x, y) is integrable
over H × K, then Fubini’s theorem asserts that the integrals
$\int_H f(x, y)\,dx$ and $\int_K f(x, y)\,dy$ exist and can be integrated over
the remaining variable to give the full integral. In symbols,
$$\int_{H \times K} f(x, y)\,dx\,dy = \int_K \left[ \int_H f(x, y)\,dx \right] dy
= \int_H \left[ \int_K f(x, y)\,dy \right] dx.$$
Conversely, if either iterated integral exists, one would like to conclude that
the full integral exists as well. This is true whenever f (x, y) is nonnega-
tive. Unfortunately, it is false in general, and two additional hypotheses
introduced by Tonelli are needed to rescue the situation. One hypothesis
is that f (x, y) is measurable. Measurability is a technical condition that
holds except for very pathological functions. The other hypothesis is that
|f (x, y)| ≤ g(x, y) for some nonnegative function g(x, y) for which the
iterated integral exists. This domination condition is shared with the dom-
inated convergence theorem and forces f (x, y) to be absolutely integrable.
∞
Example 3.5.4 Evaluation of 0
sinc(x) dx
According to Fubini’s theorem
n nπ nπ n
e−xy sin x dx dy = e−xy sin x dy dx . (3.12)
0 0 0 0
The second of these iterated integrals
nπ n nπ nπ
sin x sin x
e−xy sin x dy dx = dx − e−nx dx
0 0 0 x 0 x
∞
tends to 0
as n tends to ∞ by a combination of Hake’s theorem
sinc(x) dx
and the dominated convergence theorem. The inner integral of the left
iterated integral in (3.12) equals
nπ nπ nπ
e−xy sin x dx = −e−xy cos x − ye−xy sin x
0 0 0
nπ
− y2 e−xy sin x dx
0
nπ
−nπy
= 1−e cos nπ − y 2 e−xy sin x dx
0

3.6 Problems 71

after two integrations by parts. It follows that
nπ
1 − e−nπy cos nπ
e−xy sin x dx = .
0 1 + y2

Finally, application of the dominated convergence theorem gives

$$\lim_{n \to \infty} \int_0^n \frac{1 - e^{-n\pi y} \cos n\pi}{1 + y^2}\,dy = \int_0^\infty \frac{1}{1 + y^2}\,dy = \frac{\pi}{2} .$$
Equating the limits of the right and left hand sides of the identity (3.12)
therefore yields the value of π/2 for $\int_0^\infty \operatorname{sinc}(x)\,dx$ [278].
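The identity (3.12) and the limit π/2 can be checked numerically for a few finite n. The sketch below is a sanity check under stated assumptions, not part of the argument: it uses a composite Simpson rule (the helper names `simpson`, `sinc`, `left_side`, and `right_side` and the step count are illustrative choices, not from the text). The left side integrates the closed form of the inner integral over [0, n]. The right side integrates sinc(x)(1 − e^{−nx}) over [0, nπ], which is the second iterated integral after the inner integration over y is carried out.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] using n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def sinc(x):
    # Unnormalized sinc, sin(x)/x, with its limiting value 1 at x = 0.
    return 1.0 if x == 0.0 else math.sin(x) / x


def left_side(n, steps=40000):
    # Integral over y in [0, n] of (1 - exp(-n*pi*y) * cos(n*pi)) / (1 + y^2).
    c = (-1) ** n  # cos(n*pi) for integer n
    return simpson(
        lambda y: (1.0 - math.exp(-n * math.pi * y) * c) / (1.0 + y * y),
        0.0, float(n), steps)


def right_side(n, steps=40000):
    # Integral over x in [0, n*pi] of sinc(x) * (1 - exp(-n*x)).
    return simpson(
        lambda x: sinc(x) * (1.0 - math.exp(-n * x)),
        0.0, n * math.pi, steps)


for n in (4, 16):
    print(n, left_side(n), right_side(n), math.pi / 2)
```

For each n the two sides should agree up to quadrature error, and both should move toward π/2 as n grows. The approach is slow, since the left side behaves roughly like arctan(n).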

3.6 Problems
1. Give an alternative proof of Cousin’s lemma by letting y be the supre-
mum of the set of x ∈ [a, b] such that [a, x] possesses a δ-fine partition.

2. Use Cousin’s lemma to prove that a continuous function f (x) defined
on an interval [a, b] is uniformly continuous there [108]. (Hint: Given
ε > 0 define a gauge δ(x) by the requirement that |f(y) − f(x)| < ε/2
for all y ∈ [a, b] with |y − x| < 2δ(x).)

3. A possibly discontinuous function f (x) has one-sided limits at each
point x ∈ [a, b]. Show by Cousin’s lemma that f (x) is bounded on
[a, b].

4. Suppose f(x) has a nonnegative derivative $f'(x)$ throughout [a, b].
Prove that f(x) is nondecreasing on [a, b]. Also prove that f(x) is
constant on [a, b] if and only if $f'(x) = 0$ for all x. (Hint: These yield
easily to the fundamental theorem of calculus. Alternatively for the
first assertion, consider the function

$$f_\epsilon(x) = f(x) + \epsilon x$$

for ε > 0.)

5. Using only the definition of the gauge integral, demonstrate that
$$\int_a^b f(t)\,dt = \int_{-b}^{-a} f(-t)\,dt$$

when either integral exists.


6. Based on the standard definition of the natural logarithm
$$\ln y = \int_1^y \frac{1}{x}\,dx,$$
prove that ln yz = ln y + ln z for all positive arguments y and z. Use
this property to verify that $\ln y^{-1} = -\ln y$ and that
$\ln y^r = r \ln y$ for every rational number r.
7. Apply Proposition 3.3.2 and demonstrate that every monotonic func-
tion defined on an interval [a, b] is integrable on that interval.
8. Let f (x) be a continuous real-valued function on [a, b]. Show that
there exists c ∈ [a, b] with
$$\int_a^b f(x)\,dx = f(c)(b - a).$$

9. In the Taylor expansion of Proposition 3.4.5, suppose $f^{(k+1)}(x)$ is
continuous. Show that we can replace the remainder by
$$R_k(x) = \frac{(x - y)^{k+1}}{(k + 1)!}\, f^{(k+1)}(z)$$
for some z between x and y.
10. Suppose that f(x) is infinitely differentiable and that c and r are
positive numbers. If $|f^{(k)}(x)| \le c\,k!\,r^k$ for all x near y and all
nonnegative integers k, then use Proposition 3.4.5 to show that

$$f(x) = \sum_{k=0}^\infty \frac{f^{(k)}(y)}{k!}\,(x - y)^k$$

near y. Explicitly determine the infinite Taylor series expansion of the
function $f(x) = (1 + x)^{-1}$ around x = 0 and justify its convergence.
11. Suppose the nonnegative continuous function f (x) satisfies
$$\int_a^b f(x)\,dx = 0.$$

Prove that f (x) is identically 0 on [a, b].
12. Consider the function

$$f(x) = \begin{cases} x^2 \sin(x^{-2}) & x \ne 0 \\ 0 & x = 0. \end{cases}$$

Show that $\int_0^1 f'(x)\,dx = \sin(1)$ and
$\lim_{t \downarrow 0} \int_t^1 |f'(x)|\,dx = \infty$. Hence, $f'(x)$ is
integrable but not absolutely integrable on [0, 1].


13. Prove that
$$\int_0^\infty x^\alpha e^{-x^\beta}\,dx = \frac{1}{\beta}\,\Gamma\Big(\frac{\alpha + 1}{\beta}\Big)$$

for α and β positive [82].

14. Justify the formula
$$\int_0^1 \frac{\ln(1 - x)}{x}\,dx = -\sum_{n=1}^\infty \frac{1}{n^2} .$$

15. Show that
$$\int_0^\infty \frac{x^{z-1}}{e^x - 1}\,dx = \zeta(z)\,\Gamma(z),$$
where $\zeta(z) = \sum_{n=1}^\infty n^{-z}$.

16. Prove that the functions
$$f(x) = \int_1^\infty \frac{\sin t}{x^2 + t^2}\,dt$$

$$g(x) = \int_0^\infty e^{-xt} \cos t \,dt, \quad x > 0,$$

are continuous.

17. Let $f_n(x)$ be a sequence of integrable functions on [a, b] that
converges uniformly to f(x). Demonstrate that f(x) is integrable and
satisfies

$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx .$$

(Hints: For ε > 0 small take n large enough so that

$$f_n(x) - \frac{\epsilon}{2(b - a)} \le f(x) \le f_n(x) + \frac{\epsilon}{2(b - a)}$$

for all x.)

18. Let p and q be positive integers. Justify the series expansion
$$\int_0^1 \frac{x^{p-1}}{1 + x^q}\,dx = \sum_{n=0}^\infty \frac{(-1)^n}{p + nq}$$

by the monotone convergence theorem. Be careful since the series
does not converge absolutely [278].


19. Suppose f(x) is a continuous function on R. Demonstrate that the
sequence

$$f_n(x) = \frac{1}{n} \sum_{k=0}^{n-1} f\Big(x + \frac{k}{n}\Big)$$

converges uniformly to a continuous function on every finite interval
[a, b] [69].
20. Prove that

$$\int_0^1 \frac{x^b - x^a}{\ln x}\,dx = \ln \frac{b + 1}{a + 1}$$

for 0 < a < b [278] by showing that both sides equal the double
integral

$$\iint_{[0,1] \times [a,b]} x^y \,dx\,dy .$$

21. Integrate the function

$$f(x, y) = \frac{y^2 - x^2}{(x^2 + y^2)^2}$$

over the unit square [0, 1] × [0, 1]. Show that the two iterated integrals
disagree, and explain why Fubini's theorem fails.
22. Suppose the two partial derivatives
$\frac{\partial^2}{\partial x_1 \partial x_2} f(x)$ and
$\frac{\partial^2}{\partial x_2 \partial x_1} f(x)$ exist
and are continuous in a neighborhood of a point $y \in R^2$. Show that
they are equal at the point. (Hints: If they are not equal, take a small
box around the point where their difference has constant sign. Now
apply Fubini's theorem.)
23. Demonstrate that

$$\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$$

by evaluating the integral of $f(y) = y_2 e^{-(1 + y_1^2) y_2^2}$ over the
rectangle (0, ∞) × (0, ∞).
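
Several of the closed-form identities in Problems 13, 18, and 20 can be sanity-checked numerically before a proof is attempted. The sketch below is only such a check and proves nothing: it uses a composite Simpson rule from the standard library alone. The helper name `simpson`, the parameter values (α = β = 2; p = 1, q = 2; a = 1, b = 2), the truncation of the infinite range at x = 10, and the number of series terms are all assumptions chosen for illustration.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] using n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Problem 13 with alpha = beta = 2. The range is truncated at x = 10,
# beyond which the integrand is below exp(-100) * 100 and is ignored.
alpha, beta = 2.0, 2.0
lhs13 = simpson(lambda x: x ** alpha * math.exp(-(x ** beta)), 0.0, 10.0)
rhs13 = math.gamma((alpha + 1.0) / beta) / beta

# Problem 18 with p = 1, q = 2. The partial sum of an alternating series with
# decreasing terms differs from its limit by at most the first omitted term.
p, q = 1, 2
lhs18 = simpson(lambda x: x ** (p - 1) / (1.0 + x ** q), 0.0, 1.0)
terms = 1000
partial18 = sum((-1) ** n / (p + n * q) for n in range(terms))
bound18 = 1.0 / (p + terms * q)


# Problem 20 with a = 1, b = 2. The integrand is extended by its limits
# at the endpoints: 0 at x = 0 and b - a at x = 1.
def integrand20(x, a=1.0, b=2.0):
    if x == 0.0:
        return 0.0
    if x == 1.0:
        return b - a
    return (x ** b - x ** a) / math.log(x)


lhs20 = simpson(integrand20, 0.0, 1.0)
rhs20 = math.log((2.0 + 1.0) / (1.0 + 1.0))

print(lhs13, rhs13)
print(lhs18, partial18, bound18)
print(lhs20, rhs20)
```

Each printed pair should agree up to quadrature error. In the second line the partial sum should lie within the printed bound of the integral, which reflects the slow, non-absolute convergence that Problem 18 warns about.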
