
# Optimization

### Optimization

#### 3 The Gauge Integral

##### 3.1 Introduction

Much of calculus deals with the interplay between differentiation and integration. The antiquated term "antidifferentiation" emphasizes the fact that differentiation and integration are inverses of one another. We will take it for granted that readers are acquainted with the mechanics of integration. The current chapter develops just enough integration theory to make our development of differentiation in Chap. 4 and the calculus of variations in Chap. 17 respectable. It is only fair to warn readers that in other chapters a few applications to probability and statistics will assume familiarity with properties of the expectation operator not covered here.

The first successful effort to put integration on a rigorous basis was undertaken by Riemann. In the early twentieth century, Lebesgue defined a more sophisticated integral that addresses many of the limitations of the Riemann integral. However, even Lebesgue's integral has its defects. In the past few decades, mathematicians such as Henstock and Kurzweil have expanded the definition of integration on the real line to include a wider variety of functions. The new integral emerging from these investigations is called the gauge integral or generalized Riemann integral [7, 68, 108, 193, 250, 255, 278]. The gauge integral subsumes the Riemann integral, the Lebesgue integral, and the improper integrals met in traditional advanced calculus courses. In contrast to the Lebesgue integral, the integrands of the gauge integral are not necessarily absolutely integrable.

K. Lange, Optimization, Springer Texts in Statistics 95, DOI 10.1007/978-1-4614-5838-8_3, © Springer Science+Business Media New York 2013
It would take us too far afield to develop the gauge integral in full generality. Here we will rest content with proving some of its elementary properties. One of the advantages of the gauge integral is that many theorems hold with fewer qualifications. The fundamental theorem of calculus is a case in point. The commonly stated version of the fundamental theorem concerns a differentiable function $f(x)$ on an interval $[a, b]$. As all students of calculus know,

$$\int_a^b f'(x)\,dx = f(b) - f(a).$$

Although this version is true for the gauge integral, it does not hold for the Lebesgue integral because the mere fact that $f'(x)$ exists throughout $[a, b]$ does not guarantee that it is Lebesgue integrable.

This quick description of the gauge integral is not intended to imply that the gauge integral is uniformly superior to the Lebesgue integral and its extensions. Certainly, probability theory would be severely handicapped without the full flexibility of modern measure theory. Furthermore, the advanced theory of the gauge integral is every bit as difficult as the advanced theory of the Lebesgue integral. For pedagogical purposes, however, one can argue that a student's first exposure to the theory of integration should feature the gauge integral. As we shall see, many of the basic properties of the gauge integral flow directly from its definition. As an added dividend, gauge functions provide an alternative approach to some of the material of Chap. 2.

##### 3.2 Gauge Functions and δ-Fine Partitions

The gauge integral is defined through gauge functions. A gauge function is nothing more than a positive function $\delta(t)$ defined on a finite interval $[a, b]$. In approximating the integral of a function $f(t)$ over $[a, b]$ by a finite Riemann sum, it is important to sample the function most heavily in those regions where it changes most rapidly.
Now by a Riemann sum we mean a sum

$$S(f, \pi) = \sum_{i=0}^{n-1} f(t_i)(s_{i+1} - s_i),$$

where the mesh points $a = s_0 < s_1 < \cdots < s_n = b$ form a partition $\pi$ of $[a, b]$, and the tags $t_i$ are chosen so that $t_i \in [s_i, s_{i+1}]$. If $\delta(t_i)$ measures the rapidity of change of $f(t)$ near $t_i$, then it makes sense to take $\delta(t)$ small in regions of rapid change and to force $s_i$ and $s_{i+1}$ to belong to the interval $(t_i - \delta(t_i), t_i + \delta(t_i))$. A tagged partition with this property is called a $\delta$-fine partition. Our first proposition relieves any worry about whether $\delta$-fine partitions exist.
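The definitions above translate almost directly into code. The following Python sketch is an illustration only and is not part of the original text: the names `riemann_sum`, `is_delta_fine`, and `cousin_partition` are hypothetical, tagged partitions are stored as lists of `(left, tag, right)` triples, and the bisection search tries only the endpoints and midpoint of each interval as tags. It can therefore fail for gauges that force one particular tag, which is why it carries a depth limit.

```python
from typing import Callable, List, Tuple

# A tagged partition is a list of (s_i, t_i, s_{i+1}) triples.
TaggedPartition = List[Tuple[float, float, float]]


def riemann_sum(f: Callable[[float], float], partition: TaggedPartition) -> float:
    """Return S(f, pi), the sum of f(t_i) * (s_{i+1} - s_i)."""
    return sum(f(tag) * (right - left) for left, tag, right in partition)


def is_delta_fine(delta: Callable[[float], float], partition: TaggedPartition) -> bool:
    """Check that each tag lies in its interval and that both mesh points
    fall inside (t_i - delta(t_i), t_i + delta(t_i))."""
    return all(
        left <= tag <= right and tag - delta(tag) < left and right < tag + delta(tag)
        for left, tag, right in partition
    )


def cousin_partition(delta, a, b, max_depth=40) -> TaggedPartition:
    """Search for a delta-fine partition of [a, b] by repeated bisection.

    Assumption: only the endpoints and the midpoint are tried as tags, so this
    heuristic may give up on gauges that demand one specific tag.
    """
    for tag in (a, 0.5 * (a + b), b):
        if tag - delta(tag) < a and b < tag + delta(tag):
            return [(a, tag, b)]
    if max_depth == 0:
        raise RuntimeError("no delta-fine partition found within the depth limit")
    mid = 0.5 * (a + b)
    return (cousin_partition(delta, a, mid, max_depth - 1)
            + cousin_partition(delta, mid, b, max_depth - 1))


# Example gauge: small near 0, larger toward 1 (an arbitrary illustrative choice).
gauge = lambda t: 0.01 + 0.2 * t
partition = cousin_partition(gauge, 0.0, 1.0)
print(len(partition), is_delta_fine(gauge, partition))
print(riemann_sum(lambda t: t * t, partition))  # a Riemann sum for t^2 over [0, 1]
```

Because the gauge is smaller near 0, the resulting partition uses shorter intervals there, which is the sampling behavior the text describes.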
Proposition 3.2.1 (Cousin's Lemma) For every gauge $\delta(t)$ on a finite interval $[a, b]$ there is a $\delta$-fine partition.

Proof: Assume that $[a, b]$ lacks a $\delta$-fine partition. Since we can construct a $\delta$-fine partition of $[a, b]$ by appending a $\delta$-fine partition of the half-interval $[(a + b)/2, b]$ to a $\delta$-fine partition of the half-interval $[a, (a + b)/2]$, it follows that either $[a, (a + b)/2]$ or $[(a + b)/2, b]$ lacks a $\delta$-fine partition. As in Example 2.3.1, we choose one of the half-intervals based on this failure and continue bisecting. This creates a nested sequence of intervals $[a_i, b_i]$ converging to a point $x$. If $i$ is large enough, then $[a_i, b_i] \subset (x - \delta(x), x + \delta(x))$, and the interval $[a_i, b_i]$ with tag $x$ is a $\delta$-fine partition of itself. This contradicts the choice of $[a_i, b_i]$ and the assumption that the original interval $[a, b]$ lacks a $\delta$-fine partition.

Before launching into our treatment of the gauge integral, we pause to gain some facility with gauge functions [108]. Here are three examples that illustrate their value.

Example 3.2.1 A Gauge Proof of Weierstrass' Theorem

Consider a real-valued continuous function $f(t)$ with domain $[a, b]$. Suppose that $f(t)$ does not attain its supremum on $[a, b]$. Then for each $t$ there exists a point $x \in [a, b]$ with $f(t) < f(x)$. By continuity there exists $\delta(t) > 0$ such that $f(y) < f(x)$ for all $y \in [a, b]$ with $|y - t| < \delta(t)$. Using $\delta(t)$ as a gauge, select a $\delta$-fine partition $a = s_0 < s_1 < \cdots < s_n = b$ with tags $t_i \in [s_i, s_{i+1}]$ and designated points $x_i$ satisfying $f(t_i) < f(x_i)$. Let $x_{\max}$ be the point $x_i$ having the largest value $f(x_i)$. Because $x_{\max}$ lies in some interval $[s_i, s_{i+1}]$, we have $f(x_{\max}) < f(x_i)$. This contradiction discredits our assumption that $f(x)$ does not attain its supremum. A similar argument applies to the infimum.

Example 3.2.2 A Gauge Proof of the Heine-Borel Theorem

One can use Cousin's lemma to prove the Heine-Borel Theorem on the real line [278].
This theorem states that if $C$ is a compact set contained in the union $\cup_\alpha O_\alpha$ of a collection of open sets $O_\alpha$, then $C$ is actually contained in the union of a finite number of the $O_\alpha$. Suppose $C \subset [a, b]$. Define a gauge $\delta(t)$ so that the interval $(t - \delta(t), t + \delta(t))$ does not intersect $C$ when $t \notin C$ and $(t - \delta(t), t + \delta(t))$ is contained in some $O_\alpha$ when $t \in C$. Based on $\delta(t)$, select a $\delta$-fine partition $a = s_0 < s_1 < \cdots < s_n = b$ with tags $t_i \in [s_i, s_{i+1}]$. By definition $C$ is contained in the union $\cup_{t_i \in C} U_i$, where $U_i$ is the set $O_\alpha$ covering $t_i$. The Heine-Borel theorem extends to compact sets in $\mathbb{R}^n$.

Example 3.2.3 A Gauge Proof of the Intermediate Value Theorem

Under the assumption of the previous example, let $c$ be a number strictly between $f(a)$ and $f(b)$. If we assume that there is no $t \in [a, b]$ with $f(t) = c$, then there exists a positive number $\delta(t)$ such that either $f(x) < c$ for all
$x \in [a, b]$ with $|x - t| < \delta(t)$ or $f(x) > c$ for all $x \in [a, b]$ with $|x - t| < \delta(t)$. We now select a $\delta$-fine partition $a = s_0 < s_1 < \cdots < s_n = b$ and observe that throughout each interval $[s_i, s_{i+1}]$ either $f(t) < c$ or $f(t) > c$. If to start $f(s_0) = f(a) < c$, then $f(s_1) < c$, which implies $f(s_2) < c$ and so forth until we get to $f(s_n) = f(b) < c$. This contradicts the assumption that $c$ lies strictly between $f(a)$ and $f(b)$. With minor differences, the same proof works when $f(a) > c$.

In preparation for our next example and for the fundamental theorem of calculus later in this chapter, we must define derivatives. A real-valued function $f(t)$ defined on an interval $[a, b]$ possesses a derivative $f'(c)$ at $c \in [a, b]$ provided the limit

$$\lim_{t \to c} \frac{f(t) - f(c)}{t - c} = f'(c) \tag{3.1}$$

exists. At the endpoints $a$ and $b$, the limit is necessarily one sided. Taking a sequential view of convergence, definition (3.1) means that for every sequence $t_m \ne c$ converging to $c$ we must have

$$\lim_{m \to \infty} \frac{f(t_m) - f(c)}{t_m - c} = f'(c).$$

In calculus, we learn the following rules for computing derivatives:

Proposition 3.2.2 If $f(t)$ and $g(t)$ are differentiable functions on $(a, b)$, then

$$\begin{aligned}
[\alpha f(t) + \beta g(t)]' &= \alpha f'(t) + \beta g'(t) \\
[f(t) g(t)]' &= f'(t) g(t) + f(t) g'(t) \\
\left[\frac{1}{f(t)}\right]' &= -\frac{f'(t)}{f(t)^2}.
\end{aligned}$$

In the third formula we must assume $f(t) \ne 0$. Finally, if $g(t)$ maps into the domain of $f(t)$, then the functional composition $f \circ g(t)$ has derivative

$$[f \circ g(t)]' = f' \circ g(t)\, g'(t).$$

Proof: We will prove the above sum, product, quotient, and chain rules in a broader context in Chap. 4. Our proofs will not rely on integration.

Example 3.2.4 Strictly Increasing Functions

Let $f(t)$ be a differentiable function on $[c, d]$ with strictly positive derivative. We now show that $f(t)$ is strictly increasing. For each $t \in [c, d]$ there exists $\delta(t) > 0$ such that

$$\frac{f(x) - f(t)}{x - t} > 0 \tag{3.2}$$
for all $x \in [c, d]$ with $0 < |x - t| < \delta(t)$. According to Proposition 3.2.1, for any two points $a < b$ from $[c, d]$, there exists a $\delta$-fine partition

$$a = s_0 < s_1 < \cdots < s_n = b$$

of $[a, b]$ with tags $t_i \in [s_i, s_{i+1}]$. In view of inequality (3.2), at least one of the two inequalities $f(s_i) \le f(t_i) \le f(s_{i+1})$ must be strict. Thus, the telescoping sum

$$f(b) - f(a) = \sum_{i=0}^{n-1} [f(s_{i+1}) - f(s_i)]$$

must be positive.

##### 3.3 Definition and Basic Properties of the Integral

With later applications in mind, it will be convenient to define the gauge integral for vector-valued functions $f(x) : [a, b] \to \mathbb{R}^n$. In this context, $f(x)$ is said to have integral $I$ if for every $\epsilon > 0$ there exists a gauge $\delta(x)$ on $[a, b]$ such that

$$\|S(f, \pi) - I\| < \epsilon \tag{3.3}$$

for all $\delta$-fine partitions $\pi$. Our first order of business is to check that the integral is unique whenever it exists. Thus, suppose that the vector $J$ is a second possible value of the integral. Given $\epsilon > 0$ choose gauges $\delta_I(x)$ and $\delta_J(x)$ leading to inequality (3.3). The minimum $\delta(x) = \min\{\delta_I(x), \delta_J(x)\}$ is also a gauge, and any partition $\pi$ that is $\delta$-fine is also $\delta_I$-fine and $\delta_J$-fine. Hence,

$$\|I - J\| \le \|I - S(f, \pi)\| + \|S(f, \pi) - J\| < 2\epsilon.$$

Since $\epsilon$ is arbitrary, $J = I$.

One can also define $f(x)$ to be integrable if its Riemann sums are Cauchy in an appropriate sense.

Proposition 3.3.1 (Cauchy criterion) A function $f(x) : [a, b] \to \mathbb{R}^n$ is integrable if and only if for every $\epsilon > 0$ there exists a gauge $\delta(x) > 0$ such that

$$\|S(f, \pi_1) - S(f, \pi_2)\| < \epsilon \tag{3.4}$$

for any two $\delta$-fine partitions $\pi_1$ and $\pi_2$.

Proof: It is obvious that the Cauchy criterion is necessary for integrability. To show that it is sufficient, consider the sequence $\epsilon_m = m^{-1}$ and compatible sequence of gauges $\delta_m(x)$ determined by condition (3.4). We can force
the constraints $\delta_m(x) \le \delta_{m-1}(x)$ to hold by inductively replacing $\delta_m(x)$ by $\min\{\delta_{m-1}(x), \delta_m(x)\}$ whenever needed. Now select a $\delta_m$-fine partition $\pi_m$ for each $m$. Because the gauge sequence $\delta_m(x)$ is decreasing, every partition $\pi$ that is $\delta_m$-fine is also $\delta_{m-1}$-fine. Hence, the sequence of Riemann sums $S(f, \pi_m)$ is Cauchy and has a limit $I$ satisfying $\|S(f, \pi_m) - I\| \le m^{-1}$. Finally, given the potential integral $I$, we take an arbitrary $\epsilon > 0$ and choose $m$ so that $m^{-1} < \epsilon$. If $\pi$ is $\delta_m$-fine, then the inequality

$$\|S(f, \pi) - I\| \le \|S(f, \pi) - S(f, \pi_m)\| + \|S(f, \pi_m) - I\| < 2\epsilon$$

completes the proof.

For two integrable functions $f(x)$ and $g(x)$, the gauge integral inherits the linearity property

$$\int_a^b [\alpha f(x) + \beta g(x)]\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx$$

from its approximating Riemann sums. To prove this fact, take $\epsilon > 0$ and choose gauges $\delta_f(x)$ and $\delta_g(x)$ so that

$$\left\|S(f, \pi_f) - \int_a^b f(x)\,dx\right\| < \epsilon, \qquad \left\|S(g, \pi_g) - \int_a^b g(x)\,dx\right\| < \epsilon$$

whenever $\pi_f$ is $\delta_f$-fine and $\pi_g$ is $\delta_g$-fine. If the tagged partition $\pi$ is $\delta$-fine for the gauge $\delta(x) = \min\{\delta_f(x), \delta_g(x)\}$, then

$$\begin{aligned}
&\left\|S(\alpha f + \beta g, \pi) - \alpha \int_a^b f(x)\,dx - \beta \int_a^b g(x)\,dx\right\| \\
&\quad \le |\alpha| \left\|S(f, \pi) - \int_a^b f(x)\,dx\right\| + |\beta| \left\|S(g, \pi) - \int_a^b g(x)\,dx\right\| \\
&\quad \le (|\alpha| + |\beta|)\epsilon.
\end{aligned}$$

The gauge integral also inherits obvious order properties. For example, $\int_a^b f(x)\,dx \ge 0$ whenever the integrand $f(x) \ge 0$ for all $x \in [a, b]$. In this case, the inequality $|S(f, \pi) - \int_a^b f(x)\,dx| < \epsilon$ implies

$$0 \le S(f, \pi) \le \int_a^b f(x)\,dx + \epsilon.$$

Since $\epsilon$ can be made arbitrarily small for $f(x)$ integrable, it follows that $\int_a^b f(x)\,dx \ge 0$. This nonnegativity property translates into the order property

$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$$
for two integrable functions $f(x) \le g(x)$. In particular, when $f(x)$ and $|f(x)|$ are both integrable, we have

$$\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx.$$

For vector-valued functions, the analogous rule

$$\left\|\int_a^b f(x)\,dx\right\| \le \int_a^b \|f(x)\|\,dx \tag{3.5}$$

is also inherited from the approximating Riemann sums. The reader can easily supply the proof using the triangle inequality of the Euclidean norm. It does not take much imagination to extend the definition of the gauge integral to matrix-valued functions, and inequality (3.5) applies in this setting as well.

One of the nicest features of the gauge integral is that one can perturb an integrable function at a countable number of points without changing the value of its integral. This property fails for the Riemann integral but is exhibited by the Lebesgue integral. To validate the property, it suffices to prove that a function that equals 0 except at a countable number of points has integral 0. Suppose $f(x)$ is such a function with exceptional points $x_1, x_2, \ldots$ and corresponding exceptional values $f_1, f_2, \ldots$. Given $\epsilon > 0$, we now define a gauge $\delta(x)$ with value 1 on the nonexceptional points and values

$$\delta(x_j) = \frac{\epsilon}{2^{j+2}[\|f_j\| + 1]}$$

at the exceptional points. If $\pi$ is a $\delta$-fine partition, then $x_j$ can serve as a tag for at most two intervals $[s_i, s_{i+1}]$ of $\pi$ and each such interval has length less than $2\delta(x_j)$. It follows that

$$\|S(f, \pi)\| \le \sum_{j=1}^{\infty} 2\|f(x_j)\| \frac{2\epsilon}{2^{j+2}[\|f_j\| + 1]} \le \sum_{j=1}^{\infty} \frac{\epsilon}{2^j} = \epsilon$$

and therefore that $\int_a^b f(x)\,dx = 0$.

In practice, the interval additivity rule

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx \tag{3.6}$$

is obviously desirable. There are three separate issues in proving it. First, given the existence of the integral over $[a, c]$, do the integrals over $[a, b]$ and $[b, c]$ exist? Second, if the integrals over $[a, b]$ and $[b, c]$ exist, does the integral over $[a, c]$ exist? Third, if the integrals over $[a, b]$ and $[b, c]$ exist, are they additive? The first question is best approached through Proposition 3.3.1.
For $\epsilon > 0$ there exists a gauge $\delta(x)$ such that

$$\|S(f, \pi_1) - S(f, \pi_2)\| < \epsilon$$
for any two $\delta$-fine partitions $\pi_1$ and $\pi_2$ of $[a, c]$. Given $\delta(x)$, take any two $\delta$-fine partitions $\gamma_1$ and $\gamma_2$ of $[a, b]$ and a single $\delta$-fine partition $\omega$ of $[b, c]$. The concatenated partitions $\gamma_1 \cup \omega$ and $\gamma_2 \cup \omega$ are $\delta$-fine throughout $[a, c]$ and satisfy

$$\|S(f, \gamma_1) - S(f, \gamma_2)\| = \|S(f, \gamma_1 \cup \omega) - S(f, \gamma_2 \cup \omega)\| < \epsilon.$$

According to the Cauchy criterion, the integral over $[a, b]$ therefore exists. A similar argument implies that the integral over $[b, c]$ also exists. Finally, the combination of these results shows that the integral exists over any interval $[u, v]$ contained within $[a, c]$.

For the converse, choose gauges $\delta_1(x)$ on $[a, b]$ and $\delta_2(x)$ on $[b, c]$ so that

$$\left\|S(f, \gamma) - \int_a^b f(x)\,dx\right\| < \epsilon, \qquad \left\|S(f, \omega) - \int_b^c f(x)\,dx\right\| < \epsilon$$

for any $\delta_1$-fine partition $\gamma$ of $[a, b]$ and any $\delta_2$-fine partition $\omega$ of $[b, c]$. The concatenated partition $\pi = \gamma \cup \omega$ satisfies

$$\begin{aligned}
&\left\|S(f, \pi) - \int_a^b f(x)\,dx - \int_b^c f(x)\,dx\right\| \\
&\quad \le \left\|S(f, \gamma) - \int_a^b f(x)\,dx\right\| + \left\|S(f, \omega) - \int_b^c f(x)\,dx\right\| \\
&\quad < 2\epsilon
\end{aligned} \tag{3.7}$$

because the Riemann sums satisfy $S(f, \pi) = S(f, \gamma) + S(f, \omega)$. This suggests defining a gauge $\delta(x)$ equal to $\delta_1(x)$ on $[a, b]$ and equal to $\delta_2(x)$ on $[b, c]$. The problem with this tactic is that some partitions of $[a, c]$ do not split at $b$. However, we can ensure a split by redefining $\delta(x)$ by

$$\tilde{\delta}(x) = \begin{cases} \min\{\delta_1(b), \delta_2(b)\} & x = b \\ \min\{\delta(x), \tfrac{1}{2}|x - b|\} & x \ne b. \end{cases}$$

This forces $b$ to be the tag of its assigned interval, and we can if needed split this interval at $b$ and retain $b$ as tag of both subintervals. With $\delta(x)$ amended in this fashion, any $\delta$-fine partition $\pi$ can be viewed as a concatenated partition $\gamma \cup \omega$ splitting at $b$. As such $\pi$ obeys inequality (3.7). This argument simultaneously proves that the integral over $[a, c]$ exists and satisfies the additivity property (3.6).

If the function $f(x)$ is vector-valued with $n$ components, then the integrability of $f(x)$ should imply the integrability of each of its components $f_i(x)$. Furthermore, we should be able to write

$$\int_a^b f(x)\,dx = \begin{pmatrix} \int_a^b f_1(x)\,dx \\ \vdots \\ \int_a^b f_n(x)\,dx \end{pmatrix}.$$
Conversely, if its components are integrable, then $f(x)$ should be integrable as well. The inequalities

$$\|S(f, \pi) - I\| \le \sum_{i=1}^{n} |S(f_i, \pi) - I_i| \le \sqrt{n}\, \|S(f, \pi) - I\|$$

based on Example 2.5.6 and Problem 3 of Chap. 2 are instrumental in proving this logical equivalence. Given that we can integrate component by component, for the remainder of this chapter we will deal exclusively with real-valued functions.

We have not actually shown that any function is integrable. The most obvious possibility is a constant. Fortunately, it is trivial to demonstrate that

$$\int_a^b c\,dx = c(b - a).$$

Step functions are one rung up the hierarchy of functions. If

$$f(x) = \sum_{i=0}^{n-1} c_i 1_{(s_i, s_{i+1}]}(x)$$

for $a = s_0 < s_1 < \cdots < s_n = b$, then our nascent theory allows us to evaluate

$$\int_a^b f(x)\,dx = \sum_{i=0}^{n-1} \int_{s_i}^{s_{i+1}} c_i\,dx = \sum_{i=0}^{n-1} c_i (s_{i+1} - s_i).$$

This fact and the next technical proposition turn out to be the key to showing that continuous functions are integrable.

Proposition 3.3.2 Let $f(x)$ be a function with domain $[a, b]$. Suppose for every $\epsilon > 0$ there exist two integrable functions $g(x)$ and $h(x)$ satisfying $g(x) \le f(x) \le h(x)$ for all $x$ and

$$\int_a^b h(x)\,dx \le \int_a^b g(x)\,dx + \epsilon.$$

Then $f(x)$ is integrable.

Proof: For $\epsilon > 0$, choose gauges $\delta_g(x)$ and $\delta_h(x)$ on $[a, b]$ so that

$$\left|S(g, \pi_g) - \int_a^b g(x)\,dx\right| < \epsilon, \qquad \left|S(h, \pi_h) - \int_a^b h(x)\,dx\right| < \epsilon$$
for any $\delta_g$-fine partition $\pi_g$ and any $\delta_h$-fine partition $\pi_h$. If $\pi$ is a $\delta$-fine partition for $\delta(x) = \min\{\delta_g(x), \delta_h(x)\}$, then the inequalities

$$\begin{aligned}
\int_a^b g(x)\,dx - \epsilon &< S(g, \pi) \\
&\le S(f, \pi) \\
&\le S(h, \pi) \\
&< \int_a^b h(x)\,dx + \epsilon \\
&\le \int_a^b g(x)\,dx + 2\epsilon
\end{aligned}$$

trap $S(f, \pi)$ in an interval of length $3\epsilon$. Because the Riemann sum $S(f, \gamma)$ for any other $\delta$-fine partition $\gamma$ is trapped in the same interval, the integral of $f(x)$ exists by the Cauchy criterion.

Proposition 3.3.3 Every continuous function $f(x)$ on $[a, b]$ is integrable.

Proof: In view of the uniform continuity of $f(x)$ on $[a, b]$, for every $\epsilon > 0$ there exists a $\delta > 0$ with $|f(x) - f(y)| < \epsilon$ when $|x - y| < \delta$. For the constant gauge $\delta(x) = \delta$ and a corresponding $\delta$-fine partition $\pi$ with mesh points $s_0, \ldots, s_n$, let $m_i$ be the minimum and $M_i$ be the maximum of $f(x)$ on $[s_i, s_{i+1}]$. The step functions

$$g(x) = \sum_{i=0}^{n-1} m_i 1_{(s_i, s_{i+1}]}(x), \qquad h(x) = \sum_{i=0}^{n-1} M_i 1_{(s_i, s_{i+1}]}(x)$$

then satisfy $g(x) \le f(x) \le h(x)$ except at the single point $a$. Furthermore,

$$\int_a^b h(x)\,dx - \int_a^b g(x)\,dx \le \epsilon \sum_{i=0}^{n-1} (s_{i+1} - s_i) = \epsilon (b - a).$$

Application of Proposition 3.3.2 now completes the proof.

##### 3.4 The Fundamental Theorem of Calculus

The fundamental theorem of calculus divides naturally into two parts. For the gauge integral, the first and more difficult part is easily proved by invoking what is called the straddle inequality. Let $f(x)$ be differentiable at the point $t \in [a, b]$. Then for every $\epsilon > 0$ there exists $\delta(t) > 0$ such that

$$\left|\frac{f(x) - f(t)}{x - t} - f'(t)\right| < \epsilon$$
for all $x \in [a, b]$ with $0 < |x - t| < \delta(t)$. If $u < t < v$ are two points straddling $t$ and located in $[a, b] \cap (t - \delta(t), t + \delta(t))$, then

$$\begin{aligned}
|f(v) - f(u) - f'(t)(v - u)| &\le |f(v) - f(t) - f'(t)(v - t)| + |f(t) - f(u) - f'(t)(t - u)| \\
&\le \epsilon (v - t) + \epsilon (t - u) \\
&= \epsilon (v - u).
\end{aligned} \tag{3.8}$$

Inequality (3.8) also clearly holds when either $u = t$ or $v = t$.

Proposition 3.4.1 (Fundamental Theorem I) If $f(x)$ is differentiable throughout $[a, b]$, then

$$\int_a^b f'(x)\,dx = f(b) - f(a).$$

Proof: Using the gauge $\delta(t)$ figuring in the straddle inequality (3.8), select a $\delta$-fine partition $\pi$ with mesh points $a = s_0 < s_1 < \cdots < s_n = b$ and tags $t_i \in [s_i, s_{i+1}]$. Application of the inequality and telescoping yield

$$\begin{aligned}
|f(b) - f(a) - S(f', \pi)| &= \left|\sum_{i=0}^{n-1} [f(s_{i+1}) - f(s_i) - f'(t_i)(s_{i+1} - s_i)]\right| \\
&\le \sum_{i=0}^{n-1} |f(s_{i+1}) - f(s_i) - f'(t_i)(s_{i+1} - s_i)| \\
&\le \epsilon \sum_{i=0}^{n-1} (s_{i+1} - s_i) \\
&= \epsilon (b - a).
\end{aligned}$$

This demonstrates that $f'(x)$ has integral $f(b) - f(a)$.

The first half of the fundamental theorem remains valid for a continuous function $f(x)$ that is differentiable except on a countable set $N$ [250]. Since changing an integrand at a countable number of points does not alter its integral, it suffices to prove that

$$f(b) - f(a) = \int_a^b g(t)\,dt, \quad \text{where} \quad g(t) = \begin{cases} 0 & t \in N \\ f'(t) & t \notin N. \end{cases}$$

Suppose $\epsilon > 0$ is given. For $t \notin N$ define the gauge value $\delta(t)$ to satisfy the straddle inequality. Enumerate the points $t_j$ of $N$, and define $\delta(t_j) > 0$ so that $|f(t_j) - f(t_j + s)| < \epsilon 2^{-j-2}$ whenever $|s| < \delta(t_j)$. Now select a $\delta$-fine partition $\pi$ with mesh points $a = s_0 < s_1 < \cdots < s_n = b$ and tags $r_i \in [s_i, s_{i+1}]$. Break the sum

$$f(b) - f(a) - S(g, \pi) = \sum_{i=0}^{n-1} [f(s_{i+1}) - f(s_i) - g(r_i)(s_{i+1} - s_i)]$$
into two parts. Let $S'$ denote the sum of the terms with tags $r_i \notin N$, and let $S''$ denote the sum of the terms with tags $r_i \in N$. As noted earlier, $|S'| \le \epsilon (b - a)$. Because a tag is attached to at most two subintervals, the second sum satisfies

$$\begin{aligned}
|S''| &\le \sum_{r_i \in N} |f(s_{i+1}) - f(s_i)| \\
&\le \sum_{r_i \in N} \left[|f(s_{i+1}) - f(r_i)| + |f(r_i) - f(s_i)|\right] \\
&\le \sum_{j=1}^{\infty} 2 \cdot 2 \epsilon 2^{-j-2} = \epsilon.
\end{aligned}$$

It follows that $|S' + S''| \le \epsilon (b - a + 1)$ and therefore that the stated integral exists and equals $f(b) - f(a)$.

In demonstrating the second half of the fundamental theorem, we will implicitly use the standard convention

$$\int_d^c f(x)\,dx = -\int_c^d f(x)\,dx$$

for $c < d$. This convention will also be in force in proving the substitution formula.

Proposition 3.4.2 (Fundamental Theorem II) If a function $f(x)$ is integrable on $[a, b]$, then its indefinite integral

$$F(t) = \int_a^t f(x)\,dx$$

has derivative $F'(t) = f(t)$ at any point $t$ where $f(x)$ is continuous. The derivative is taken as one sided if $t = a$ or $t = b$.

Proof: In deriving the interval additivity rule (3.6), we showed that the integral $F(t)$ exists. At a point $t$ where $f(x)$ is continuous, for any $\epsilon > 0$ there is a $\delta > 0$ such that $-\epsilon < f(x) - f(t) < \epsilon$ when $|x - t| < \delta$ and $x \in [a, b]$. Hence, the difference

$$\frac{F(t + s) - F(t)}{s} - f(t) = \frac{1}{s} \int_t^{t+s} [f(x) - f(t)]\,dx$$

is less than $\epsilon$ and greater than $-\epsilon$ for $|s| < \delta$. In the limit as $s$ tends to 0, we recover $F'(t) = f(t)$.

The fundamental theorem of calculus has several important corollaries. These are covered in the next three propositions on the substitution rule, integration by parts, and finite Taylor expansions.
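The identity used in the proof of Fundamental Theorem II can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `midpoint_integral` and `difference_quotient` are hypothetical, the integrand $\cos x$ and the point $t = 1$ are arbitrary choices, and a uniform midpoint rule stands in for the gauge integral, which is adequate for a continuous integrand.

```python
import math


def midpoint_integral(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] by a midpoint Riemann sum.

    A negative step handles b < a, matching the sign convention in the text.
    """
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def difference_quotient(f, t, s):
    """Return [F(t + s) - F(t)] / s, computed as (1/s) times the integral of f over [t, t + s]."""
    return midpoint_integral(f, t, t + s) / s


t = 1.0
for s in (0.1, 0.01, 0.001, -0.001):
    gap = difference_quotient(math.cos, t, s) - math.cos(t)
    print(f"s = {s:+.3f}   quotient - f(t) = {gap:+.6f}")
```

The printed gaps shrink with $|s|$, in line with the conclusion $F'(t) = f(t)$ at points of continuity.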
Proposition 3.4.3 (Substitution Rule) Suppose $f(x)$ is differentiable on $[a, b]$, $g(x)$ is differentiable on $[c, d]$, and the image of $[c, d]$ under $g(x)$ is contained within $[a, b]$. Then

$$\int_{g(c)}^{g(d)} f'(y)\,dy = \int_c^d f'[g(x)]\, g'(x)\,dx.$$

Proof: Part I of the fundamental theorem and the chain rule identity

$$\{f[g(x)]\}' = f'[g(x)]\, g'(x)$$

imply that both integrals have value $f[g(d)] - f[g(c)]$.

Proposition 3.4.4 (Integration by Parts) Suppose $f(x)$ and $g(x)$ are differentiable on $[a, b]$. Then $f'(x) g(x)$ is integrable on $[a, b]$ if and only if $f(x) g'(x)$ is integrable on $[a, b]$. Furthermore, the two integrals are related by the identity

$$\int_a^b f'(x) g(x)\,dx + \int_a^b f(x) g'(x)\,dx = f(b) g(b) - f(a) g(a).$$

Proof: The product rule for derivatives is

$$[f(x) g(x)]' = f'(x) g(x) + f(x) g'(x).$$

If two of three members of this identity are integrable, then the third is as well. Since part I of the fundamental theorem entails

$$\int_a^b [f(x) g(x)]'\,dx = f(b) g(b) - f(a) g(a),$$

the proposition follows.

The derivative of a function may itself be differentiable. Indeed, it makes sense to speak of the $k$th-order derivative of a function $f(x)$ if $f(x)$ is sufficiently smooth. Traditionally, the second-order derivative is denoted $f''(x)$ and an arbitrary $k$th-order derivative by $f^{(k)}(x)$. We can use these extra derivatives to good effect in approximating $f(x)$ locally. The next proposition makes this clear and offers an explicit estimate of the error in a finite Taylor expansion of $f(x)$.

Proposition 3.4.5 (Taylor Expansion) Suppose $f(x)$ has a derivative of order $k + 1$ on an open interval around the point $y$. Then for all $x$ in the interval, we have

$$f(x) = f(y) + \sum_{j=1}^{k} \frac{1}{j!} f^{(j)}(y)(x - y)^j + R_k(x), \tag{3.9}$$
where the remainder

$$R_k(x) = \frac{(x - y)^{k+1}}{k!} \int_0^1 f^{(k+1)}[y + t(x - y)](1 - t)^k\,dt.$$

If $|f^{(k+1)}(z)| \le b$ for all $z$ between $x$ and $y$, then

$$|R_k(x)| \le \frac{b |x - y|^{k+1}}{(k + 1)!}. \tag{3.10}$$

Proof: When $k = 0$, the Taylor expansion (3.9) reads

$$f(x) = f(y) + (x - y) \int_0^1 f'[y + t(x - y)]\,dt$$

and follows from the fundamental theorem of calculus and the chain rule. Induction and the integration-by-parts formula

$$\begin{aligned}
&\int_0^1 f^{(k)}[y + t(x - y)](1 - t)^{k-1}\,dt \\
&\quad = -\frac{1}{k} f^{(k)}[y + t(x - y)](1 - t)^k \Big|_0^1 + \frac{x - y}{k} \int_0^1 f^{(k+1)}[y + t(x - y)](1 - t)^k\,dt \\
&\quad = \frac{1}{k} f^{(k)}(y) + \frac{x - y}{k} \int_0^1 f^{(k+1)}[y + t(x - y)](1 - t)^k\,dt
\end{aligned}$$

now validate the general expansion (3.9). The error estimate follows directly from the bound $|f^{(k+1)}(z)| \le b$ and the integral

$$\int_0^1 (1 - t)^k\,dt = \frac{1}{k + 1}.$$

##### 3.5 More Advanced Topics in Integration

Within the confines of a single chapter, it is impossible to develop rigorously all of the properties of the gauge integral. In this section we will discuss briefly four topics: (a) integrals over unbounded intervals, (b) improper integrals and Hake's theorem, (c) the interchange of limits and integrals, and (d) multidimensional integrals and Fubini's theorem.

Defining the integral of a function over an unbounded interval requires several minor adjustments. First, the real line is extended to include the points $\pm\infty$. Second, a gauge function $\delta(x)$ is now viewed as mapping $x$ to an open interval containing $x$. The associated interval may be infinite; indeed, it must be infinite if $x$ equals $\pm\infty$. In a $\delta$-fine partition $\pi$, the
interval $I_j$ containing the tag $x_j$ is contained in $\delta(x_j)$. The length of an infinite interval $I_j$ is defined to be 0 in an approximating Riemann sum $S(f, \pi)$ to avoid infinite contributions to the sum. Likewise, the integrand $f(x)$ is assigned the value 0 at $x = \pm\infty$.

This extended definition carries with it all the properties we expect. Its most remarkable consequence is that it obliterates the distinction between proper and improper integrals. Hake's theorem provides the link. If we allow $a$ and $b$ to be infinite as well as finite, then Hake's theorem says a function $f(x)$ is integrable over $(a, b)$ if and only if either of the two limits

$$\lim_{c \to a} \int_c^b f(x)\,dx \quad \text{or} \quad \lim_{c \to b} \int_a^c f(x)\,dx$$

exists. If either limit exists, then $\int_a^b f(x)\,dx$ equals that limit. For instance, the integral

$$\int_1^\infty \frac{1}{x^2}\,dx = \lim_{c \to \infty} \int_1^c \frac{1}{x^2}\,dx = \lim_{c \to \infty} \left[-\frac{1}{x}\right]_1^c = 1$$

exists and has the indicated limit by this reasoning.

Example 3.5.1 Existence of $\int_0^\infty \operatorname{sinc}(x)\,dx$

Consider the integral of $\operatorname{sinc}(x) = \sin(x)/x$ over the interval $(0, \infty)$. Because $\operatorname{sinc}(x)$ is continuous throughout $[0, 1]$ with limit 1 as $x$ approaches 0, the integral over $[0, 1]$ is defined. Hake's theorem and integration by parts show that the integral

$$\begin{aligned}
\int_1^\infty \frac{\sin x}{x}\,dx &= \lim_{c \to \infty} \int_1^c \frac{\sin x}{x}\,dx \\
&= \lim_{c \to \infty} \left\{\left[-\frac{\cos x}{x}\right]_1^c - \int_1^c \frac{\cos x}{x^2}\,dx\right\} \\
&= \cos 1 - \int_1^\infty \frac{\cos x}{x^2}\,dx
\end{aligned}$$

exists provided the integral of $x^{-2} \cos x$ exists over $(1, \infty)$. We will demonstrate this fact in a moment. If we accept it, then it is clear that the integral of $\operatorname{sinc}(x)$ over $(0, \infty)$ exists as well. As we shall find in Example 3.5.4, this integral equals $\pi/2$. In contrast to these positive results, $\operatorname{sinc}(x)$ is not absolutely integrable over $(0, \infty)$. Finally, we note in passing that the substitution rule gives

$$\int_0^\infty \frac{\sin cx}{x}\,dx = \int_0^\infty \frac{\sin y}{c^{-1} y}\, c^{-1}\,dy = \int_0^\infty \frac{\sin y}{y}\,dy = \frac{\pi}{2}$$

for any $c > 0$.
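The contrast between conditional and absolute integrability of $\operatorname{sinc}(x)$ is easy to observe numerically. The following sketch is an illustration and not a proof; the function names and the choice of 200 sample points per period are assumptions made here. It integrates $\operatorname{sinc}(x)$ and $|\operatorname{sinc}(x)|$ over $(0, c)$ with a midpoint rule for growing $c$.

```python
import math


def midpoint_integral(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def sinc(x):
    return math.sin(x) / x  # the midpoint rule never evaluates at x = 0


def partial_integrals(periods, points_per_period=200):
    """Integrate sinc and |sinc| over (0, 2*pi*periods)."""
    c = 2.0 * math.pi * periods
    n = periods * points_per_period
    signed = midpoint_integral(sinc, 0.0, c, n)
    absolute = midpoint_integral(lambda x: abs(sinc(x)), 0.0, c, n)
    return signed, absolute


for periods in (10, 100, 1000):
    signed, absolute = partial_integrals(periods)
    print(f"c = 2*pi*{periods:4d}:  signed = {signed:.5f}   absolute = {absolute:.5f}")
print("pi/2 =", math.pi / 2)
```

The signed integrals settle near $\pi/2$, while the integrals of $|\operatorname{sinc}(x)|$ keep growing, roughly like a multiple of $\ln c$.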
We now ask under what circumstances the formula

$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b \lim_{n \to \infty} f_n(x)\,dx \tag{3.11}$$

is valid. The two relevant theorems permitting the interchange of limits and integrals are the monotone convergence theorem and the dominated convergence theorem. In the monotone convergence theorem, we are given an increasing sequence $f_n(x)$ of integrable functions that converge to a finite limit for each $x$. Formula (3.11) is true in this setting provided

$$\sup_n \int_a^b f_n(x)\,dx < \infty.$$

In the dominated convergence theorem, we assume the sequence $f_n(x)$ is trapped between two integrable functions $g(x)$ and $h(x)$ in the sense that

$$g(x) \le f_n(x) \le h(x)$$

for all $n$ and $x$. If $\lim_{n \to \infty} f_n(x)$ exists in this setting, then the interchange (3.11) is allowed. The choices

$$f_n(x) = 1_{[1,n]}(x)\, x^{-2} \cos x, \qquad g(x) = -x^{-2}, \qquad h(x) = x^{-2}$$

in the dominated convergence theorem validate the existence of

$$\int_1^\infty x^{-2} \cos x\,dx = \lim_{n \to \infty} \int_1^n x^{-2} \cos x\,dx.$$

We now consider two more substantive applications of the monotone and dominated convergence theorems.

Example 3.5.2 Johann Bernoulli's Integral

As an example of delicate maneuvers in integration, consider the integral

$$\begin{aligned}
\int_0^1 \frac{1}{x^x}\,dx &= \int_0^1 e^{-x \ln x}\,dx \\
&= \int_0^1 \sum_{n=0}^{\infty} \frac{(-x \ln x)^n}{n!}\,dx \\
&= \sum_{n=0}^{\infty} \frac{1}{n!} \int_0^1 (-x \ln x)^n\,dx.
\end{aligned}$$

The reader will notice the application of the monotone convergence theorem in passing from the second to the third line above. Further progress can be made by applying the integration by parts result

$$\begin{aligned}
\int_0^1 x^m \ln^n x\,dx &= -\frac{n}{m + 1} \int_0^1 x^{m+1} \frac{\ln^{n-1} x}{x}\,dx \\
&= -\frac{n}{m + 1} \int_0^1 x^m \ln^{n-1} x\,dx
\end{aligned}$$
recursively to evaluate

$$\int_0^1 (-x \ln x)^n\,dx = \frac{n!}{(n + 1)^n} \int_0^1 x^n\,dx = \frac{n!}{(n + 1)^{n+1}}.$$

The pleasant surprise

$$\int_0^1 \frac{1}{x^x}\,dx = \sum_{n=0}^{\infty} \frac{1}{(n + 1)^{n+1}}$$

emerges.

Example 3.5.3 Competing Definitions of the Gamma Function

The dominated convergence theorem allows us to derive Gauss's representation

$$\Gamma(z) = \lim_{n \to \infty} \frac{n!\, n^z}{z(z + 1) \cdots (z + n)}$$

of the gamma function from Euler's representation

$$\Gamma(z) = \int_0^\infty x^{z-1} e^{-x}\,dx.$$

As students of statistics are apt to know from their exposure to the beta distribution, repeated integration by parts and the fundamental theorem of calculus show that

$$\int_0^1 x^{z-1}(1 - x)^n\,dx = \frac{n!}{z(z + 1) \cdots (z + n)}.$$

The substitution rule yields

$$n^z \int_0^1 x^{z-1}(1 - x)^n\,dx = \int_0^n y^{z-1} \left(1 - \frac{y}{n}\right)^n dy.$$

Thus, it suffices to prove that

$$\int_0^\infty x^{z-1} e^{-x}\,dx = \lim_{n \to \infty} \int_0^n y^{z-1} \left(1 - \frac{y}{n}\right)^n dy.$$

Given the limit

$$\lim_{n \to \infty} \left(1 - \frac{y}{n}\right)^n = e^{-y},$$

we need an integrable function $h(y)$ that dominates the nonnegative sequence

$$f_n(y) = 1_{[0,n]}(y)\, y^{z-1} \left(1 - \frac{y}{n}\right)^n$$
from above in order to apply the dominated convergence theorem. In light of the inequality

$$\left(1 - \frac{y}{n}\right)^n \le e^{-y},$$

the function $h(y) = y^{z-1} e^{-y}$ will serve.

Finally, the gauge integral extends to multiple dimensions, where a version of Fubini's theorem holds for evaluating multidimensional integrals via iterated integrals [278]. Consider a function $f(x, y)$ defined over the Cartesian product $H \times K$ of two multidimensional intervals $H$ and $K$. The intervals in question can be bounded or unbounded. If $f(x, y)$ is integrable over $H \times K$, then Fubini's theorem asserts that the integrals $\int_H f(x, y)\,dx$ and $\int_K f(x, y)\,dy$ exist and can be integrated over the remaining variable to give the full integral. In symbols,

$$\int_{H \times K} f(x, y)\,dx\,dy = \int_K \left[\int_H f(x, y)\,dx\right] dy = \int_H \left[\int_K f(x, y)\,dy\right] dx.$$

Conversely, if either iterated integral exists, one would like to conclude that the full integral exists as well. This is true whenever $f(x, y)$ is nonnegative. Unfortunately, it is false in general, and two additional hypotheses introduced by Tonelli are needed to rescue the situation. One hypothesis is that $f(x, y)$ is measurable. Measurability is a technical condition that holds except for very pathological functions. The other hypothesis is that $|f(x, y)| \le g(x, y)$ for some nonnegative function $g(x, y)$ for which the iterated integral exists. This domination condition is shared with the dominated convergence theorem and forces $f(x, y)$ to be absolutely integrable.

Example 3.5.4 Evaluation of $\int_0^\infty \operatorname{sinc}(x)\,dx$

According to Fubini's theorem

$$\int_0^n \int_0^{n\pi} e^{-xy} \sin x\,dx\,dy = \int_0^{n\pi} \int_0^n e^{-xy} \sin x\,dy\,dx. \tag{3.12}$$

The second of these iterated integrals

$$\int_0^{n\pi} \int_0^n e^{-xy} \sin x\,dy\,dx = \int_0^{n\pi} \frac{\sin x}{x}\,dx - \int_0^{n\pi} \frac{\sin x}{x} e^{-nx}\,dx$$

tends to $\int_0^\infty \operatorname{sinc}(x)\,dx$ as $n$ tends to $\infty$ by a combination of Hake's theorem and the dominated convergence theorem.
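The inner integral of the left iterated integral in (3.12) equals

$$\begin{aligned}
\int_0^{n\pi} e^{-xy} \sin x\,dx &= \left[-e^{-xy} \cos x\right]_0^{n\pi} - \left[y e^{-xy} \sin x\right]_0^{n\pi} - y^2 \int_0^{n\pi} e^{-xy} \sin x\,dx \\
&= 1 - e^{-n\pi y} \cos n\pi - y^2 \int_0^{n\pi} e^{-xy} \sin x\,dx
\end{aligned}$$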
after two integrations by parts. It follows that

$$\int_0^{n\pi} e^{-xy} \sin x \,dx = \frac{1 - e^{-n\pi y} \cos n\pi}{1 + y^2} .$$

Finally, application of the dominated convergence theorem gives

$$\lim_{n \to \infty} \int_0^n \frac{1 - e^{-n\pi y} \cos n\pi}{1 + y^2}\,dy = \int_0^\infty \frac{1}{1 + y^2}\,dy = \frac{\pi}{2} .$$

Equating the limits of the right and left hand sides of the identity (3.12) therefore yields the value of $\pi/2$ for $\int_0^\infty \operatorname{sinc}(x)\,dx$ [278].

3.6 Problems

1. Give an alternative proof of Cousin's lemma by letting $y$ be the supremum of the set of $x \in [a, b]$ such that $[a, x]$ possesses a $\delta$-fine partition.

2. Use Cousin's lemma to prove that a continuous function $f(x)$ defined on an interval $[a, b]$ is uniformly continuous there [108]. (Hint: Given $\epsilon > 0$ define a gauge $\delta(x)$ by the requirement that $|f(y) - f(x)| < \frac{1}{2}\epsilon$ for all $y \in [a, b]$ with $|y - x| < 2\delta(x)$.)

3. A possibly discontinuous function $f(x)$ has one-sided limits at each point $x \in [a, b]$. Show by Cousin's lemma that $f(x)$ is bounded on $[a, b]$.

4. Suppose $f(x)$ has a nonnegative derivative $f'(x)$ throughout $[a, b]$. Prove that $f(x)$ is nondecreasing on $[a, b]$. Also prove that $f(x)$ is constant on $[a, b]$ if and only if $f'(x) = 0$ for all $x$. (Hint: These yield easily to the fundamental theorem of calculus. Alternatively for the first assertion, consider the function $f_\epsilon(x) = f(x) + \epsilon x$ for $\epsilon > 0$.)

5. Using only the definition of the gauge integral, demonstrate that

$$\int_a^b f(t)\,dt = \int_{-b}^{-a} f(-t)\,dt$$

when either integral exists.
6. Based on the standard definition of the natural logarithm

$$\ln y = \int_1^y \frac{1}{x}\,dx ,$$

prove that $\ln yz = \ln y + \ln z$ for all positive arguments $y$ and $z$. Use this property to verify that $\ln y^{-1} = -\ln y$ and that $\ln y^r = r \ln y$ for every rational number $r$.

7. Apply Proposition 3.3.2 and demonstrate that every monotonic function defined on an interval $[a, b]$ is integrable on that interval.

8. Let $f(x)$ be a continuous real-valued function on $[a, b]$. Show that there exists $c \in [a, b]$ with

$$\int_a^b f(x)\,dx = f(c)(b - a) .$$

9. In the Taylor expansion of Proposition 3.4.5, suppose $f^{(k+1)}(x)$ is continuous. Show that we can replace the remainder by

$$R_k(x) = \frac{(x - y)^{k+1}}{(k + 1)!}\, f^{(k+1)}(z)$$

for some $z$ between $x$ and $y$.

10. Suppose that $f(x)$ is infinitely differentiable and that $c$ and $r$ are positive numbers. If $|f^{(k)}(x)| \le c\,k!\,r^k$ for all $x$ near $y$ and all nonnegative integers $k$, then use Proposition 3.4.5 to show that

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(y)}{k!}\,(x - y)^k$$

near $y$. Explicitly determine the infinite Taylor series expansion of the function $f(x) = (1 + x)^{-1}$ around $x = 0$ and justify its convergence.

11. Suppose the nonnegative continuous function $f(x)$ satisfies

$$\int_a^b f(x)\,dx = 0 .$$

Prove that $f(x)$ is identically 0 on $[a, b]$.

12. Consider the function

$$f(x) = \begin{cases} x^2 \sin(x^{-2}) & x \ne 0 \\ 0 & x = 0 . \end{cases}$$

Show that $\int_0^1 f'(x)\,dx = \sin(1)$ and $\lim_{t \downarrow 0} \int_t^1 |f'(x)|\,dx = \infty$. Hence, $f'(x)$ is integrable but not absolutely integrable on $[0, 1]$.
13. Prove that

$$\int_0^\infty x^{\alpha} e^{-x^{\beta}}\,dx = \frac{1}{\beta}\,\Gamma\!\left(\frac{\alpha + 1}{\beta}\right)$$

for $\alpha$ and $\beta$ positive [82].

14. Justify the formula

$$\int_0^1 \frac{\ln(1 - x)}{x}\,dx = -\sum_{n=1}^{\infty} \frac{1}{n^2} .$$

15. Show that

$$\int_0^\infty \frac{x^{z-1}}{e^x - 1}\,dx = \zeta(z)\,\Gamma(z) ,$$

where $\zeta(z) = \sum_{n=1}^{\infty} n^{-z}$.

16. Prove that the functions

$$f(x) = \int_1^\infty \frac{\sin t}{x^2 + t^2}\,dt , \qquad g(x) = \int_0^\infty e^{-xt} \cos t \,dt, \quad x > 0,$$

are continuous.

17. Let $f_n(x)$ be a sequence of integrable functions on $[a, b]$ that converges uniformly to $f(x)$. Demonstrate that $f(x)$ is integrable and satisfies

$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx .$$

(Hints: For $\epsilon > 0$ small take $n$ large enough so that

$$f_n(x) - \frac{\epsilon}{2(b - a)} \le f(x) \le f_n(x) + \frac{\epsilon}{2(b - a)}$$

for all $x$.)

18. Let $p$ and $q$ be positive integers. Justify the series expansion

$$\int_0^1 \frac{x^{p-1}}{1 + x^q}\,dx = \sum_{n=0}^{\infty} \frac{(-1)^n}{p + nq}$$

by the monotone convergence theorem. Be careful since the series does not converge absolutely [278].
19. Suppose $f(x)$ is a continuous function on $\mathbb{R}$. Demonstrate that the sequence

$$f_n(x) = \frac{1}{n} \sum_{k=0}^{n-1} f\!\left(x + \frac{k}{n}\right)$$

converges uniformly to a continuous function on every finite interval $[a, b]$ [69].

20. Prove that

$$\int_0^1 \frac{x^b - x^a}{\ln x}\,dx = \ln \frac{b + 1}{a + 1}$$

for $0 < a < b$ [278] by showing that both sides equal the double integral

$$\iint_{[0,1] \times [a,b]} x^y \,dx\,dy .$$

21. Integrate the function

$$f(x, y) = \frac{y^2 - x^2}{(x^2 + y^2)^2}$$

over the unit square $[0, 1] \times [0, 1]$. Show that the two iterated integrals disagree, and explain why Fubini's theorem fails.

22. Suppose the two partial derivatives $\frac{\partial^2}{\partial x_1 \partial x_2} f(x)$ and $\frac{\partial^2}{\partial x_2 \partial x_1} f(x)$ exist and are continuous in a neighborhood of a point $y \in \mathbb{R}^2$. Show that they are equal at the point. (Hints: If they are not equal, take a small box around the point where their difference has constant sign. Now apply Fubini's theorem.)

23. Demonstrate that

$$\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$$

by evaluating the integral of $f(y) = y_2\, e^{-(1 + y_1^2)\, y_2^2}$ over the rectangle $(0, \infty) \times (0, \infty)$.
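As a numerical companion to Example 3.5.4, the following Python sketch approximates both sides of identity (3.12) once the inner integrations have been carried out in closed form, and prints them next to $\pi/2$. The function names (`simpson`, `left_side`, `right_side`), the choice of composite Simpson's rule, the panel count, and the sample values of $n$ are all assumptions made for illustration; none of them comes from the text.

```python
import math


def simpson(func, a, b, panels):
    """Composite Simpson's rule on [a, b]; an odd panel count is bumped to even."""
    if panels % 2:
        panels += 1
    h = (b - a) / panels
    total = func(a) + func(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def left_side(n, panels=40000):
    """Integral over y in [0, n] of (1 - exp(-n*pi*y) * cos(n*pi)) / (1 + y^2)."""
    sign = math.cos(n * math.pi)

    def integrand(y):
        return (1 - math.exp(-n * math.pi * y) * sign) / (1 + y * y)

    return simpson(integrand, 0.0, float(n), panels)


def right_side(n, panels=40000):
    """Integral over x in [0, n*pi] of (sin x / x) * (1 - exp(-n*x))."""

    def integrand(x):
        if x == 0.0:
            # Limiting value at 0: sin(x)/x -> 1 while 1 - exp(-n*x) -> 0.
            return 0.0
        return math.sin(x) / x * (1 - math.exp(-n * x))

    return simpson(integrand, 0.0, n * math.pi, panels)


if __name__ == "__main__":
    # Hypothetical sample values of n; larger n needs a finer grid because
    # exp(-n*pi*y) decays on a scale of 1/(n*pi) near y = 0.
    for n in (5, 10, 20):
        print(n, left_side(n), right_side(n), math.pi / 2)
```

The two columns should agree closely for each $n$, as Fubini's theorem predicts, while their common value approaches $\pi/2$ only slowly. The fixed panel count is a deliberate simplification: it is adequate for the small values of $n$ shown but would need to grow for larger $n$.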