Notes on Luenberger's Vector Space Optimization

Convexity and Optimization WITH

A PPLICATIONS

Paul G. Bamberg Harvard University

Convexity and Optimization WITH

A PPLICATIONS

Paul G. Bamberg

c Copyright 2008 Paul G. Bamberg Harvard University Cambridge, MA 02138

This text is based on lecture notes by Paul G. Bamberg written for M ATH 116: Convexity and Optimization with Applications, a course offered at Harvard University in Fall 2008. The notes were meant to complement Optimization by Vector Space Methods by David Luenberger (Wiley Interscience, 1969 [1997]).

Front cover: The image was generated by the following M ATHEMATICA code: GraphicsGrid[ Table[ReliefPlot[ Table[Evaluate[ Sum[RiemannSiegelZ[RandomReal[3, 2].{x, y}], {3}]], {x, 0, 10, .2}, {y, 0, 10, .2}], ColorFunction -> ColorData["BlueGreenYellow"], Frame -> False], {3}, {3}]]

C ONTENTS

1

Generalizing from Two Dimensions 1.1 Introduction . . . . . . . . . . . . . . . . . . . 1.2 Existence of Optimal Solutions . . . . . . . . 1.3 Linear Programming . . . . . . . . . . . . . . 1.4 Finite- vs. Infinite-Dimensional Vector Spaces 1.5 Minimum Norm Problems . . . . . . . . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

5 5 5 7 9 14

Preliminaries in Algebra, Topology, and Analysis 2.1 Vector Spaces . . . . . . . . . . . . . . . . . . 2.2 Convex Sets . . . . . . . . . . . . . . . . . . . 2.3 Linear Independence and Dimension . . . . . 2.4 Normed Vector Spaces . . . . . . . . . . . . . 2.5 Open and Closed Sets . . . . . . . . . . . . . 2.6 Convergence, Limits, and Continuity . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

21 21 25 28 30 34 37

. . . . . . .

41 41 44 47 48 53 55 56

. . . . .

59 59 61 63 64 67

5

Dual Spaces and the Hahn-Banach Theorem 5.1 Linear Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Common Dual Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

75 75 77

6

Applications of the Hahn-Banach Theorem 6.1 The Dual of C ra, bs . . . . . . . . . . . . . 6.2 The Second Dual Space . . . . . . . . . . . 6.3 Alignment and Orthogonal Components 6.4 Minimum Norm Problems . . . . . . . . . 6.5 Applications . . . . . . . . . . . . . . . . .

87 87 90 91 92 95

2

3

4

Banach Spaces 3.1 lp Space . . . . . . . . . . . . 3.2 Lebesgue Integration . . . . 3.3 Lp Space . . . . . . . . . . . 3.4 Cauchy Sequences . . . . . 3.5 Compactness and Extrema . 3.6 Quotient Spaces . . . . . . . 3.7 Denseness and Separability

. . . . . . .

Hilbert Space 4.1 Inner Products . . . . . . . . . 4.2 The Projection Theorem . . . 4.3 Orthogonal Complements . . 4.4 The Gram-Schmidt Procedure 4.5 Fourier Series . . . . . . . . .

. . . . . . .

. . . . .

. . . . . . .

. . . . .

. . . . . . .

. . . . .

. . . . . . .

. . . . .

. . . . . . .

. . . . .

. . . . . . .

. . . . .

. . . . . . .

. . . . .

. . . . . . .

. . . . .

. . . . . 3

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . . . .

. . . . .

. . . . .

. . . . .

4 U 6.6 7

8

CONTENTS Hyperplanes and Linear Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

Calculus of Variations 7.1 Review . . . . . . . . . . . . . . . 7.2 Gateaux and Fréchet Differentials 7.3 Euler-Lagrange Equations . . . . 7.4 Problems with Constraints . . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

103 103 104 107 110

Convex Functionals 8.1 Local to Global . . . . . . . . . 8.2 Conjugate Convex Functionals 8.3 Conjugate Concave Functionals 8.4 Fenchel Duality . . . . . . . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

115 115 117 121 123

Bibliography

. . . .

130

C HAPTER 1 GENERALIZING FROM TWO DIMENSIONS

Reading: [1, Chapter 1]

1.1

Introduction

The general approach in [1], which has made the book a classic, is this: • Identify techniques from algebra, elementary single-variable calculus, or elementary multivariable calculus that can be used to solve optimization problems. • Reformulate the solution geometrically. • Using geometry for inspiration, generalize the solution, typically to infinite-dimensional vector spaces and non-Euclidean norms, and prove (algebraically) that it is still valid. All the finite-dimensional problems in this chapter should be familiar, though they may be valuable review for some students. The infinite-dimensional problems are just stated, not solved, and we will take quite a while to get to them. In this chapter, we will hold off on defining some important concepts, which for now are just in SMALL CAPS . These concepts will appear in Chapters 2, 3, and 5 of [1], generally in a context where there is no mention of optimization, just some challenging mathematics. You will need to learn them before you tackle optimization. In the process, you will acquire a good background in real analysis and in the branch of mathematics called “functional analysis”, the theory of normed infinite-dimensional vector spaces. My hope for these introductory notes is to convince you that optimization problems are fun and relevant, that some of the best ones can only be formulated in infinite-dimensional vector spaces, and that it is worth your while to learn quite a few new definitions and theorems in order to be able to solve them.

1.2

Existence of Optimal Solutions

We begin by remembering an EXISTENCE THEOREM from real analysis: T HEOREM 1.2.1 (extreme value theorem). If f is a continuous real function on a compact metric space X, M suppPX f ppq, and m inf pPX f ppq, then there exist points q, r P X such that f pqq M and f prq m. The following examples illustrate applications of this theorem.

5

6 U

1. G ENERALIZING FROM T WO D IMENSIONS

Example 1.2.2. You are entering a student competition to draw up a business plan for a company with m scientists and n other employees. Entries with m2 ¡ 2n2 get rejected. You want to have the highest possible ratio of scientists to other employees. Does this optimization problem have a solution? Solution (based on [2, Example 1.1, p. 2]): We want to solve max m{n such that m2 ¤ 2n2 for m, n P N. Equivalently, we want to find the largest rational number p m{n such that p2 2. We cannot apply Theorem 1.2.1 because the function we are optimizing (our OBJECTIVE FUNCTION) is rational-, not realvalued. This alone is not enough to rule out the existence of an optimal solution, but we can show that, in fact, an optimal solution does not exist. Assume that p is this largest rational number such that p2 2. Define

p pp 22 . 2

q Then

q2 2

(1.2.1)

2pp2 2q pp 2 q2 .

(1.2.2)

Since p2 2, (1.2.2) shows that q 2 2. However, (1.2.1) shows that q ¡ p, which contradicts our initial assumption that p is the largest rational number such that p2 2. Therefore this optimization problem has no solution. q Of course, note that as p2 approaches 2, m and n would become larger and larger. Since the number of people is finite, we should impose constraints m ¤ mmax and n ¤ nmax , in which case there would be a solution. Example 1.2.3. As director of the state lottery, you are designing scratch tickets by assigning probabilities to the possible payoffs from 0 through $4. Since a ticket sells for $5, you want to be as generous as possible. Does this optimization problem in R5 have a solution? Change the problem so that any nonnegative integer payoff is allowed. Does this optimization problem in an infinite-dimensional vector space have a solution? Solution: We define “as generous as possible” as “maximizing the expected value of the ticket”. There° fore, in the first problem, we want to solve max 4k0 kpk where pk is the probability of receiving a payoff ° of $k, such that 0 ¤ pk ¤ 1 for all k 0, . . . , 4 and 4k0 pk 1. Our objective function is continuous and real-valued, and the constraints on tpk u4k0 define a compact set, so Theorem 1.2.1 guarantees the existence of a solution. We simply set pk 1 for k 4 and pk 0 for k 4. ° For the second problem, we want to solve max kPZ kpk , such that 0 ¤ pk ¤ 1 for all k P Z and ° kPZ pk 1. The notion of compactness is tricky in infinite-dimensional vector spaces so we cannot apply Theorem 1.2.1 to this problem, but we can show that, in fact, an optimal solution does not exist. Assume that we have some solution pk πk for all k P Z . We can always increase the expected value of the ticket and still satisfy our constraints by setting p0 0 and pk πk1 for k P N since

8 ¸

k 1

kπk1

8 ¸

k 0

pk

1qπk

8 ¸

¥

kπk .

k 0

Because the expected value is unbounded, there is no solution.

q

1.3. Linear Programming U

7

We will return to the concepts of compactness and METRIC SPACES later.

1.3

Linear Programming

Example 1.3.1. Your small bakery can produce only two products: frosted cookies and cakes. A batch of frosted cookies uses up 1 pound of flour and 3 pounds of sugar. A batch of cake uses up 2 pounds of flour and 1 pound of sugar. Each day your suppliers bring you 14 pounds of flour and 17 pounds of sugar. Your optimization problem is to look at the market price of cookies and cakes and decide what to produce.

Figure 1.3.1: possible production schemes in Example 1.3.1 Solution: Let x be the number of batches of cookies, y be the number of batches of cake, p1 be the market price of a batch of cookies, and p2 be the market price of a batch of cake. Then we want to solve: max x,y

such that

p1 x

p2 y

¤ 14 3x y ¤ 17 x, y ¥ 0 x

2y

(revenueq (flour constraint) (sugar constraint)

Figure 1.3.1 shows the possible production schemes. Observe that if v and w represent possible production schemes, then αv p1 αqw with 0 ¤ α ¤ 1, is also possible. This is the definition of a CONVEX SET. To show this, let 1 2 x 14 A ,x ,b 3 1 y 17 so that we can write the constraints as Ax ¤ b. Then if Av ¤ b and Aw ¤ b, αAv ¤ αb and p1 αqAw ¤

8 U


p1 αqb; adding these inequalities gives the desired result. Now we can consider what the optimal solutions are for various values of p1 and p2 : • p1 5, p2 5. We see from Figure 1.3.1 that the optimal solution is x $5 4 $5 5 $45.

4, y

• p1 1, p2 7. Again, we see from Figure 1.3.1 that the optimal solution is x 0, y $1 0 $7 7 $47.

5. Revenue is

7. Revenue is

• p1 6, p2 2. Notice in Figure 1.3.1 that the revenue function overlaps with the sugar constraint. Therefore, we maximize revenue by choosing any point along this constraint. Revenue is $6 17 3 $2 0 $34. q The revenue function is an example of a ( LINEAR ) FUNCTIONAL on R2 that we are trying to optimize. It is an element of the DUAL SPACE. The straight line that we “slide” to solve the problem is an example of a HYPERPLANE. The fact that this approach works for any convex set is a simple consequence of the H AHN -B ANACH T HEOREM. What we have done is to calculate, for any functional in the dual space, the largest value that this function can achieve subject to the constraint imposed by our budget. This functional on the dual space is called the SUPPORT FUNCIONAL. ¯ 1 ? Can you reconstruct the unit Example 1.3.2. What is the support functional for the (closed) unit disk D disk (or any other convex set) from its support functional?

Figure 1.3.2: As shown in Example 1.3.2, on the left we see that the envelope of all functionals for which the support functional returns a constant value (i. e. , 1) is the boundary of our convex set, ? the closed unit disk. On the right we see that the contour plots of the support functional cpm, nq m2 n2 are concentric circles centered at the origin; the contour cpm, nq 1 is the boundary of our convex set. Solution: Let the functionals be given by mx

ny. The support functional is then

cpm, nq inf c : mx

ny

¤ c for all px, yq P D¯ 2

(

1.4. Finite- vs. Infinite-Dimensional Vector Spaces U

9

¯ 2 is the closed unit disk). We can see from the left side of Figure 1.3.2 that cpm, nq c0 , where mx0 (D ny0 c0 is the tangent to the unit disk at the point px0 , y0 q. The slope of the tangent is m{n, so the slope of the normal is n{m? and the angle between the normal and the positive x-axis is θ? tan1 pn{mq. ? Therefore, x0 cos θ m{ m2 n2 and y0 sin θ n{ m2 n2 . Then cpm, nq c0 m2 n2 . Observe that the value of the support functional evaluated for a given functional was just the given functional evaluated at some point on the boundary of the convex set. Therefore, intuitively, it seems that we can reconstruct the convex set from the support functional. We can see this in two ways. First, the boundary of the convex set can be given by a contour of the support functional, as shown in Figure 1.3.2. Equivalently, the boundary of the convex set is the envelope of all functionals for which the support functional returns some constant value, as shown in Figure 1.3.2. q

1.4

Finite- vs. Infinite-Dimensional Vector Spaces

Example 1.2.3 illustrated some of the added complexities of moving from finite- to infinite-dimensional vector spaces. The following examples further explore the differences between solving optimization problems in finite- versus infinite-dimensional vector spaces. Example 1.4.1 (infinite-dimensional: equivalent to [1, Example 8.7.3, p. 231–234]). You are playing a realtime strategy computer game in which you have to build up a civilization from scratch. The first phase of the game, the “Age of Agriculture”, lasts from time t 0 to time t T . In this phase, you have farms that produce at a rate f ptq. Your farms disappear when you move to the next age, the “Age of Mining”. What makes this game interesting is that you can allocate production between reinvestment rptq and storage sptq. Your farm production increases at a rate proportional to your reinvestment: f9ptq krptq,

k

¡ 0, f9ptq df . dt

What system of allocation maximizes the amount of food available at time T ? Solution: Note that if we set sptq 0, then f ptq rptq and f9ptq krptq any case, we can solve (1.4.1) with the boundary condition f p0q f0 : f pt q

»t 0

krpτ q dτ

kf ptq, so f ptq f0 exppktq. In

f0 .

Then the total amount of food stored by time T is »T 0

sptq dt

»T 0

pf ptq rptqq dt

» T » t 0

0

krpτ q dτ

rptq

dt

f0 T.

Reinvestment and storage must be nonnegative, so we have the constraints 0 ¤ rptq ¤ f ptq ùñ 0 ¤ rptq ¤

»t 0

krpτ q dτ

f0 .

If functions r1 ptq and r2 ptq satisfy these inequalities, so does αr1 ptq p1 αqr2 ptq, where 0 ¤ α ¤ 1. The set of acceptable solutions to the problem is a CONVEX SET in an INFINITE DIMENSIONAL vector space and

10 U


the quantity to be maximized is a LINEAR FUNCTIONAL on this space. We guess that the optimal strategy is to reinvest everything during the interval p0, ts and then store everything during the interval pt, T s:

P p0, ts: f pτ q rpτ q, so f9pτ q krpτ q kf pτ q, f pτ q f0 exppkτ q, and spτ q 0. τ P pt, T s: f pτ q spτ q, so f9pτ q krpτ q 0, f pτ q f0 exppktq, and spτ q f0 exppktq.

• τ •

Then we want to maximize

»T 0

spτ q dτ

pT tqf0 exppktq.

Differentiating with respect to t gives

pT tqkf0 exppktq f0 exppktq 0 ùñ t T k1 . This solution holds if T 1{k ¥ 0 (if k ¥ 1{T ); if k 1{T , then the derivative of the objective function with respect to t is always negative and we are forced into a corner solution in which we just set t 0. q Example 1.4.2 (finite-dimensional). You are growing a new genetically engineered crop as part of a 2-year biofuels experiment. Your contract requires you to deliver 5 tons at the end of each year. The government pays all costs except the cost of fencing your plot of land, so the cost of producing x tons can be modeled ? as cpxq 6 x. The inventory cost of storing your excess crop is hx.

Figure 1.4.1: possible production schemes for Example 1.4.2 Solution: Let yi be the number of tons grown in year i for i 1, 2. We have the constraints y1 ¥ 5 and y1 y2 ¥ 10 since we must grow at least 5 tons in order to be able to deliver the required amount by the end of year 1, and we must grow at least 10 tons in order to be ale to deliver the required amount by the end of year 2. The region corresponding to possible production schemes is shown in Figure 1.4.1. Clearly,


?

11

?

our cost function 6 y1 6 y2 hpy1 5q is strictly increasing in both y1 and y2 within the feasible region, so the minimum cost will be found somewhere along the line y1 y2 10 between y1 5 and y1 10. We substitute y2 10 y1 into the cost function: now we want to minimize

?

a

6 10 y1

6 y1

hpy 1 5 q

with respect to y1 . The derivative with respect to y1 is

?10 y ?y 1 10 y h 3 a 1 h, y1 p10 y1 q 1 which is strictly positive for y1 P p5, 10s and zero for y1 5. Therefore, we reach a corner solution by ? setting py1 , y2 q p5, 5q, at which the minimum cost is 12 5 26.83. q ?3y 1

3

Example 1.4.3 (infinite-dimensional: equivalent to [1, Example 1.2.2, p. 3]). Your new contract requires to deliver at a known rate dptq during the interval t P p0, T s. You produce at a rate rptq. Your rate of a production cost is cprptqq (if all you have to pay for is maintaining the fence, this might equal rptq. If you have xptq tons on hand, you pay inventory costs at a rate hxptq. You start with an inventory xp0q. Solution: First, note that x9 ptq have the constraints

rptq dptq. Both inventory and production must be nonnegative, so we xptq xp0q

»t 0

prpτ q dpτ qq dτ ¥ 0 rptq ¥ 0.

We are trying to minimize total costs: »T 0

pcprptqq

hxptqq dt.

The optimal solution r ptq lies in an infinite-dimensional vector space. Note that r ptq is not necessarily continuous: for example, we might want to produce some constant positive quantity r pτ q r for τ P p0, ts and then produce rpτ q 0 for τ P pt, T³s. rptq is not necessarily bounded either: if h is very small, T then the optimal solution will be to produce 0 dptq dt, the total amount required over the interval p0, T s, as rapidly as possible. q See [1, Example 8.7.4, p. 234] for a related problem.

Example 1.4.4 (finite-dimensional). You are operating a simple frictionless rocket-propelled car. You expend fuel instantaneously to increase your speed to v miles/second, coast 1 mile, and use your brakes to stop instantaneously. You then expend fuel to increase your speed to w miles/second, coast 4 miles, and stop instantaneously. The challenge is to minimize the travel time. The constraint imposed by your fuel tank is that v w 3.

12 U


Solution: The total time spent is 1{v

4{w. We use a Lagrange multiplier to solve this problem:

Lpv, w, λq

1 v

4 w

λ pv

BL 1 λ 0 Bv v2 BL 4 B w w2 λ 0

w 3q

Combining the two partial derivatives gives 4v 2 w2 . Combining this with the constraint gives 4v 2 p3 vq2, so v 1 and w 2 (we discard the solution v 3 since v, w ¥ 0 as we are discussing speed, not velocity). The travel time is 1.5 seconds. q If this were a physics course, we might make this problem more realistic by assuming some nonzero coefficient of friction between the car and the surface on which it travels and some function that gives the rate of expenditure of fuel, among many other things.

Example 1.4.5 (infinite-dimensional: [1, Example 1.2.5, p. 4]). Now your rocket-propelled vehicle goes straight up and is subject to gravity. You expend fuel at a rate uptq. Your goal is for the vehicle to reach height h at time T while expending minimum fuel.

Solution (full solution in [1, Example 5.9.3, p. 125]): Assuming unit mass, massless fuel, and the absence of aerodynamic forces, the equation of the rocket is governed by the second-order differential equation : ptq uptq g. We cannot expend negative fuel and we want to reach height h at time T , so we have the h constraints

hpT q

» T » t 0

0

uptq ¥ 0

pupτ q gq dτ

dt ¥ h,

where we assume in the second constraint that hp0q h9 p0q 0; that is, we start at ground level with zero ³T velocity. We want to minimize the total fuel expended: 0 uptq dt. The optimal solution u ptq is not necessarily continuous or bounded: for example, it might (and actually does) consist of an impulse at time t 0. In that case, in order to work with a function that is at ³t least bounded, it might be a better idea to work with v ptq 0 upτ q dτ , the total amount of fuel expended during the interval p0, ts. This insight is closely related to the R IESZ REPRESENTATION THEOREM. q

Example 1.4.6 (finite-dimensional). Your job is to invent the most exciting scratch ticket. If an outcome has probability p of occurring, the excitement that results when that outcome occurs is proportional to log p. For example, the excitement of rolling 6s on two dice log 361 2 log 61 is precisely twice the excitement of rolling 6 on one die. The ticket can pay $k for k 0, 1, 2. You must assign probabilities pk for k 0, 1, 2 to to these three outcomes. So that the state can make a nice profit, the expected payoff must be 4{7 of a dollar.


13

Solution: Our two constraints are 2 ¸

pk

2 ¸

1

k 0

74 .

kpk

k 0

We want to maximize the quantity H pX q

2 ¸

(1.4.1)

pk log pk .

k 0

(H pX q, the “excitement function”, is the information entropy or Shannon entropy of the random variable X, the payoff of the lottery ticket.) We can substitute the constraints into (1.4.1) to turn this into a singlevariable problem, where we maximize

3 7

3 7

p2 log

p2

4 7

2p2

log

2p2

4 7

p2 log p2

over p2 . However, this would be messy, so we just use Lagrange multipliers: Lpp1 , p2 , p3 , λ1 , λ2 q

2 ¸

pk log pk λ1

k 0

BL 1 B pk Therefore, pk

e1

λ1

eλ2

k

2 ¸

pk 1

λ2

k 0

log pk λ1 kλ2

. Some algebra tells us λ1

2 ¸

k pk

k 0

0,

k

4 7

0, 1, 2

log 7 1 and λ2 log 2, so pk 2k {7.

q

Example 1.4.7 (infinite-dimensional). Your job is to again to invent the most exciting scratch ticket. The ticket can have any nonnegative integer payoff. You need to assign probabilities pk for k P Z to each of these possible outcomes. Make the expected payoff be $1, since the tickets will sell for $1.50. Solution: Now we are trying to maximize over an infinite number of variables. Obviously we cannot use substitution as we could in the previous example since we still have an infinite number of variables after taking into account the two constraints

8 ¸

pk

8 ¸

1

k 0

kpk

1.

k 0

There are two LINEAR FUNCTIONALS in the dual space acting as constraints, so the CODIMENSION of this problem is 2. We can still solve the problem using Lagrange multipliers: Lpp0 , p1 , . . . , λ1 , λ2 q

8 ¸

k 0

BL 1 B pk Therefore, pk

e1

λ1

e λ2

k

pk log pk λ1

8 ¸

pk 1

λ2

k 0

log pk λ1 kλ2

0,

2 ¸

k pk 1

k 0

k

¥0

. We recognize this as in the form of a geometric distribution pk

pp1 pqk

14 U


(note this is a geometric distribution with support r0, 8q, not r1, 8q). Since the mean of a geometric distribution is p1 pq{p, we have p 1{2, so λ1 log 2 1, λ2 log 2, and pk 1{2k 1 . q

1.5

Minimum Norm Problems

Minimum norm problems arise in several instances, as shown by the following examples. Example 1.5.1. You are visiting a friend who lives at the point px, y q along the main highway, a subspace whose equation is x 2y 0.

p2, 1q. A bus drives through town

• Where do you get off the bus in order to minimize your walking distance to the friend’s house? • Suppose you take a taxicab, which can travel only north-south or east-west. The cab driver charges for the driving distance. Where do you get out of the cab in order to minimize your cost? • What if the driver only charges for the larger dimension?

Figure 1.5.1: minimizing the distance between the highway and the house for the various norms in Example 1.5.1 Solution: In the first problem, we are minimizing the 2-NORM or E UCLIDEAN NORM. We can solve the problem geometrically by expanding a circle centered at the house until it is tangent to the road, as shown in Figure 1.5.1. The point of tangency is the solution to the system #

0 y 1 2 px 2 q

x

2y

highway

normal to the highway through the point p2, 1q a

,

or px, y q p6{5, 3{5q. The minimum cost (assuming unit cost) is then p4{5q2 p8{5q2 4{sqrt5. Note that we were implicitly invoking the PROJECTION THEOREM ([1], Theorem 2, p. 51) in knowing that we minimize the distance by finding the point on the highway through which the normal through p2, 1q passes.

1.5. Minimum Norm Problems U

15

In the second problem, we are minimizing the 1-norm or TAXICAB NORM. We can solve the problem geometrically by expanding a diamond centered at the house until it touches the road, as shown in Figure 1.5.1. The point where the square touches the road is the solution to the system x 2y 0, x 2, or px, yq p2, 1q. The minimum cost is then 4{3 4{3 2. In this case, the projection theorem with the ordinary concept of “perpendicular” does not apply, but we could have used [1, Theorem 5.8.1, p. 119]. In the third problem, we are minimizing a type of M INKOWSKI FUNCTIONAL (see [1, p. 131]). We can solve the problem geometrically by expanding a square around the house until it touches the road, as we did before. The factor by which you must expand the unit square in order for it make it touch a point, or 4{3, is the minimum cost. q Example 1.5.2 (minimum norms and convex sets). Assume that the Island of Sodor is a convex set (it actually is not – see Figure 1.5.2) whose shore is a smooth curve. Reverend Awdry is offshore and want to swim to the closest point on the island.

Figure 1.5.2: The Island of Sodor is actually not a convex set (Example 1.5.2)! Solution: We solve the problem geometrically by expanding a circle centered at Reverend Awdry’s location until it just touches the island. Let the radius of the circle be d and the point of tangency between the circle and the convex set be pt . Consider any other tangent to the circle, where the point of tangency is very close to pt : this tangent will cut across the interior of the island, otherwise either it would not be a tangent or the set would not be convex. Therefore, d is the minimum distance from Reverend Awdry’s location to the shore. Now consider all hyperplanes (lines) that separate Reverend Awdry from every point on the island. The reverend is farthest away from the hyperplane tangent to the island that we just constructed. q This is a DUALITY theorem: the distance d solves a minimization problem over points on the island and also solves a maximization problem over elements of the dual space. The theorem is proved in [1] (Theorem 1, p. 136) with no restrictions except convexity. That is: • The shore of the island need not be smooth.

16 U


• The vector space can be infinite-dimensional. • The norm need not be the Euclidean norm. An important class of minimum norm problems is finding a polynomial that best approximates a function in some interval. Before taking a brief look at approximation problems, we first define a LINEAR FUNCTIONAL : Definition 1.5.3 (linear functional). Let V be a vector space over a field F. A linear functional is a map f : V Ñ F such that f pv wq f pvq f pwq for all v, w P V and f pavq af pvq for all v P V and all a P F. Example 1.5.4. Consider the infinite-dimensional vector space V of all continuous functions f on r1, 1s. Let W V be the 4-dimensional subspace of polynomials p for which deg p ¤ 3. Which of the following are linear functionals? • L : f pxq Ñ f p1q • L : f pxq Ñ f p0q • L : f pxq Ñ f p1q • L : f px q Ñ

³1

1 f pxq dx

Solution: The first three are linear functionals since L pf

g q pf

g qpxq f pxq

g pxq Lpf q

Lpaf q af pxq aLpf q

for all x P R. The last is also a linear functional since L pf

gq

»1

1

pf p x q

Lpaf q

g pxqq dx »1

1

»1

1

f pxq dx

a f pxq dx a

»1

1

»1

1

Lpg q

g pxq dx Lpf q

Lpg q

f pxq dx aLpf q.

Note that these four functionals acting on V are clearly linearly independent, as we cannot write one as a linear combination of the others for every continuous function f on r1, 1s. If we restrict ourselves to W ° and write f pxq 3k0 ak xk , then f p1q a0 a1 f p0q a0

»1

1

f p1q a0

f pxq dx a0 x

a1

a2 a3 a2 a1 2 x 2

a3 a2 3 x 3

a3 4 1 x 4 1

2a0

2 a2 . 3


Since

1 1 det 1 2

1

1

1 0 1

0 1 0

17

0 1 0

2 3

0,

the functionals are linearly dependent. More elegantly, we might have remembered that Simpson’s rule is exact for polynomials p for which deg p ¤ 3, so we can write »1

1

f pxq dx

1 pf p1q 3

4f p0q

f p1qq .

q

Example 1.5.5. You have lost your scientific calculator and can only evaluate polynomials ppxq. You need to compute approximate values of f pxq sin x for randomly chosen values x P r0, 1s. Here are three norms that might be used to choose the best approximating polynomial:

• •

³1

|f pxq ppxq| dx L2 -norm: 0 pf pxq ppxqq2 dx L8 -norm: max0¤x¤1 |f pxq ppxq|

• L1 -norm:

0

³1

For which of these norms is the Taylor polynomial the right choice for ppxq? Under what circumstances might one or another of these norms be the appropriate choice? Solution: The Taylor polynomial is actually not the correct choice for any of these norms. It turns out that interpolating polynomials arise in minimizing the 1-norm, Fourier series in minimizing the 2-norm, and C HEBYSHEV POLYNOMIALS in minimizing the 8-norm. See [3] for a complete textbook on approximation theory. q The following theorem provides reassurance that under reasonably broad conditions, functions can be approximated by polynomials to arbitrary accuracy. T HEOREM 1.5.6 (Weierstraß Approximation Theorem). If f is a continuous real-valued function on r0, 1s, then there exists a polynomial p on ra, bs such that |f pxq ppxq| ¤ for all x P ras and any ¡ 0. Proof: Define the n

1 Bernstein basis polynomials of degree n as bk,n pxq

n k x p1 xqnk , k

k

0, . . . , n.

Then a linear combination of Bernstein basis polynomials B pX q

n ¸

βv bk,n pxq

k 0

is called a Bernstein polynomial, and β1 , . . . βn are called the Bernstein or Bézier coefficients. The proof of the Weierstraß approximation theorem is constructive: we will construct a series of Bernstein polynomials that converges uniformly to f .

18 U


Define the Bernstein polynomial Bn pf qpxq

n ¸

n k x p1 xqnk , k

k n

f

k 0

x P r0, 1s.

f is continuous and r0, 1s is compact, so f is uniformly continuous ([2, Theorem 4.19]) on r0, 1s. Therefore, given any ¡ 0, there exists a δ ¡ 0 such that |x y | δ implies |f pxq f py q| {2. Because f is continuous and r0, 1s is compact, we can also conclude that f is bounded ([2, Theorem 4.15]); say supxPr0,1s |f pxq| ¤ M . Now f pxq Bn pf qpxq

n ¸

f px q f

k n

k 0

n k x p1 xqnk , k

so,

|f pxq Bnpf qpxq| ¤

n ¸ k n k nk f x f k x 1 x n k 0

¸ k n k nk f x f k x 1 x n k PS loooooooooooooooooooooooomoooooooooooooooooooooooon

¸ k n k nk f x , f k x 1 x n k P T loooooooooooooooooooooooomoooooooooooooooooooooooon

A

B

p q

p q

p q

p q

p q

p q

where S tj : |x j {n| ¥ n1{4 , j P r0, nsu and T tj : |x j {n| n1{4 , j P r0, nsu r0, nszS. Let X Binompn, pq, so that P pX k q nk xk p1 xqnk and σ 2 var X {n pp1 pq{n ¤ 1{4n. By Chebyshev’s inequality, A

M P X

k ¥ n1{4 n

¤ M nσ1{2 M xp1?n xq ¤ M 4?1 n . 2

Equivalently, A {2 if n ¡ M 2 {42 . Furthermore, if |X k {n| n1{4 and n1{4 δ, then |X k {n| δ and |f pxq Bn pf qpxq| {2 since f is uniformly continuous on r0, 1s. Therefore, B is trivially less than {2. We conclude that if n ¡ maxpM 2 {42 , δ 4 q, then |f pxq Bn pf qpxq| A B ¤ for all x P r0, 1s; that is, the Bernstein polynomials Bn pf q converge uniformly to f . q Example 1.5.7 (Gibbs Phenomenon). Find a discontinuous real-valued function for which such a polynomial does not exist. Solution (based on [4]): Consider the function f pxq sgn sin x, where sgn x Clearly f is discontinuous at x kπ, for k

$ ' ' &1

0

' ' % 1

x¡0

x0.

x 0

P Z. Consider the partial Fourier series:

SN f p xq

N 4 ¸ sinpp2k 1qxq . π k 1 2k 1


Figure 1.5.3: Notice how limN Ñ8 SN f pxq f pxq at x nomenon or ringing artifacts (Example 1.5.7).

0. This discrepancy is known as Gibbs phe-

By the Weierstraß approximation theorem, we have limN Ñ8 |f pxq SN f pxq| some closed interval not containing a multiple of π.

¤ for all x where x is in

However, we notice in Figure 1.5.3 that SN f pxq exhibits a bump whenever x SN f pxq reaches a critical point whenever

pSN f q1pxq π4

N ¸

19

kπ. In particular,

cospp2k 1qxq 0.

k 1

To find the zeros of pSN f q1 pxq, we first show that 2 sin x

N ¸

cospp2k 1qxq sinp2N xq.

(1.5.1)

k 1

The proof is by induction. For N 1, Equation (1.5.1) reduces to the identity sinp2xq 2 sin x cos x. Now assume Equation (1.5.1) is true for general N and show that this implies it is true for N 1. We have 2 sin x

N¸1

k 1

cospp2k 1qxq 2 sin x

paq

N ¸

cospp2k 1qxq

cospp2N

1qxq

k 1

sinp2N xq 2 sin x cospp2N 1qxq sinp2N xq 2 sin x cosp2N xq cos x 2 sin x sinp2N xq sin x pbq p1 2 sin2 xq sinp2N xq sinp2xq cosp2N xq p cq cosp2xq sinp2N xq sinp2xq cosp2N xq pdq sinpp2N 1qxq.

where (a) follows from the inductive step, (b) follows from the identity sinp2xq 2 sin x cos x, (c) follows from the identity cosp2xq cos2 x sin2 x p1 sin2 xq sin2 x 1 2 sin2 x, and (d) follows from the identity sinpa bq cos a sin b sin a cos b. Using this fact, we see that SN f reaches a critical point whenever π sin xpSN f q1 pxq 2 sinp2N xq 0, or whenever x is a multiple of π {2N . Consider the closest critical points to x 0: x π {2N . Then we

20 U


have SN f

π

2N

N 1 π 4 ¸ sin 2k2N π k1 2k 1

N 1 π π 2 ¸ sin 2k2N . 2k1 π k 1 N 2N π

!

2k 2pk 1q The last sum is a Riemann sum taken over midpoints of the partition ,k 2N , 2N Therefore, »π π sin x lim SN f lim f pxq dx 1.1790. N Ñ8 2N x xÑ 0 0

)

0, . . . , N 1

.

A similar analysis limxÑ0 f pxq 1.1790. Therefore, the sequence of polynomials tSN f u does not converge pointwise to f . However, we do see that the sequence tSN f u does converge to f for the L1 - and L2 -norms, though it does not for the L8 -norm. q

C HAPTER 2 PRELIMINARIES IN ALGEBRA , TOPOLOGY, AND ANALYSIS

Reading: [1, Sections 2.1 – 2.9]. Also see [5, Chapters 1 – 2] for more on the linear algebra covered in this chapter, [6, Chapter 2] for more on the topology, and [2, Chapters 2 and 4] for more on the topology and analysis.

2.1

Vector Spaces

Definition 2.1.1 (vector space). A vector space V over a field F is a set V along with two operations: addition, which associates with any two vectors u, v P V a vector u v P V , and scalar multiplication, which associates with any vector v P V and any scalar a P F a vector av P V . The following axioms hold: 1. commutative law for vector addition: u 2. associative law for vector addition: pu

vv vq

u for all u, v P V .

wu

pv

wq for all u, v, w P V .

3. existence of an additive identity: There exists a null vector 0 P V such that v 4. distributive law for vector addition: apu 5. distributive law for scalar addition: pa

vq au bqv av

0 v for all v P V .

av for all u, v P V and all a P F. bv for all v P V and all a, b P F.

6. associative law for scalar multiplication: pabqv apbvq for all v P V and all a, b P F. 7. existence of a multiplicative identity: 1v v for all v P V . Also, 0v 0 for all v P V .

Unless specified otherwise, we will assume we are working with real vector spaces (F R). Example 2.1.2. Note that [1] replaces a standard axiom, the existence of an additive inverse, with the axiom 0v 0 for all v P V . Prove the existence of an additive inverse from [1]’s axioms. Solution: We have

0v p1 p1qqv pbq 1v p1qv pcq v p1vq, where (a) follows from [1]’s axiom 0v 0 for all v P V , (b) follows from the distributive law for scalar 0

paq

addition, and (c) follows from the definition of a multiplicative identity. Therefore, we define the additive inverse v 1v. q

21

22 U

2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

Example 2.1.3. Which axiom on the list can be proved from the others? Solution: We can prove the commutative law from the other axioms. We have 0

paq

0 pu v q pp1q 1qpu vq pbq p1qpu vq 1pu vq pcq p1qu p1qv 1u 1v pdq u v u v

where (a) follows from the fact that 0v 0, (b) follows from the distributive law for scalar addition, (c) follows from the distributive law for vector addition, and (d) follows from the existence of additive inverses, as shown above, and the definition of the multiplicative identity. Then we add v to the left of both sides to get u u 0 u u v u v v u v; repeating with v gives v u u v. q How do we know that we cannot prove another axiom from the remaining ones? We can try defining vector addition and scalar multiplication in different ways and seeing if the resulting operators still obey the axioms for a vector space. If not, then the axioms that are not satisfied are independent of the axioms that are. Example 2.1.4. Let V R2 . Define addition as usual, but define scalar multiplication by apx, y q for px, y q P V and a P R. Is V a vector space? If not, what axiom is not satisfied?

pax, 0q

Solution: A multiplicative identity does not exist in this vector space since there exists no a such that apx, y q px, y q for all px, y q. Therefore, the existence of a multiplicative identity is independent of the other axioms. q Example 2.1.5. For the NSA budget, let V R Ytsu. The symbol S denotes a secret amount. For elements in V other than s, addition and multiplication are defined as usual. However, s v v s s for all v P V and as s for all a P R. Is V a vector space? If not, what axiom is not satisfied? Solution: No, because the axiom 0v 0 is not satisfied. Clearly 0s 0 s.

q

Below are some potential vector spaces:

All 2 3 matrices with real entries: This is equivalent to R6 , which is a 6-dimensional vector space All infinite sequences: infinite-dimensional vector space

All bounded infinite sequences: infinite-dimensional vector space (since the sum of two bounded sequences is bounded and a bounded sequence times a constant is still bounded)

All infinite sequences that converge to zero: infinite-dimensional vector space

2.1. Vector Spaces U

23

All functions f : tA, B, C u Ñ R: 3-dimensional vector space All polynomials of any (finite) degree: infinite-dimensional vector space

All infinite sequences with only finitely many nonzero terms: infinite-dimensional vector space All infinite sequences for which the terms form a convergent series: infinite-dimensional vector space

All polynomials p with deg p degpf g q 3)

3:

not a vector space (consider f pxq

x3 and gpxq x3:

All power series: infinite-dimensional vector space All continuous functions on r0, 1s: infinite-dimensional vector space All linear functions f : R3

Ñ R: 3-dimensional vector space (since we have the basis tf p1, 0, 0q, f p0, 1, 0q, f p0, 0, 1qu

because f is linear) Note that we nested SUBSPACES inside each other: Definition 2.1.6 (vector subspace). A nonempty subset U of a vector space V over F is called a subspace of V if for any u, v P U and a, b P F, au bv P U . That is, U is a subspace of V if U is closed under vector addition and scalar multiplication. A subspace must be nonempty, so it must contain at least one element. By the axiom guaranteeing the existence of an additive identity, that one element must be 0. Therefore, every subspace contains 0. Proposition 2.1.7 ([1, Proposition 2.3.1, p. 15]). If U1 and U2 are subspaces of V , so is U1 X U2 . Proof: 0 P U1 , U2 since U1 and U2 are subspaces, so 0 u, v P U1 X U2 , then u, v P U1 and u, v P U2 . Then au au bv P U1 X U2 .

P U1 X U2. Therefore, U1 X U2 is nonempty. If bv P U1 , U2 since U1 and U2 are subspaces, so q

While U1 X U2 is a subspace if U1 and U2 are subspaces, it is not necessarily true that U1 Y U2 is a subspace. Consider any two (noncolinear) lines in R2 that pass through the origin. These lines are both subspaces, but the smallest subspace containing both of these subspaces is R2 . Definition 2.1.8 (sum of vector spaces). The sum of two subsets U1 and U2 of a vector space is the set of all sums u1 u2 , where u1 P U1 and u2 P U2 .

Example 2.1.9. What are the sum and difference of two squares, one centered at a and with side length 2r and the other centered at b and with side length 2s?

24 U


Solution: We skip ahead a little and assume knowledge of NORMED VECTOR SPACES. Let the first square be the set A tx : }x a}8 ¤ ru and B ty : }y b}8 ¤ su. Consider any x P A and y P B. We have

}px

y q pa

bq}8

¤ }x a}8 }y b}8 ¤ r

s,

where the first inequality follows from the TRIANGLE INEQUALITY. Therefore, A at a b with side length 2pr sq. Furthermore, we have

}px yq pa bq}8 ¤ }x a}8 }y b}8 ¤ r

B is a square centered

s,

where again we use the triangle inequality, as well as the property that }ax} |a|}x} for any scalar a P F and any vector x P X, where X is a normed vector space. Therefore, A B is a square centered at a b with side length 2pr sq. q

Note that A B does not depend on our choice of the origin o since pa oq pb oq a b. This construction will be crucial to the proof of the E IDELHEIT SEPARATION THEOREM ([1], Theorem 3, p. 133). Proposition 2.1.10 ([1, Proposition 2.3.2, p. 15]). If U1 and U2 are subspaces of V , then U1 V.

U2 is a subspace of

Proof: U1 and U2 contain 0, so U1 U2 contains 0. Suppose u1 , u2 P U1 U2 . Then there exist vectors v1 , v2 P U1 and w1 , w2 P U2 such that u1 v1 w1 and u2 v2 w2 . For any scalars a, b P F, we have au1

bu2

looooomooooon pav1 bv2q loooooomoooooon paw1 bw2q; PU1

PU2

since au1 bu2 can be expressed as the sum of a vector in U1 and a vector in U2 , it is in U1 U1 U2 is a subspace of V .

U2 . Therefore, q

Given any set S of vectors in V , there will in general be many subspaces of V that contain S. One is V itself, but there may be smaller ones. For example, if S tp1, 1, 0u, then some examples of subspaces that contain S are tpx, x, 0q : x P Ru, tpx, y, 0q : x, y P Ru, and tpx, y, z q : x, y, z P Ru, which are of dimensions 1, 2, and 3, respectively. The “smallest” such subspace is given a special name: Definition 2.1.11 (subspace generated by a subset). Suppose S is a subset of a vector space V . The set rS s, called the subspace generated by S, consists of all vectors in V which are linear combinations of vectors in S. If subspace U contains S, it contains rS s. Equivalently, rS s is the intersection of all subspaces that contain S. The usual way to construct rS s is to form all linear combinations of vectors in S. This is clearly a subspace since it is closed. Example 2.1.12. Construct rS s P R2 if • S

tp1, 2q, p3, 6qu

2.2. Convex Sets U

• S

25

tp1, 2q, p3, 3qu

• S is the line segment from p1, 2q to p3, 6q.

Solution: We have • rS s tpx, 2xq : x P Ru since p3, 6q is a multiple of p1, 2q. • rS s R2 since p1, 2q and p3, 3q are linearly independent. • rS s tpx, 2xq : x P Ru since the line segment is part of the line y

2x.

q

S UBSPACE generalizes “line or plane through the origin”. The generalization of “any line or plane” is LINEAR VARIETY , a subspace plus a constant vector: Definition 2.1.13 (linear variety). The translation of a subspace is a linear variety or affine subspace. Analogous to the subspace generated by a subset, we have the following: Definition 2.1.14 (linear variety generated by a subset). Suppose S is a nonempty subset of a vector space V . The set ν pS q, called the linear variety generated by S, is the intersection of all linear varieties in V that contain S. Example 2.1.15. If S P R3 consists of the vectors tp0, 0, 1q, p0, 1, 1qu, give examples of subspaces of two different dimensions that contain S. Which subspace is the “smallest”? What is ν pS q? Solution: The subspaces tp0, x, y q : x, y P Ru and tpx, y, z q : x, y, z P Ru are subspaces of dimensions 2 and 3, respectively, that contain S. The first subspace is the smallest. ν pS q is p0, 0, 1q tp0, x, 0q : x P Ru. q Example 2.1.16. In R3 , what is ν pS q if S is a circle? Under what circumstances does ν pS q rS s? Solution: ν pS q is the plane containing the circle. If the origin lies in the plane containing the circle, then ν pS q rS s. q

2.2

Convex Sets

Definition 2.2.1 (convex set). A set K in a linear vector space is said to be convex if for any x1 , x2 elements in the set tλx1 p1 λqx2 : λ P r0, 1su are in K.

P K, all

26 U


Proposition 2.2.2. The empty set is convex. Proof: To show that ∅ is not convex, we must find two vectors x1 , x2 P ∅ such that there exist vectors in the set tλx1 p1 λqx2 : 0 ¤ λ ¤ 1u not in ∅. Since ∅ contains no vectors at all, it is convex. q Proposition 2.2.3 ([1, Proposition 2.4.1, p. 18]). For any convex sets K, L and any scalars a, b, aK convex. Proof: Consider the vectors z1 , z2 P aK bL. There exist vectors x1 , x2 z1 ax1 by1 and z2 ax2 by2 . Since K and L are convex, we have λz1

p1 λqz2 λpax1

by1 q

p1 λqpax2

by2 q apλx1

bL is

P K and y1, y2 P L such that

p1 λqx2q

bpλy1

p 1 λ qy 2 q.

Then λx1 p1 λqx2 and λy1 p1 λqy2 are in K and L, respectively, since K and L are convex. Therefore, λz1 p1 λqz2 is in aK bL. q

Figure 2.2.1: The union of the two disks is not convex. In general, the union of two convex sets is not convex, as shown in Figure 2.2.1. However, we do have the following: Proposition 2.2.4 ([1, Proposition 2.4.2, p. 18]). Let C be an arbitrary collection of convex sets. Then XK PC K is convex. Proof: Let C XK PC . If C is empty, then the proof reduces to that in Example 2.2.2. Assume we have x1 , x2 P C and choose any λ P r0, 1s. Then x1 , x2 P K for all K P C and since each K is convex, λx1 p1 λqx2 P K for all K P C. Therefore, λx1 p1 λqx2 P C and C is convex. q Definition 2.2.5 (convex hull). Given an arbitrary set S in a linear vector space, the convex hull or convex cover, denoted copS q, is the smallest convex set containing S. We can express copS q as the intersection of all convex sets containing S. Alternatively, we could express copS q as the set of all convex combinations of vectors in S, where a convex combination is a linear combi°| S | °|S | nation k1 αk xk , where αk ¥ 0 for k 1, . . . , |S | and k1 αk 1.

2.2. Convex Sets U

27

Proposition 2.2.6. Let K be the set of vectors consisting of all convex combinations of vectors in S. Show that K copS q. °

Proof: First we show that K copS q. Let Km be the set of all convex combinations of the form m i1 αi xji , °m n where i1 αi 1, ai ¥ 0 for i 1, . . . , m, and m 1, . . . , n. Clearly K m1 Km . Then in order to show K copS q, we need to show that Km copS q for m 1, . . . , n. The proof is by induction. This clearly holds for m 1 since in that case α1 1 and the convex combinations are therefore just the elements of S copS q. Now we assume that the result is true for general m n and show that this ° ° implies it is true for m 1. Say we are given a convex combination p im1 1 αi xji , where ik11 αi 1 and ai ¥ 0 for i 1, . . . , m 1. At least one of α1 , . . . , αm 1 must be strictly positive; without loss of generality, assume that α1 ¡ 0. Then p

m ¸1

α i x ji

α1 xj

m ¸1

1

α i x ji

i 2

i 1

m ¸1

α1 looxmo j on 1

i 2

q

αi

m ¸1

αi

°m 1 x ji , i2 αi i 2 looooooooomooooooooon r

Clearly q

P copS q since q P S. Since m ¸1

i 2

αi

°m

1

αi

i 2

°m

1 i 2 αi °m 1 i 2 αi

1,

r is a convex combination of m elements of S and therefore, by the inductive hypothesis, is also in copS q. Then p is on the line segment connecting q and r with q, r P copS q, so p P copS q, as we wanted to show. Remember that copS q is the smallest convex set containing S. We know that S K1 K copS q, so we just need to show that K is actually convex in order to show that K copS q. Say we are given two ° ° ° ° elements of K, q ni1 αi xi and r ni1 βi xi , with ni1 αi ni1 βi 1 and αi , βi ¥ 0 for i 1, . . . , n. ° ° Then any point on the line segment connecting q and r can be written as p λ ni1 αi xi p1 λq ni1 βi xi ° with λ P r0, 1s. We can rewrite p as ni1 pλαi p1 λqβi qxi , with n ¸

pλαi p1 λqβiq λ

i 1

n ¸

i 1

αi

p1 λq

n ¸

βi

λ p1 λq 1,

i 1

so p is a convex combination of x1 , . . . , xn and is therefore in K.

q

Definition 2.2.7 (cone). A set C in a linear vector space is said to be a cone with vertex at the origin if x P C implies αx P C for all α ¥ 0.

Example 2.2.8. Make a cone from a line segment, from a circle, and from a disk. Solution: Figure 2.2.2 shows a cone made from the line segment connecting p5, 5q and p5, 10q and a cone made from the circle of radius 1{2 centered at p0, 0, 1{2q and parallel to the xy-plane. A cone made from the corresponding disk would be the same as the cone made from the circle, except that the cone made from the disk would be filled in. q

28 U


Figure 2.2.2: cones made from the line segment and the circle in Example 2.2.8

Example 2.2.9. Make a convex cone from a non-convex set. Solution: Consider the convex cone C generated from the line segment L in Figure 2.2.2. If we replace L with an arc A that has the same endpoints as L and that is entirely contained within C, then we still generate C. However, A is a non-convex set. q Proposition 2.2.10. It is impossible to make a non-convex cone from a convex set. Proof: Let C be a cone generated from a convex set K. Pick any two vectors y1 , y2 P C. Then there exist vectors x1 , x2 P K and scalars α1 , α2 ¥ 0 such that y1 α1 x1 and y2 α2 x2 . Furthermore, we can set α1 βγ and α2 β p1 γ q where β ¥ 0 and 0 ¤ γ ¤ 1: specifically, β α1 α2 and γ α1 {pα1 α2 q. Then we can write yc λy1 p1 λqy2 β pγx1 p1 γ qx2 q βxc ; xc

P K since K is convex and yc P C by definition of a cone. Therefore, C is convex. 2.3

q

Linear Independence and Dimension

Definition 2.3.1 (linear independence). A vector x is said to be linearly dependent upon a set of vectors S if x can be expressed as a linear combination of vectors in S. Equivalently, x is linearly dependent upon S if x P rS s. Conversely, x is said to be linearly independent of the set S if it is not linearly dependent on S, and a set of vectors is said to be a linearly independent set if each vector in the set is linearly independent of the remainder of the set. Note that this definition works even for infinite-dimensional vector spaces. Example 2.3.2. Let S t1, t2 , t4 , . . . u. Is f ptq pt8 4t2 qpt6 dependent upon S? Is S a linearly independent set?

t2

7q dependent upon S? Is g ptq

t5

2.3. Linear Independence and Dimension U

29

Solution: f ptq is dependent upon S since each term of f ptq has an even exponent. g ptq is independent of S since t5 has an odd exponent. S is a linearly independent set. q T HEOREM 2.3.3 ([1, Theorem 2.5.1, p. 20]). The set tx1 , . . . , xn u is linearly independent if and only if k1 αk xk 0 implies αk 0 for k 1, . . . , n.

°n

Proof: See the proof in [1].

q

T HEOREM 2.3.4 ([1, Corollary 2.5.1, p. 20]). If tx1 , . . . , xn u is a linearly independent set and °n k1 βk xk , then αk βk for k 1, . . . , n. Proof: If 2.3.3.

°n

k 1 α k xk

°n

k 1 αk x k

°nk1 βk xk , then °nk1pαk βk qxk 0 and αk βk for k 1, . . . , n by Theorem q

Definition 2.3.5 (basis). A finite set B of linearly independent vectors is said to be a basis for the space V if rB s V . The number of vectors |B | in B is called the dimension of V . Example 2.3.6. What is the dimension of the space spanned by t1, cos2 x, sin2 x, cos 2xu? Solution: This space has dimension 2, since we can write 1 sin2 x.

cos2 x

sin2 x and cos 2x

cos2 x

q

Q UIZ T HEOREM 1 (from [1, Theorem 2.5.2, p. 21]). If a vector space V is generated by the set of k vectors Sk tv1 , . . . , vk u, then any set of k 1 vectors Tk 1 tw1 , . . . , wk 1 u in V must be linearly dependent. Proof: The proof is by induction. For the base case k 1, S1 tv1 u and T2 tw1 , w2 u. There exist scalars α1 , α2 0 such that w1 α1 v1 and w2 α2 v1 . Then we have α2 v1 α1 v2 0, so T is a linearly dependent set. Our inductive hypothesis is that if V rSk1 s, then Tk is a linearly dependent set. We assume this is ° true for arbitrary k and show that it holds for k 1. First, we can write wk 1 ki1 αi vi since tv1 , . . . , vk u is a basis. Choose some j such that αj 0. If this cannot be done, then w1 0 and Tk 1 is a linearly dependent set. Then let Sk1 Sk tvj u. Now vj

α1 wk

1

v

j

for some v P rSk1 s, with v

°k

bi vi . Substituting this into the expression below, we have

i 1

i j

wm

k ¸

i 1

β i vi

k ¸

pβ i

i 1

i j

bi qvi

wk

1

30 U


for m 1, . . . , k. By the inductive hypothesis, these k vectors w1 , . . . , wk are linearly dependent; that is, ° there exists some linear combination ki1 γi wi 0 with not all of γ1 , . . . , γk equal to 0. But then we can ° set γk 1 0 so that the linear combination kk1 γi wi also equals 0; again not all of γ1 , . . . , γk equal 0 so the set Tk 1 is linearly dependent, thus proving the theorem. q We conclude that if Sk is a basis for V , then any set of more than k vectors in V is linearly dependent and cannot be a basis. Therefore, any two bases for a finite-dimensional vector space contain the same number of elements. Example 2.3.7. Show that the result does not extend to infinite-dimensional vector spaces. Solution: The vector space V of convergent sequences is generated by the countably infinite set of linearly independent elements S tp1, 0, . . . q, p1, 1, 0, . . . q, p1, 1, 1, 0, . . . q, . . . u. If we remove the first element of this set, we still have a countably infinite set S 1 S tp1, 0, . . . qu, but this does not generate V since not all convergent sequences tak u8 q k1 have a1 a2 .

2.4

Normed Vector Spaces

Definition 2.4.1 (normed vector space). A normed vector space, or normed linear space or normed linear vector space, is a vector space X on which there is defined a real-valued function which maps every x P X into a real number }x} called the norm of x. The norm satisfies the following axioms: 1. positive homogeneity: }αx} |α|}x} for all α P R and all x P X. 2. triangle inequality: }x 3.

y} ¤ }x}

}y} for all x, y P X. positivity and positive definiteness: }x} ¥ 0 for all x P X and }x} 0 if and only if x 0.

A real-valued function p : X Ñ R that satisfies just the first two axioms is called a SEMINORM. Note the following about each axiom: positive homogeneity • If you take a cab from your destination back to your starting point, you should pay the same amount as when you went from your starting point to your destination. • You cannot save money by breaking a cab ride along a single straight road into two pieces. • The cab driver cannot offer bargain fares for extra-long rides. • In order to convert x41 dimensions: }x}4

x41

x42 into a norm, we must take the fourth root (this is the l4 -norm in 2 x42

1 4

).

• If you draw a convex set that contains the origin and decree that the cab fare from the origin to any point on the boundary of this set is $1 and that fare scales linearly with distance, you do not necessarily have a norm. In fact, you only have a norm if the boundary is the set tx : }x} cu for some constant c.

2.4. Normed Vector Spaces U

31

triangle inequality • You cannot save money by breaking a cab ride into two pieces.

• A cab driver can offer a 50% discount on the shorter dimension (}x pu, v q} 12 minp|u|, |v |q maxp|u|, |v |q). Say we have x pa, bq and y pc, dq. Assume without loss of generality that minp|a|, |b|q |a|. If minp|c|, |d|q |c|, then clearly }x y} ¤ }x} }y}. If minp|c|, |d|q |d|, assume without loss of generality that minp|a c|, |b d|q |a c|. Then

}x} }y} }x

y}

1 |a| 2

|b|

12 p|a| |c| |a

|c|

1 |d| 2

1 |a 2

c|q

p|b| |d| |b

c| d|q

|b

d|

1 p|c| |d|q ¥ 0, 2

where the inequality follows since each term in parentheses is nonnegative. Similarly, a cab driver can also offer a 50% discount on the longer dimension.

• The set of vectors tx : }x} ¤ 1u is convex since

}λx1 p1 λqx2} ¤ λ}x1} p1 λq}x2} ¤ λ p1 λq 1. positivity and positive definiteness • The “if” part of positive definiteness is not an independent requirement: positive homogeneity implies that }x} 0 if x 0 since we have }0} }00} |0|}0} 0. • Positivity is not an independent requirement: we have 0

paq

pbq

}0} }x x} ¤ }x} } x} pcq }x} | 1|}x} 2}x},

where (a) follows from positive homogeneity as shown above, (b) follows from the triangle inequality, and (c) follows from positive homogeneity. Dividing both sides by 2 proves positivity. • If the cab driver makes the longer dimension free, cab fare is still a norm (the l8 -norm). In R2 , we have seen the l1 -, l2 -, l4 -, and l8 -norms so far, defined by |x1 |

1

1

|x2|, x21 x22 , x41 x42 , and maxp|x1 |, |x2 |q, respectively. Before we are tempted to conclude that }x}p pxp1 xp2 q is a norm for all p ¥ 0, note that it is actually not a norm for p P r0, 1q. The “l0 -norm” does not satisfy positive homogeneity, and the “lp -norm” for p P p0, 1q does not satisfy the triangle inequality (the unit disks for 2

4

1 p

these norms are not convex). Example 2.4.2. Consider the set of infinite sequences of real numbers that have only finitely many nonzero terms. Is this a vector space? If so, what is its dimension? Generalize the norm examples to this case. If you remove the restriction to finitely many nonzero terms but insist that the norm be finite, which example gives the largest vector space? The smallest? Solution: Yes, this is a vector space since the sum of an infinite sequence with M ¤ 8 nonzero terms and an infinite sequence with N ¤ 8 nonzero terms will have at most M N ¤ 8 nonzero terms, and any multiple of an infinite sequence with M nonzero terms will still have only M nonzero terms. The vector

32 U

2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS 1

°

8 |ξ |p p . The space is infinite-dimensional. The lp -norm (p P r1, 8q) for a sequence tξ1 , ξ2 , . . . u is i1 i l8 -norm is maxi t|ξi |u8 . If we remove the restriction to finitely many nonzero terms but insist that the i1 norm be finite, then the l8 -norm gives the largest vector space, while the l1 norm gives the smallest. q Example 2.4.3. C ra, bs is the space of continuous functions on ra, bs, with norm }x} Confirm that this norm satisfies all the axioms.

maxtPra,bs |xptq|.

Solution: We have 1. positive homogeneity:

}αx} amax |αxptq| |α| amax |xptq| |α|}x}. ¤t¤b ¤t¤b 2. triangle inequality:

}x

y } max |xptq

¤¤

a t b

y ptq| ¤ max p|xptq|

¤¤

a t b

|yptq|q ¤ amax |x| ¤t¤b

max |y ptq| }x}

¤¤

a t b

}y}.

3. positivity and positive definiteness: Clearly }x} 0 if only if xptq 0 for all t P ra, bs. }x} ¥ 0 since |x| ¥ 0. q Example 2.4.4. Give examples of functions, not in C r0, 1s, that can be included in the space if you choose ³1 the norm }x} 0 |xptq| dt.

Figure 2.4.1: Thomae’s function (Example 2.4.4). Image taken from Wikipedia. Solution: The function xptq sgn px 0.5q is discontinuous at the point x 0.5, but }x} 1. A more interesting choice is Thomae’s function, also known as the popcorn function or the Riemann function: xptq

#

1 q

0

t

PQ. tPRQ p q

We show that }x} 0 (see Figure 2.4.1). Given any ¡ 0, choose n such that 1{n . In the interval r0, 1s, there are only finitely many rational numbers with denominator at most n. Say there are dn of these numbers. Surround each of these dn points with an interval of length {dn , and use the endpoints of these intervals to form a dissection of the interval r0, 1s. The upper Darboux sum for this dissection is less than 2: within each of the dn intervals, whose combined length is , the function has an upper bound of 1, and

2.4. Normed Vector Spaces U

33

within each of the remaining intervals, whose combined length is less than 1, the function has an upper bound of 1{n . The lower Darboux sum is obviously 0 since every interval in the dissection contains an irrational number. Since we have 0 ¤ }x} ¤ 2 for all ¡ 0, }x} 0. The norm }x} does not exist for the Dirichlet function, defined as x pt q 1 Q pt q

#

1 0

tPQ

tPRQ

,

is not Darboux-integrable (and therefore not Riemann-integrable) since every interval in any dissection of r0, 1s will always contain at least one rational and at least on irrational number; therefore, the upper and lower Darboux sums will not converge. If we use the Lebesgue integral, however, then }x} exists and equals 0 since Q is countable. q Definition 2.4.5 (total variation). A function x on ra, bs is said to be of bounded variation if there is a constant K so that for any partition a t0 t1 tn b of ra, bs, n ¸

|xptiq xpti1q| ¤ K.

i 1

We then define the total variation of x as TVpxq sup

n ¸

|xptiq xpti1q|

»b

i 1

a

|dxptq|,

where the supremum is taken with respect to all partitions of ra, bs. Example 2.4.6. The space BVra, bs is defined as the space of all functions of bounded variation on ra, bs together with the norm }x} |xpaq| TVpxq. Why is this norm the appropriate choice? Is the function 1 xptq sin 1 t in BVr0, 1s?

Figure 2.4.2: The function xptq sin 1t does not have bounded variation (Example 2.4.6). Solution: We check that }x} satisfies the three axioms for a norm:

34 U


1. positive homogeneity:

}αx} |αxpaq|

sup

n ¸

|αxptiq αxpti1q| |α||xpaq| |α| sup

n ¸

|xptiq xpti1q| |α|}x}.

i 1

i 1

2. triangle inequality:

}x

y } |xpaq

y paq|

sup

n ¸

|pxptiq

y pti qq pxpti1 q

y pti1 qq|

i 1

¤ |xpaq| |ypaq|

sup

n ¸

|xptiq xpti1q|

i 1

sup

n ¸

|yptiq ypti1q|,

i 1

where we use the triangle inequality for the absolute value function. 3. positivity and positive definiteness: Clearly }x} 0 if xptq 0 for t P ra, bs. Furthermore, because we include the term |xpaq| in the norm, }x} 0 only if xptq 0 for t P ra, bs. Finally, }x} ¥ 0 for all functions x P BVra, bs since every term in the norm involves absolute values. The function xptq sin 1t is not in BVr0, 1s since there are infinitely many points t P r0, 1s at which xptq 1. Specifically, as t Ñ 0, xptq oscillates between 1 and 1 infinitely often, as shown in Figure 2.4.2. Therefore, xptq does not have bounded variation. q

We can think of xptq as the position of a car as a function of time for t P ra, bs, and that the car’s odometer increases even when the car is backing up. Then the space BVra, bs is the set of functions for which the change in the odometer reading is finite. This space is important because it will turn out to be the dual space to C ra, bs.

2.5

Open and Closed Sets

A norm introduces a topology that may be more general than what the reader is used to. Definition 2.5.1 (topology). A topology on a set X is a collection T of subsets of X having the following properties: 1. ∅ and X are in T . 2. The union of the elements of any subcollection of T is in T . 3. The intersection of the elements of any finite subcollection of T is in T . A set X for which a topology T has been specified is called a topological space. Definition 2.5.2 (open set). If X is a topological space with topology T , we say that a subset U X is an open set of X if U belongs to the collection T . More generally, a topological space is a set X together with a collection of subsets of X, called open sets, such that ∅ and X are both open, and such that arbitrary unions and finite intersections of open sets are open.

2.5. Open and Closed Sets U

35

The most basic topology on a nonempty set X is the collection t∅, X u, so a topology can contain just two open sets. Furthermore, it is permissible to have every set in a topology be open. Here are some other definitions that are independent of any model for topology: Definition 2.5.3 (closed set). A subset A of a topological space X is said to be closed if the set X open.

A is

Definition 2.5.4 (interior and closure). Given a subset A of a topological space X, the interior of A, de˚ is defined as the union of all open sets contained in A (i. e. the largest open subset of A). The noted A, ¯ is defined as the intersection of all closed sets containing A (i. e. the smallest closure of A, denoted A, closed set containing the subset A). ˚ then A is open, while if A A, ¯ then A is closed. Clearly if A A,

Figure 2.5.1: a Web site model that illustrates a finite topology that does not involve norms We use a Web site model to illustrate a finite topology that does not involve norms. Figure 2.5.1 shows the links between the six Web pages in a set X. An “open set” is defined by the property that no page in the set can be reached from a page from outside the set. Example 2.5.5. For the Web site example, that both ∅ and X are open. Solution: We cannot find a page in ∅ that can be reached from a page outside ∅ because there are no pages in ∅. We cannot find a page outside X that links to a page in X because there are no pages outside X. Therefore, ∅ and X are open. q Example 2.5.6. For the Web site example, prove that the union of two open sets is open. Solution: Let A and B be two open sets. By the definition of an open set, there are no links from pages in X zA to pages in A, nor are there links from pages in X zB to pages in B. Since X zpA Y B q X zA and X zpA Y B q X zB, there are no links from pages in X zpA Y B q to either pages in A or pages in B; therefore, there are no links from pages in X zpA Y B q to pages in A Y B. Thus, A Y B is open. q

36 U


Example 2.5.7. For the Web site example, prove that the intersection of two open sets is open. Solution: Let A and B be two open sets. Suppose that A X B is not open. Then there must be some page in X zpA X B q that links to a page in A X B. But X zpA X B q pX zAq Y pX zB q, so we have found either a page in X zA that links to a page in A or a page in X zB that links to a page in B. This contradicts our assumption that A and B are open, so A X B is open. q For the Web site model, we note the following to illustrate some of the topological concepts we have defined: • The nine open sets in X are ∅, t2u, t4, 5u, t1, 2, 3u, t2, 4, 5u, t4, 5, 6u, t2, 4, 5, 6u, t1, 2, 3, 4, 5u, and t1, 2, 3, 4, 5, 6u X. • The interior of t2, 3u is t2u since t2u is the largest open subset of t2, 3u. • The interior of t1, 2, 4, 5, 6u is t2, 4, 5, 6u. • The nine closed sets are the complements of the nine open sets in X. Note that ∅ and X are both open and closed sets (sometimes called clopen sets). • The closure of t1u is t1, 3u since t1, 3u is the smallest closed set containing the subset t1u. • The closure of t1, 6u is t1, 3, 6u since t1, 3, 6u is the smallest closed set containing the subset t1, 6u. Now we define a topology using a norm. Definition 2.5.8 (interior point). Let P be a subset of a normed vector space X. The point p P P is an interior point of P if there exists some ¡ 0 such that the ball B pp, qq tx : }x p} u is a subset of P. Definition 2.5.9 (clsoure point). A point x P X is a closure point or limit point of a set P if, given any ¡ 0, there is a point p P P such that }x p} . In other words, these definitions are saying that a point p is an interior point of P if we can always surround p with a ball entirely contained within P , while a point x is a closure point of P if every ball centered at x contains at least one point in P . Proposition 2.5.10. P¯ is a closed set; that is, that the complement of P¯ is an open set. (based on [2, Theorem 2.27, p. 35]): If p P X and p R P¯ , then p is neither a point in P nor a closure point of P . Therefore, there exists some ball centered at p that does not contain any points in P . This shows that the complement of P¯ in X is open, so P¯ is closed. q

2.6. Convergence, Limits, and Continuity U

37

˚ Q UIZ T HEOREM 2 (first part of [1, Proposition 2.7.4, p. 25]). Let C be a convex set in a normed space. Then C is convex. ˚ is empty, it is convex (see Proposition 2.2.2). Suppose x0 , y0 P C. ˚ Fix λ P p0, 1q: we must Proof: If C ˚ ˚ show that z λx0 p1 λqy0 is in C. Since x, y P C, there exists some ¡ 0 such that the open balls B px0 , q, B py0 , q are contained in C; that is, all vectors x0 w and y0 w with }w} are in C. Since C is convex, all convex combinations λpx0 wq p1 λqpy0 wq are in C. Furthermore, since λpx0 wq p1 λqpy0 wq z0 w, it follows that all points of the form z0 w are in C; that is, there ˚ exists some ¡ 0 such that the open ball B pz0 , q is contained in C. Therefore, z0 P C. q

Q UIZ T HEOREM 3 (second part of [1, Proposition 2.7.4, p. 25]). Let C be a convex set in a normed space. Then C¯ is convex. ¯ Fix λ P p0, 1q: we must Proof: If C¯ is empty, it is convex (see Proposition 2.2.2). Suppose x0 , y0 P C. ¯ show that z0 λx0 p1 λqy0 is in C. Given any ¡ 0, select x, y from C such that }x x0 } and }y y0} . Since C is convex, z λx p1 λqy is in C. Then by the triangle inequality,

}z z0} }λx p1 λqy λx0 p1 λqy0} ¤ λ}x x0} p1 λq}y y0} λ p1 λq , so z0 is within a distance of the point z ¯ and is therefore in C.

2.6

P C. Since this is true for every ¡ 0, z0 is a closure point of C¯ q

Convergence, Limits, and Continuity

Any topology is sufficient to define convergence. Definition 2.6.1 (convergence (any topology)). The sequence txn u converges to x if, for every open set P containing x, there exists an N such that for all n ¡ N , xn P P . We write xn Ñ x.

Example 2.6.2. For the Web site topology, show that the sequence 6, 5, 4, 6, 5, 4, 5, 4, 5, 4, . . . converges both to 4 and to 5. Solution: The smallest open set containing 4 is t4, 5u, and the smallest open set containing 5 is also t4, 5u. Therefore, for every open set P containing 4 or 5, there exists some N such that for all n ¡ N , q If we define open sets by a norm, we can formulate convergence in terms of norms. Definition 2.6.3 (convergence (normed space)). The sequence txn u converges to x if, for every ¡ 0, there exists an N such that for all n ¡ N , }xn x} . As before, we write xn Ñ x.

38 U


Q UIZ T HEOREM 4 ([1, Proposition 2.8.1, p. 27]). If a sequence converges, its limit is unique. Proof: Suppose xn Ñ x and xn Ñ x1 . Then for every {2 ¡ 0, there exist N, N 1 such that for all n and all n1 ¡ N 1 , }xn x} {2 and }xn x1 } {2. Then by the triangle inequality,

¡N

}x x1} }px xmq px1 xmq} ¤ }x xm} }x1 xm} ¤ for m ¡ maxpN, N 1 q. Since this is true for any ¡ 0, x x1 .

q

T HEOREM 2.6.4 ([1, Proposition 2.8.2, p. 27]). A set F is closed if and only if every convergent sequence with elements in F has its limit in F . Proof: For the “only if” direction, the limit of a sequence in F is obviously a closure point of F and therefore must be in F if F is closed. For the “if” direction, suppose that F is not closed. Then there is a closure point x of F that is not in F . In each of the open balls B px, 1{nq we may select a point xn P F since x is a closure point. The sequence txn u generated in this way converges to some x R F , which contradicts our assumption that every convergent sequence with elements in F has its limit in F . Therefore, F must be closed. q Definition 2.6.5 (transformation). Let X and Y be vector spaces and let D be a subset of X. A rule which associates an element y P Y with every element x P D is a transformation from X to Y with domain D. Definition 2.6.6 (injective). T is injective or one-to-one if T pxq T pyq implies x y. Definition 2.6.7 (surjective). T is surjective or onto if for every y P Y , there exists at least one x such that T pxq y. In other words, the image of T equals its codomain. Definition 2.6.8 (linear). T is linear if T pax byq aT pxq bT pyq for any x, y P X and any scalars a, b P F.

Example 2.6.9. If X or Y is infinite-dimensional, then a linear transformation T cannot be represented by a matrix. Show that the following transformations still qualify as linear: • Both X and Y are the space of polynomial functions: T is differentiation. • Both X and Y are the space of continuous functions on ra, bs: T is defined by T px q

»b a

k pt, τ qxpτ q dτ.

where k is a continuous function on ra, bs ra, bs

2.6. Convergence, Limits, and Continuity U Solution: Differentiation and integration are linear operators: Dx paf bg q aDx f bDx g and ³ ³ bg pxqq dx a R f pxq dx b R g pxq dx. Definition 2.6.10 (continuity (any topology)). T : X of any open subset V Y is an open subset of X.

ÑY

is continuous if the inverse image U

³ R

39

paf pxq q

T 1 pV q

Notice that the inverse image is defined even if T is not invertible, and it is not necessarily a connected set. For a topology defined by a norm, we can formulate convergence in terms of norms. Definition 2.6.11 (continuity (normed space)). A transformation T from a normed space X to a normed space Y is continuous at x0 P X if for every ¡ 0, there exists a δ ¡ 0 such that }x x0 } δ implies }T pxq T px0q} .

Example 2.6.12 (based on [6, Theorem 18.1, p. 104]). Show that this is the same definition, specialized to a topology defined by a norm. In other words, show that the definition of continuity for any topology implies the definition of continuity for a normed space. Solution: Suppose T is continuous at x0 and let V ty : }T px0 q y} u be an open subset of Y . Then there exists an open set U 1 T 1 pV q in X, and clearly, x0 P U . Furthermore, since U 1 is open, there exists some δ such that U tx : }x0 x} δ u is a subset of U 1 . Then x P U implies T pxq P V . q Note that the norm can be different in X and Y . T HEOREM 2.6.13 ([1, Proposition 2.9.1, p. 28]). A transformation T mapping a normed space X into a normed space Y is continuous at the point x0 P X if and only if xn Ñ x0 implies T pxn q Ñ T pbx0 q. Proof: The “if” portion is obvious since we just set the δ in the definition of continuity to the in the definition of convergence. For the “only if” portion, let txn u be a sequence such that xn Ñ x0 but T pxn q Û T px0 q. Then for some ¡ 0 and every N , there is an n ¡ N such that }T pxn q T px0 q} ¥ . Since xn Ñ x0 , this implies that for every δ ¡ 0 there is a point xN such that }xn x0 } δ } and }T pxn q T px0 q} ¥ . But this contradicts our assumption that T is continuous at x0 , so we must have T pxn q Ñ T px0 q. q Theorem 2.6.13 suggests the following practical test for lack of continuity. If you want to show that T is not continuous at x0 : • Choose some suitable fixed (smaller works better) • Construct a sequence xn with the following properties: – xn converges to x.

– For all n ¡ N , ||T pxn q T pxq|| ¥ .

Having xn scale like

1 n

is often convenient.

40 U


Example 2.6.14. Use this test to show that the function f px, y q is discontinuous at px, y q in R2 .

x2 y , x4 y 2

f p0, 0q 0

p0, 0q and that the proof is valid for the l2 norm, the l1 norm, or the l8 norm

Solution: Let pxn , yn q p1{n, 1{n2 q. Then lim f pxn , yn q lim

Ñ8

n

Ñ8

n

so f is discontinuous at px, y q p0, 0q.

1 n4 1 n4

1 n4

12 0 f p0, 0q f

lim xn , lim yn ,

n

Ñ8

Ñ8

n

q

C HAPTER 3 BANACH SPACES

Reading: [1, Sections 2.10 – 2.15]

3.1 lp Space Definition 3.1.1 (lp space). For p which

¥ 1, the space lp consists of all sequences of scalars tξ1, ξ2, . . . u such for 8 ¸

|ξi|ρ 8.

i 1

The norm of an element x tξi u P lp is defined as

}x}p

8 ¸

|ξi|

1

p

p

i 1

The space l8 consists of bounded sequences. The norm of an element x supi |ξi |.

tξiu P l8 is defined as }x}8

T HEOREM 3.1.2 (Cauchy-Schwarz Inequality). If x tξi u, y tηi u P l2 , then

8 ¸

i 1

|ξiηi| ¤

8 ¸

1 2

ξi2

8 ¸

i 1

1 2

ηi2

}x}2}y}2.

i 1

Proof: First, we show that the geometric mean is less than or equal to the arithmetic mean. By elementary 1 2 2 2 algebra, we have 0 ¤ px y q2 for any x, y P R. Then xy ¤ 21 x2 2 y . Let a x , b y . Then 1 1 a 2 b 2 ¤ 12 a 12 b. Now consider x tξi u, y tηi u P l2 . We set a |ξi |2 {}x}2 , b |ηi |2 {}y}2 and use the above inequality for each component i. Summing over all i gives

8 ¸

8 1 |ξ |2 |ξiηi| ¤ ¸ i } x } } y } 2 } x }2 2 2 i1 i1 Then

1 |ηi |2 2 }y}2

°8

|ξi ηi | ¤ }x}2 }y}2 .

21 }}xx}}2 2

1 }y}2 2 }y}2

1. q

i 1

As an aside, while Augustin Cauchy first published the inequality for sums in 1821, the corresponding inequality for integrals was first stated by Viktor Bunyakovsky in 1859, well before Hermann Schwarz’s 41

42 U

3. B ANACH S PACES

rediscovery of the inequality in 1888. This fact illustrates Stigler’s Law of Eponymy: “No scientific discovery is named after its original discoverer.” The law, proposed by University of Chicago statistics professor Stephen Stigler in 1980, applies to itself since long before Stigler formulated it, A. N. Whitehead noted that “Everything of importance has been said before, by someone who did not discover it.” Sigler himself attributed “his” law to the sociologist Robert K. Merton, famous for coining the phrases “self-fulfilling prophecy” and “role model” – as well as for being the father of Harvard economist Robert C. Merton, co-discoverer of the Black-Scholes(-Merton) option-pricing formula. ¨ Before proving the Holder inequality, a generalization of the Cauchy-Schwarz inequality, we provide a motivating example. Example 3.1.3. Consider a specific vector x tξi u P X, where ξi 1{i0.3 . What is the smallest integer p ° °8 0.9 for which the norm }x}p is finite? If ηi 1{i0.3 , does 8 i1 |ξi ηi | converge? If ηi 1{i , does i1 |ξi ηi | converge? °

p Solution: Since the series 8 1 1{i converges for p ¡ 1, we want to find the smallest integer p such that °i8 q 0.3p ¡ 1. Therefore, p 4. i1 |ξi ηi | does not converge if ηi 1{i0.3 , but it does if ηi 1{i0.9 .

We can view ηi as an infinite set of coefficients that define a linear functional on X. If ξi 1{ip and ° 1{q ¡ 1. ηi 1{iq , then a sufficient condition for 8 i1 |ξi ηi | to converge is 1{p Cauchy-Schwarz was a special situation, because when we use the l2 -norm in X, we also want the same l2 norm in the dual space of linear functionals on X. In general, if we use the lp -norm in X and want ° 1 1 the linear functional 8 i1 |ξi ηi | to converge, we should use the lq -norm, where p q 1, on the dual ¨ space of linear functionals. Therefore we need to prove the Holder inequality. Before proving the main inequality, first we derive an auxiliary inequality. Proposition 3.1.4. For a, b ¥ 0 and 0 λ 1, aλ b1λ

¤ λa p1 λqb

Proof: Consider the function f ptq tλ λt λ 1 for t ¥ 0. Then f 1 ptq λptλ1 1q. Since 0 we have f 1 ptq ¡ 0 for 0 t 1 and f 1 ptq 0 for t ¡ 1. Therefore, f ptq ¤ f p1q 0 for t ¥ 0, so tλ for t ¥ 0. If b 0, substituting t b 0, the inequality is obvious.

¤ λt

λ 1,

1λ

a{b and multiplying both sides by b gives the desired inequality.

¨ Q UIZ T HEOREM 5 (Holder Inequality ([1, Theorem 2.10.1, p. 29])). If p and q satisfy 1 ¤ q ¤ 8, and x tξi u P lp , y tηi u P lq , then

8 ¸

i 1

|ξiηi| ¤ }x}p}y}q .

1 p

1 q

If q

1, 1 ¤ p ¤ 8,

3.1. lp Space U

Equality holds if and only if

|ξi| |ηi|

}x}p }y}q 1 q

43

1 p

for all i.

Proof: First consider the case p 8, q

8 ¸

|ξiηi| ¤

1. Then we have

8 ¸ max ξi ηi i

| |

i 1

i 1

max |ξi| i

8 ¸

|ηi| }x}p}y}q .

i 1

The case p 1, q

8 is equivalent. Now consider the cases 1 p 8, 1 q 8. Set a |ξi |p {}x}p , b |ηi |q {}y}q , and λ 1{p, and use

the above inequality for each component i. Summing over all i gives

8 ¸

8 1 |ξ |p |ξiηi| ¤ ¸ i } x } } y } p } x }p p q i1 i1 Then

1 |ηi |q q }y}q

p1 }}xx}}p p

1 }y}q q }y}q

1.

°8

|ξi ηi | ¤ }x}p }y}q .

q

i 1

¨ We can use the Holder inequality to generalize the Euclidean triangle inequality to the Minkowski inequality.

Q UIZ T HEOREM 6 (Minkowski Inequality ([1, Theorem 2.10.2, p. 31])). If x, y P lp with 1 ¤ p ¤ 8, then so is x y, and }x y}p ¤ }x}p }y}p . For 1 p 8, equality holds if and only if x ky for some positive constant k.

Proof: For p 1,

}x

y }1

8 ¸

|ξi

i 1

ηi | ¤

8 ¸

|ξi|

i 1

8 ¸

|ηi| }x}1

y}1 ,

i 1

where we apply the triangle inequality to each term in the summation. For p 8,

}x

y}8

max |ξi i

ηi | ¤ max |ξi | i

max |ηi | }x}8 i

}y}8.

For 1 p 8, we first consider finite sums. We have

p}x

y }p q p

n ¸

i 1

|ξi

ηi |p

n ¸

i 1

|ξi

ηi |p1 |ξi

ηi | ¤

n ¸

i 1

|ξi

ηi |p1 |ξi |

n ¸

i 1

|ξi

ηi |p1 |ηi |.

44 U

3. B ANACH S PACES ηi |p1 u as an element of lq , with

Now apply Quiz Theorem 5 to both sums, treating t|ξi n ¸

|ξi

ηi |p

¤

n ¸

ηi |pp1qq

|ξi

i 1

i 1

n ¸

|ξi

ηi |p

1 q

1 q

i 1

n ¸

|ξi|p

1

p

i 1 n ¸

|ξi|p

n ¸

p

n ¸

i 1

|ξi

ηi |

1

p

p

¤

n ¸

i 1

i 1

|ξi|

p

|ηi|p

1 q

1. Then

1 p

i 1

1

|ηi|p

1 p

,

i 1

since pp 1qq q {q 1. Dividing both sides of the above equation by p that 1 1q p1 , we have

n ¸

1 p

1

p

n ¸

°n

|ξi

i 1

|ηi|

ηi |p q q and remembering 1

1

p

p

.

i 1

Letting n Ñ 8 on the right-hand side can only increase its value, so

n ¸

|ξi

ηi |p

1

p

¤ }x}p }y}p.

i 1

Note that this holds for all n; therefore, we can also let n Ñ 8 on the left-hand side to obtain the desired ¨ result. The conditions for equality follow from the Holder inequality. q

3.2

Lebesgue Integration

Definition 3.2.1 (volume zero). A set S R has volume zero if, for any union of a finite set of intervals with total length less than .

¡ 0, S can be contained in the

Definition 3.2.2 (measure zero). A set S R has measure zero if, for any ¡ 0, S can be contained in the union of a countably infinite set of open intervals with total length less than .

Example 3.2.3 (rational numbers). Show that the set Q Xr0, 1s has measure zero but does not have volume zero. Solution: Since Q X r0, 1s is countable, we can enumerate the elements of the set as q1 , q2 , . . . . For any ¡ 0, define the open interval Ik pqk 2k , qk 2k q. The sum of the lengths of these intervals is . Therefore, Q Y r0, 1s has measure zero. Assume Q X r0, 1s can be contained in the union C Yni1 Ci , where n 8 and maxi |Ci | ¤ δ. Then |C | ¤ nδ. For nδ 1, there exists an interval r0, 1s C with length 1 nδ ¡ 0; since Q is dense in R, there must be an element of Q X r0, 1s in this interval. Therefore, Q X r0, 1s does not have volume zero. q The proof that Q X r0, 1s has measure zero works for any countably infinite set. However, there also exist sets of measure zero that are uncountably infinite. The Cantor set is one famous example:

3.2. Lebesgue Integration U

45

Example 3.2.4 (Cantor set). Define the Cantor set as follows: • Step 0: Let the Cantor set be the interval r0, 1s. • Step i: Remove the middle third of every interval in the Cantor set at Step i 1. Show that the Cantor set has volume, and therefore measure, zero. Solution: At Step i, the Cantor set contains 2i intervals with total volume p2{3qi . For every , there exists some i such that p2{3qi . Then we can contain the Cantor set in the 2i 8 intervals with total length less than . q

Figure 3.2.1: the Cantor set (Example 3.2.4). Image taken from Wikimedia Commons. As an interesting aside, note that the Cantor set is in one-to-one correspondence with r0, 1s. Consider a point x in the Cantor set. That point is in r0, 1s, a left or right subinterval of r0, 1s, a left or right subinterval of that subinterval, and so on (see Figure 3.2.1). We can encode the position of x by the binary expansion 0.b1 b2 . . . , where bi equals 0 if x is in the left subinterval at Step i and 1 if x is in the right subinterval at Step i. If f and g are Riemann (or, equivalently, Darboux) integrable, then so is f g, and it follows by induction that the sum of any finite series of Riemann integrable functions is integrable. The Lebesgue integral extends this idea to a function that is the sum of an infinite series of functions. Definition 3.2.5 (Lebesgue integral). Suppose that f pxq A

8 »b ¸

k 1 a

converges. Then f pxq is defined, in the sense that sure zero. Furthermore,

8 »b ¸

k 1 a

°8

k 1 fk

pxq. Further suppose that the sum

|fk pxq| dx

°8

k 1 fk

fk pxq dx

pxq converges, except perhaps on a set of mea-

»b a

f pxq dx

is the Lebesgue integral of f pxq on ra, bs.

Note that if f is Riemann integrable, then we can write f1 pxq f pxq and fk pxq 0 for k ¡ 1, so the Lebesgue integral of f equals the Riemann integral. The following proposition extends the result that if functions f and g are equal except on a set of volume zero, their Riemann integrals are equal:

46 U

3. B ANACH S PACES

Proposition 3.2.6. If f pxq

°8

k 1 fk

pxq and gpxq °8k1 gk pxq are equal except on a set of measure zero, then »b a

f pxq dx

»b a

g pxq dx.

Example 3.2.7 (Dirichlet function). Show that the Dirichlet function 1xPQXr0,1s is Lebesgue integrable (with integral zero) but not Riemann integrable. Solution: We showed in Example 3.2.3 that Q X r0, 1s has measure zero, so the Lebesgue integral of the Dirichlet function is zero. To show that the Dirichlet function is not Riemann integrable, consider any dissection of r0, 1s. Each interval in this dissection will contain a rational number and a nonrational number since Q is dense in R. Therefore, the upper Darboux sum will always be 1, while the lower Darboux sum will always be zero. Therefore, the Darboux integral, and hence the Riemann integral, does not exist. q The improper integral of an unbounded function cannot be defined as a Riemann integral, since the upper Darboux sum does not exist. However, Lebesgue integration solves the unbounded function problem: Example 3.2.8 (unbounded function). Evaluate the Lebesgue integral »1 0

?1x dx alim Ñ0

Solution: Let fk pxq Then

#

a

?1x dx.

x ¤ 2 1

?1x

1 2k

0

otherwise

»

Ik

»1

|fk pxq| dx 2 R

k 1

k1

1 2

°8

2

is a telescoping series and evaluates to 2. Since |fk pxq| also equals 1.

k 0 Ik

.

k

1 2

Example 3.2.9 (unbounded interval). Evaluate the Lebesgue integral

0

ex dx.

;

fk pxq for all k and x, the given integral

Lebesgue integration also solves the unbounded interval problem:

»8

2

q

3.3. Lp Space U Solution: Let fk pxq

#

ex

k1¤x k

0

otherwise

47

.

»

Then Ik

|fk pxq| dx epk1q ek ; R

°8

k0 Ik is a telescoping series and evaluates to 1. Since |fk pxq| also equals 1.

fk pxq for all k and x, the given integral q

3.3 Lp Space Definition 3.3.1 (Lp space). For p ¥ 1, the space Lp ra, bs consists of all real-valued measurable functions x on ra, bs for which |xptq|p is Lebesgue integrable. The norm of an element x P Lp ra, bs is defined as

}x}p

» b a

|xptq|

p

p1

dt

.

Of course, }x}p 0 does not necessarily imply that xptq 0 since xptq may be nonzero on a set of measure zero, as we saw in Example 3.2.7. Therefore, we do not distinguish between functions that are equal almost everywhere and redefine 0 as the equivalence class of all functions that equal zero except on a set of measure zero. Note that we can define the Rp spaces similarly, replacing “Lebesgue integrable” with “Riemann integrable”. While most of the functions we see in these notes will be Riemann integrable, we prefer to work with the Lp spaces since they are Banach spaces (as we will see shortly), allowing us to use a strong theoretical framework. A tricky special case is L8 ra, bs. The obvious definition is }x}p suptPra,bs |xptq|, but this is ambiguous since an element of x corresponds not to a single function but to an equivalence class of functions that differ on a set of measure zero. Therefore, we define the norm as

}x}8

inf p qxptq

y t

almost everywhere

sup |y ptq| ess sup |y ptq|,

Pr s

t a,b

Pr s

t a,b

where ess sup denotes the essential supremum. Results analogous to Quiz Theorems 5 and 6 hold for the Lp spaces:

¨ T HEOREM 3.3.2 (Holder Inequality ([1, Theorem 2.10.3, p. 32])). If p and q satisfy 1 ¤ q ¤ 8, and x P Lp ra, bs, y P Lq ra, bs, then »b a

xptqy ptq dt ¤ }x}p }y }q .

1 p

1 q

1, 1 ¤ p ¤ 8,

48 U

3. B ANACH S PACES

Equality holds if and only if

almost everywhere on ra, bs.

|xptq| p |yptq| q }x}p }y}q

Proof: The proof is essentially the same as that of Quiz Theorem 5.

q

T HEOREM 3.3.3 (Minkowski Inequality ([1, Theorem 2.10.4, p. 33])). If x, y so is x y, and }x y }p ¤ }x}p }y }p .

P Lpra, bs with 1 ¤ p ¤ 8, then

Proof: Consider the function f ptq tp . Clearly this function is convex on r0, 8q for p ptp1 and f 2 ptq ppp 1qtp2 ¥ 0 for p ¥ 0 and t ¥ 0. Define aptq Then

|xptq| }x}p

bptq

|yptq| }y }p

λ

¥ 1 since f 1ptq

}x}p . }x}p }y}p

|xptq yptq|p p¤aq |xptq|p |yptq|p pλaptq p1 λqbptqqp p¤bq λpaptqqp p1 λqpbptqqp, p}x}p }y}pqp p}x}p }y}pqp

where (a) follows from the triangle inequality and (b) follows from the convexity of tp . Then integrating over ra, bs gives }x y}p p ¤ λ p1 λq 1, }x}p }y}p yielding the desired result.

q

3.4

Cauchy Sequences

Definition 3.4.1 (Cauchy sequence). A sequence txn u in a normed space is a Cauchy sequence if for any ¡ 0, there exists an integer N such that }xm xn } for all n, m ¡ N .

Example 3.4.2. Show that the partial sums over the vector space Q sn

k ¸

p1qi 2i 4

i 0

1

form a Cauchy sequence. Solution: We recognize that tan1 x

8 ¸

p1q2i

i 0

1

1

2i

1

,

3.4. Cauchy Sequences U so sn

49

Ñ 4 tan1 1 π. Without loss of generality, let n ¡ m. Then for n, m ¡ N , |sn sm| sn sm π sm Ñ 0.

Note that since π R Q, the series does not converge. The obvious fix is to let tsn u be a Cauchy sequence over the vector space R. q

Q UIZ T HEOREM 7 (includes [1, Lemma 2.11.1, p. 35]). Every convergent sequence is a Cauchy sequence. Furthermore, every Cauchy sequence is bounded. Proof: If a series txn u converges to x, then for every }xn x} . Then we have

}xm xn} }pxm

¡ 0, there exists an N such that for every n ¡ N ,

xq pxn xq} ¤ }xm

x}

}xn x} ¤ 2,

where the first inequality follows from the triangle inequality. If we set 1 2, we have shown that for every 1 ¡ 0, there exists an N such that for every m, n ¡ N , }xm xn } 1 . Therefore, txn u is a Cauchy sequence. To show that every Cauchy sequence is bounded, pick an N such that }xn xN } 1 for n ¡ N . Then for any n ¡ N , }xn} }xn xN xN } ¤ }xn xN } }xN } 1 }xN },

where the first inequality follows from the triangle inequality. Since }xn } bounded.

}xN } for any n ¡ N , txnu is

q

Definition 3.4.3 (Banach space). A normed linear vector space X is complete if every Cauchy sequence in X has a limit in X. A complete normed linear vector space is a Banach space. From Example 3.4.2, we see that Q is not complete. Example 3.4.4 ([1, Example 2.11.2, p. 34]). Let X be the vector space of sequences of real numbers with only finitely many nonzero components with the l8 norm and define xn

"

1 1 1, , , , 0, 0, 2 n1

*

.

Show that txn u is a Cauchy sequence. How would you “complete” X to make this Cauchy sequence convergent in X? Solution: We have }xn xm } maxp1{n, 1{mq Ñ 0. Clearly txn u does not converge to an element of X, but we can “complete” X by turning it into the vector space of all sequences of real numbers that converge to zero. q

50 U

3. B ANACH S PACES

Example 3.4.5. Let X

R1r0, 1s and define xn ptq

#

0

?1t

0 t¤

1 n

t 1

1 n

.

Show that txn u is a Cauchy sequence. To what function does this sequence converge? Is it in R1 r0, 1s, and if not, of what larger space is it an element? Solution: The function converges to xptq Example 3.2.8, we see that x P L1 r0, 1s.

t1{2. Because x is unbounded, x R R1r0, 1s. However, from q

Q UIZ T HEOREM 8 ([1, Example 2.11.4, p. 35]). lp , for 1 ¤ p ¤ 8, is a Banach space. Proof: First consider the case 1 ¤ p 8. Let txn u be a Cauchy sequence in lp . Then if xn

|ξkn ξkm| |ξkn ξkm|p ¤

1 p

8 ¸

|ξin ξim|p

1

p

tξinu,

Ñ 0,

i 1

so for each k, tξkn u is a Cauchy sequence and therefore converges to a limit ξk . We show that x tξk u P lp . From Quiz Theorem 7, there exists some M such that }xn } ¤ M for all n, so j ¸

|ξin|p ¤ }xn}p ¤ M p.

i 1

Since the sum on the left is finite, the inequality holds as n Ñ 8: j ¸

|ξi|p ¤ M p.

i 1

Furthermore, since the inequality holds uniformly in j,

8 ¸

|ξi|p ¤ M p,

i 1

so x P lp , and }x} ¤ M . We still must show that xn j ¸

Ñ x. For any ¡ 0, there exists an N such that

|ξin ξim|p ¤ }xn xm}p ¤

i 1

since txn u is a Cauchy sequence. Since the sum on the left is finite, the inequality holds as m Ñ 8: j ¸

i 1

|ξin ξi|p ¤ }xn x|p ¤ .

3.4. Cauchy Sequences U

51

Furthermore, since the inequality holds uniformly in j,

8 ¸

|ξinξi|p }xn x}p ¤

i 1

for n ¡ N , so xn Ñ x. Now let txn u be a Cauchy sequence in l8 . Then |ξkn ξkm | ¤ }xn xm } Ñ 0, so for each k, as before, tξknu is a Cauchy sequence and therefore converges to a limit ξk . Furthermore, this convergence is uniform in k. We show that x tξk u P l8 . Since tξk u is Cauchy, from Quiz Theorem 7 there exists an M such that for all n, }xn } ¤ M . Then |ξkn | ¤ }xn } ¤ M for all k and n, so x P l8 and }bx} ¤ M . Since ξkn Ñ ξk uniformly, xn Ñ x. q

Example 3.4.6 ([1, Example 2.11.3, p. 35]). Show that C r0, 1s is a Banach space. Solution: Let txn u be a Cauchy sequence in C r0, 1s. For each fixed t P r0, 1s, |xn ptq xm ptq| ¤a }xn xm} Ñ 0, so txn ptqu is a Cauchy sequence of real numbers. Since E (R with the Euclidean norm px y q2 |x y|) is complete, there exists a real number xptq such that xnptq Ñ xptq. Therefore, the functions xn converge pointwise to x. This pontwise convergence is actually uniform in t P r0, 1s; that is, for any ¡ 0, there exists an N such that |xn ptq xptq| for all t P r0, 1s and n ¥ N . For ¡ 0, choose N such that }xn xm} {2 for n, m ¡ N , which we can do since txnu is a Cauchy sequence. Then for n ¡ N ,

|xnptq xptq| ¤ |xnptq xmptq| |xmptq xptq| ¤ loooomoooon }xn xm} }xm x}. ¤{2

By choosing m sufficiently large, we can make the second term less than {2 since xm Ñ x pointwise, so |xnptq xptq| for n ¡ N . We must still show that x is continuous and therefore in C r0, 1s and that xn Ñ x in the norm of C r0, 1s. To prove continuity, choose ¡ 0. For all t, δ, n,

|xpt

δ q xptq| ¤ |xpt

δ q xn pt

δ q|

|xnpt

δ q xn ptq|

|xnptq xptq|.

By the uniform convergence of txn u, we can choose n such that the first and last terms are less than {3. Since xn is continuous, δ can be chosen such that the middle term is less than {3. Thus, x is continuous. Since xn ptq Ñ xptq uniformly, xn Ñ x in the norm of C r0, 1s. q

Example 3.4.7 ([1, Example 2.11.1, p. 35]). Show that the space of continuous functions on r0, 1s, with the norm »

}x}

is not a Banach space.

1

0

|xptq| dt,

52 U

3. B ANACH S PACES

Figure 3.4.1: Plots of x2 ptq, x3 ptq, and x4 ptq (Example 3.4.7). Solution: We must construct a Cauchy sequence of functions that convergence to a discontinuous function. Consider the sequence txn u, where $ ' ' &0

xn ptq

n1 1 1 1 . 2 n ¤t 2 t ¥ 12 0¤t

nt

n 2

' ' %1

1

1 2

Figure 3.4.1 shows plots of x2 ptq, x3 ptq, and x4 ptq. Then

} x n x m }

»

1 2 1 2

n1

nt

n 2

1

which clearly goes to zero as n, m x is the discontinuous function

mt

m 2

1 dt

1 1 p 1q 2 2

f rac1n

1 2

1 1 1 m 2 m

1 , n

Ñ 8. Therefore, txnu is a Cauchy sequence. However, xn Ñ x, where xptq

#

0

0¤t

1

1 2

1 2

¤t¤1

.

Therefore, the given space is not a Banach space. T HEOREM 3.4.8 ([1, Theorem 2.12.1, p. 38]). In a Banach space X, a subset S closed.

q

X is complete if and only if it is

Proof: If S is complete, then every Cauchy (and hence, by Quiz Theorem 7, convergent) sequence in S converges to a point in S, so S is closed. Furthermore, since X is a Banach space, any Cauchy sequence in S must converge to a point in x P X, and if S is closed, then x P S. Hence, if S is closed, then S must be complete. q T HEOREM 3.4.9 ([1, Theorem 2.12.2, p. 38]). In a normed linear vector space, any finite-dimensional subspace is complete. Proof: The proof is by induction on the dimension of the subspace. A 1-dimensional subspace is complete since all elements in the subspace can be written in the form x αe, where α is a real number and e is

3.5. Compactness and Extrema U

53

a fixed vector. Convergence of a sequence tαn eu is equivalent to convergence of the sequence of real numbers tαn u; this sequence is convergent since E is complete. Now assume that the theorem is true for subspaces of dimension N 1. Let X be a normed space and let M be an N -dimensional subspace of X. We show that M is complete. Let te1 , . . . , eN u be a basis for M . Define δk

inf ek a1 ,...,aN

αj ej , j 1 N ¸

k

1, . . . , N.

j k

δk is the distance from ek to the subspace Mk generated by the remaining N 1 basis vectors. We must have δk ¡ 0 otherwise a sequence of vectors in the (N 1)-dimensional subspace Mk could be constructed converging to ek R Mk , which contradicts the induction hypothesis that Mk is complete. Define δ mink δk and let txn u be a Cauchy sequence in M . Each xn has a unique representation xn

N ¸

λni ei .

i 1

Then for arbitrary n, m,

}xn xm}

N ¸ λni

p λmiq

i 1

ei

¥ |λnk λmk|δ,

k

1, . . . , N ;

since }xn xm } Ñ 0, |λnk λm 0 for each k. Then for each k, tλnk u is a Cauchy sequence and therefore k |Ñ °n converges to a limit λk . Let x k1 λk ek P M . We show that xn Ñ x. For all n,

}xn x}

n ¸ λnk

k 1

p λk q

ek

¤ N 1¤max |λni λk |}ek }, k¤N

but since |λnk λk | Ñ 0 for all k, }xn x} Ñ 0. Thus, xn

3.5

Ñ x.

q

Compactness and Extrema

Definition 3.5.1 (compact). A set K in a normed space X is compact if, given an arbitrary sequence txi u in K, there is a subsequence txin u that converges to an element x P K. By the Bolzano-Weierstraß theorem, “compact” is equivalent to “closed and unbounded” in finite-dimensional vector spaces. Note that if K is compact, then every Cauchy sequence converges to a limit in K, so K is also complete. We can now prove the extreme value theorem (see Theorem 1.2.1 for an equivalent statement): Definition 3.5.2 (upper semicontinuous). A real-valued functional f defined on a normed space X is upper semicontinuous at x0 if for any ¡ 0, there exists a δ ¡ 0 such that }x x0 } δ implies f pxq f px0 q (note the absence of the usual absolute value signs).

54 U

3. B ANACH S PACES

T HEOREM 3.5.3 (extreme value theorem [1, Theorem 2.13.1, p. 40]). An upper semicontinuous functional on a compact subset K of a normed linear vector space X achieves a maximum on K. Proof: Let M supxPK f pxq (we allow the possibility that M 8). There is a sequence txi u in K such that f pxi q Ñ M . Since K is compact, there is a convergent subsequence xin Ñ x P K. Clearly f pxin q Ñ M and since f is upper semicontinuous, f pxq ¥ limnÑ8 f pxin q M . But since K is compact, f is bounded, so M 8 and then f pxq M . q Example 3.5.4 ([1, Example 2.13.1, p. 40]). Consider the linear functional f pxq

»

1 2

0

xptq dt

»1 1 2

xptq dt,

defined for x P C r0, 1s. Show that f is continuous. What function on the unit sphere in C r0, 1s maximizes f ? Is this function an element of C r0, 1s? Solution: We have

|f pxq f px0q|

»

1 2

0

»1

pxptq x0ptqq dt pxptq x0ptqq dt ¤ 21 }x x0} 21 }x x0} }x x0}. 1 2

Then for every ¡ 0, there exists a δ such that }x x0 } δ implies |f pxq f px0 q| . The unit ball in C r0, 1s is the set of continuous functions x such that 1 ¤ xptq ¤ 1 for all t P r0, 1s. The supremum of f over the unit ball in C r0, 1s is clearly 1, but no continuous function x with }x} ¤ 1 achieves this supremum. (If we instead considered L8 r0, 1s, then the function x pt q

#

1

1

0¤t 1 2

1 2

¤t¤1

achieves the supremum.) This shows that the unit ball in C r0, 1s is not compact.

q

In fact, the unit ball is not compact in any infinite-dimensional normed linear vector space: T HEOREM 3.5.5. In any infinite-dimensional normed linear vector space X, the unit ball is not compact. Proof: We construct an infinite sequence of linearly independent unit vectors zn such that for all n, m, }zn zm} ¡ 1{2. Let tz1, . . . , zn1u be a set of n 1 such vectors, and let Y be the closed subspace generated by these vectors. Y is a proper subspace of X since X is not finite-dimensional. Therefore, we can choose x P X zY and let d inf YPY }x y}. Since Y is finite-dimensional, it is complete by Theorem 3.4.9. If d were zero, then there would exist a sequence of vectors in Y that converged to x R Y , contradicting our hypothesis that Y is finite-dimensional and hence complete. Therefore, d ¡ 0. Choose some y0 P Y such that d }x y0 } 2d, and let zn

}xx yy0} . 0

3.6. Quotient Spaces U

55

Then by construction, }zn } 1. Now for any y P Y ,

}zn y}

x

y0 y}x y0} . } x y0 }

The numerator is greater than d and the denominator is less than 2d, so the fraction is greater than 1{2. Since the above equation holds for all y P Y , }zn zi } ¡ 1{2 for all i n. By repeating this process for n ¥ 2, we construct an infinite sequence of vectors in the unit ball (specifically on the unit sphere), such that }zn zm } ¡ 1{2 for all n, m. Thus, this sequence has no Cauchy subsequence and hence, by Quiz Theorem 7, no convergent subseqeunce. Therefore, the unit ball is not compact. q

3.6

Quotient Spaces

Definition 3.6.1 (equivalence). Let M be a subspace of a vector space X. Two elements x1 , x2 equivalent modulo M if x1 x2 P M . In this case, we write x1 x2 .

P X are

This equivalence relation partitions X into disjoint subsets, called cosets, of equivalent elements. Geometrically, these cosets are linear varieties that are distinct translations of the subspace M . Each x P X belongs to a unique coset of M , denoted rxs. Definition 3.6.2 (quotient space). Let M be a subspace of a vector space X. The quotient space X {M consists of all cosets of M with addition defined by rx1 s rx2 s rx1 x2 s and scalar multiplication defined by arxs raxs.

Definition 3.6.3 (norm in a quotient space). If X is a Banach space and M is a closed subspace of X, then the norm of a coset rxs P X {M is defined as

}rxs} minfPM }x

m}

Example 3.6.4. Show that this definition satisfies the axioms for a norm.

Solution: Positive homogeneity follows since

}αrxs} }rαxs} minfPM }αx paq

minfPM }αx

αm}

pbq

m} inf

P

αm M

|α|}rxs},

}αx

αm}

where (a) follows because M is a linear subspace, so finding the infimum over m P M is the same as finding the infimum over αm P M and (b) follows by the positive homogeneity of the norm } } in X.

56 U

3. B ANACH S PACES

The triangle inequality follows since

}rxs rys} }rx

ys} inf

P

m M

paq

minfPM }x

}x

2m}

y

y

m} inf

P

2m M

pbq

}x

y

2m}

¤ }rxs} }rys},

where (a) follows for the same reason as before and (b) follows by the fact that the norm } } in X obeys the triangle inequality. Positivity follows from the positivity of the norm } } in X. Clearly }r0s} 0. There remains the possibility that rxs does not include 0 but that there is a sequence in rxs that converges to 0, in which case }rxs} 0. However, this is impossible since M is closed, so }rxs} ¡ 0 if rxs r0s. Thus, we have shown positive definiteness. q

3.7

Denseness and Separability

Definition 3.7.1 (dense). A set D is dense in a normed space X if for each x exists a d P D such that }x d} .

P X and each ¡ 0, there

Note that if D is dense in X, there are points of D arbitrarily close to each x P X. Therefore, given x P X, we can construct a sequence of elements in D that converges to x. Thus, an equivalent definition is: D is ¯ X; that is, if the closure of D equals X. dense in X if D One of the most common examples of a dense set is the set Q in R. By the Weierstraß approximation theorem (1.5.6), the set of all polynomials with rational coefficients is dense in C ra, bs. Definition 3.7.2 (separable). A normed space is separable if it contains a countable dense set.

Q UIZ T HEOREM 9. lp , for 1 ¤ p 8, is separable. Proof: Let D be the set of all finitely nonzero sequences with rational components. Clearly, D is countable. ° p Let x tξi u P lp and fix ¡ 0. Since x P lp , 8 i1 |ξi | 8, so there exists an N such that

8 ¸

i N 1

For i 1, . . . , N , let ri be a rational such that |ξi d P D and

}x d}p ¤

N ¸

i 1

so D is dense in lp .

|ξi|p 2 .

ri|p {2N , and let d tr1, . . . , rN , 0, 0, . . . u.

|ξi ri|p

8 ¸

Then

|ξi|p ¤ ,

i N 1

q

3.7. Denseness and Separability U

57

Example 3.7.3. Show that C ra, bs is separable. Solution: Let D be the set of all polynomials of finite degree with rational coefficients. Clearly, D is countable since it can be put into correspondence with the set of all finitely nonzero sequences with rational components. Given x P C ra, bs and ¡ 0, by the Weierstraß approximation theorem (Theorem 1.5.6), there exists a polynomial p such that |xptq pptq| {2 for all t P ra, bs. While p does not necessarily have rational coefficients, we can construct another polynomial r that does, such that |pptq rptq| {2 N c , for all t P ra, bs. We can easily construct r by changing each of the coefficients of p by less than 2N where N 1 deg p and c maxp|a|, |b|q. Then

|pptq rptq|

1 N¸ cN ti 2N

i 0

¤ 2 ,

a ¤ t ¤ b,

as desired. Thus

}x r} tmax |xptq rptq| ¤ tmax |xptq pptq| Pra,bs Pra,bs so D is dense in C ra, bs.

max |pptq rptq| ¤ ,

Pr s

t a,b

q

Example 3.7.4. Show that l8 is not separable. Solution: Suppose we have found a countable dense subset txn u of l8 , where xn tξin u. (We can index the sequence txn u by n since the sequence is countable.) Construct x tξi u by the following rule:

21 , set ξi 1. If ξii ¥ 21 , set ξi 0.

• If ξii •

Then }x xi } ¥ 21 for all i, since x and xi differ by at least dense, so l8 is not separable.

1 2

in their ith component. Therefore txn u is not q

Example 3.7.5. Show that L8 is not separable. Solution: Consider the subset of indicator functions t1t 1xPr0,ts utPr0,1s P L8 . Clearly this set is uncountable since the set of real numbers in the interval r0, 1s is uncountable. Furthermore, }Is It } 1 if s t. Therefore, the open balls tB1{2 pIt qutPr0,1s do not intersect. However, any dense set of L8 must include an element in each of these balls; since the set of balls is uncountable, any dense set of L8 is uncountable. Therefore L8 is not separable. q

58 U

3. B ANACH S PACES

C HAPTER 4 HILBERT SPACE

Reading: [1, Chapter 3].

4.1

Inner Products

Definition 4.1.1 (pre-Hilbert space). A pre-Hilbert space is a linear vector space X over a field F along with an inner product defined on X X, which associates with any two vectors x, y P X a scalar px, yq. The inner product satisfies the following axioms: 1. conjugate symmetry: px, yq py, xq for all x, y P X. 2. sesquilinearity (linearity in the first variable): px for all x, y, z P X and all λ P F.

y, zq

px, zq py, zq and pλx, yq λ px, yq

3. positivity and positive definiteness: px, xq ¥ 0 and px, xq 0 if and only if x 0.

Note that by conjugate symmetry and linearity in the first term,

px, y

zq p y

z, xq py, xq

pz, xq px, yq px, zq .

The property that pw y, x zq pw, xq py, zq is sometimes called additivity. If F R, then conjugate symmetry becomes regular symmetry. Therefore, if F R, as we will assume for the rest of this text, sesquilinearity becomes standard bilinearity. a a Let }x} px | xq. We verify that } } p | q is indeed a norm: 1. positive homogeneity:

}αx} pαx, αxq paq a

a

α px, αxq

pbq

b

αpαx, xq

p cq

b

ααpx, xq |α|}x},

where (a) and (c) follow from sesquilinearity and (b) follows from conjugate symmetry. 2. triangle inequality:

}x

y }2

px y, x yq px, xq px, yq py, xq py, yq ¤ }x}2 2| px, yq | }y}2.

By the Cauchy-Schwarz inequality, (re)proved below, | px, yq | inequality and taking square roots gives }x y} ¤ }x} }y}. 59

¤ }x}}y}; substituting this into the

60 U

4. H ILBERT S PACE

3. positive definiteness and positivity: This follows from the positivity and positive definiteness of the inner product. Below we prove a version of the Cauchy-Schwarz inequality equivalent to Proposition 3.1.2. Proposition 4.1.2 (Cauchy-Schwarz Inequality ([1], Lemma 1, p. 47)). For all x, y in an inner product space, | px, yq | ¤ }x}}y}. Equality holds if and only if x λy or if y 0. Proof: If y 0, the inequality holds trivially. If y 0, then for all scalars λ, 0 ¤ px λy, x λyq px, xq λ py, xq λ px, yq

|λ2| py, yq .

Then for λ px, yq { py, yq, 0 ¤ px, xq

| px, yq |2 ùñ | px, yq | ¤ apx, xq py, yq }x}}y}. py, yq

q

The following lemma generalizes a useful result in Euclidean geometry: Proposition 4.1.3 (parallelogram law ([1], Lemma 3, p. 48)). For any x, y in a pre-Hilbert space X,

}x

y }2

}x y}2 2}x}2

2}y}2 .

Proof: In the general case, directly expand the norms in terms of the inner product. For X R2 , we can prove the parallelogram law using only the law of cosines. Define the inner product px | yq as the dot product x y }x}2 }y}2 cos θ, where θ is the angle between the vectors x and y. Then by the law of cosines,

}x y}22 }x}22 }y}22 2}x}2}y}2 cos θ }x y}22 }x}22 }y}22 2}x}2}y}2 loooomoooon cospπ θq . cos θ

Summing these equations gives the desired result.

q

The Cauchy-Schwarz inequality allows us to prove the continuity of the inner product. Proposition 4.1.4 (continuity of the inner product ([1], Lemma 4, p. 49)). Suppose that xn in a pre-Hilbert space. Then pxn , yn q Ñ px, yq.

Ñ x and yn Ñ y

Proof: Because the sequence txn u is convergent, it is bounded; say }xn } ¤ M . Then

| pxn, ynq px, yq | | pxn, ynq pxn, yq pxn, yq px, yq | ¤ | pxn, yn yq | | pxn x, yq |. By the Cauchy-Schwarz inequality,

| pxn, ynq px, yq | ¤ }xn}}yn y} }xn x}}y}.

4.2. The Projection Theorem U

61

Since }xn } is bounded,

| pxn, ynq px, yq | ¤ M }yn y} }xn x}}y} Ñ 0. 4.2

q

The Projection Theorem

Definition 4.2.1 (orthogonality). In a pre-Hilbert space, two vectors x, y are said to be orthogonal, written x K y, if px, yq 0. A vector is said to be orthogonal to a set S, written x K S, if x K s for all s P S. Consider the following optimization problem in a pre-Hilbert space: given a vector x in a pre-Hilbert space X and a subspace M in X, find arg minmPM }x m}. If x P M , then the solution is trivial, but if x R M , we must answer three questions: 1. Is there a vector m others?

P M that minimizes }x m}, or is there no m that is at least as good as the

2. Is the solution unique? 3. What is the solution? The following version of the projection theorem answers these questions. T HEOREM 4.2.2 (Projection Theorem: pre-Hilbert space ([1], Theorem 1, p. 50)). Let X be a pre-Hilbert space and M be a subspace of X, and let x P X. If there exists a vector m0 P M such that }x m0 } ¤ }x m} for all m P M , then m0 is unique. Furthermore, m0 arg minmPM }x m} if and only if x m0 K M . Proof: First we show that if m0 arg minmPM }x m}, then x m0 K M . Suppose for contradiction that there exists an m P M such that x m0 M m. Without loss of generality, assume that }m} 1 and px m0 | mq δ 0. Define m1 m0 δm. Then

}x m1}2 }x m0 δm}2 }x m0}2 px m0, δmq pδm, x m0q |δ|2 }x m0}2 |δ|2 }x m0}2. so if x m0 M M , m0 is not a minimizing vector. Now we show that if x m0 K M , then m0 arg minmPM }x m} and m0 is unique. For any m P M ,

}x m}2 }x m0

m0 m}2

}x m0}2 }m0 m}2

by the Pythagorean theorem. Then m0 arg minmPM }x m} since }x m} ¥ }x m0 } by positive definiteness of the norm, and m0 is unique since }m0 m}2 0, and therefore }x m} }x m0 }, if and only if m0 m. q The following “counterexample” will suggest a stronger version of the projection theorem.

62 U

4. H ILBERT S PACE

Example 4.2.3. Let X be the space of infinite sequences, with the l2 -norm and the inner product px | yq 1 1 i1 ξi ηi for x tξi u, y tηi u P X. Let x 1, 2 , 3 , . . . , and let M be the subspace of sequences with only finitely many nonzero terms such that all even terms equal zero. Show that the minimizing vector m0 does not exist in M .

°8

Solution: Let m0

tηiu be a candidate for the minimizing vector, where η2k 0 for k P N. Then }x m0} 2

8 ¸

i 1

so the best we can do is set η2k1 2k11 for k with η2k 0 and η2k1 2k11 for k P N. Then

}x mn} 2

8 ¸

k 1

1 2k

2

1 i

ηi

2

,

n 8. Consider the sequence tmnu, where mn tηiu

8 ¸

k N

1

2

2k 1

8 ¸

1 π2 4 6

2

1

2k 1

k N

.

Because mn is the best we can do among infinite series with n 1 nonzero terms, a minimizing vector does not exist since }x mn } ¡ }x mn1 } for all n. The problem is that mn Ñ m0 , but m0 P M since M is not complete. q To fix the problem, we have the following definition. Definition 4.2.4 (Hilbert space). A Hilbert space is a complete pre-Hilbert space. That is, a Hilbert space is a Banach space with an inner product that induces the norm. We then prove the following stronger version of the projection theorem. Q UIZ T HEOREM 10 (Projection Theorem: Hilbert space ([1], Theorem 2, p. 51)). Let H be a Hilbert space and M be a closed subspace of H. For any x P H, there exists a unique vector m0 P M such that }x m0 } ¤ }x m} for all m P M . Furthermore, m0 arg minmPM }x m} if and only if x m0 K M . Proof: Uniqueness and orthogonality have already been established in the proof of Theorem 4.2.2, so it suffices to establish the existence of the minimizing vector m0 . If x P M , then m0 x and the theorem is trivial. If x R M , define δ inf mPM }x m}. We want to find an m0 P M such that }x m0 } δ. Let tmi u be a sequence of vectors in M such that }x mi } Ñ δ. By the parallelogram law (4.1.3),

}pmj xq px miq}2 }pmj xq px miq}2 2}mj x}2 or

}mj mi} 2}mj x} 2

Then }x pmi

2

2}x mi }

2

mj q{2} ¥ δ by definition of δ and }mi x}2

}mj mi}2 ¤ 2}mi x}2

4 x

2}x mi }2 ,

mi 2

mj 2

.

Ñ δ2 as i Ñ 8, so

2}x mj }2 4δ 2

Ñ 2δ2

2δ 2 4δ 2

0.

4.3. Orthogonal Complements U

63

Therefore tmi u is a Cauchy sequence and since M is a closed subspace of a complete space, tmi u has a limit tm0 u in M . By continuity of the norm, which follows from continuity of the inner product (4.1.4), }x m0} δ. q

4.3

Orthogonal Complements

Definition 4.3.1 (orthogonal complement). Given a subset S of a pre-Hilbert space, we call S K S u the orthogonal complement of S.

Example 4.3.2. If S Solution: S K Then S KK

tp1, 1, 1qu, what are S K, S KK, and S KKK?

tpx, y, zq : x

tpa, b, cq : ax

Finally, S KKK

tt : t K

y z

by

tpx, y, zq : λx

cz λy

0u, the plane through the origin perpendicular to the vector p1, 1, 1q. 0, x λz

y

z

0u tpa, b, cq : a b cu rtp1, 1, 1qus rS s.

0u S K.

q

Proposition 4.3.3 (([1], Proposition 1, p. 52)). Let S and T be subsets of a Hilbert space. Then 1. S K is a closed subspace.

S KK. If S T , then T K S K . S KKK S K .

2. S 3. 4.

Proof: We prove each statement individually: 1. S K is a subspace because for any x, y P S K , a, b P F, and s P S,

pax

by, sq a px, sq

b py, sq 0,

so ax by P S K . S K is closed because if txn u is a convergent sequence in S K , then continuity of the inner product implies 0 pxn , sq Ñ px, sq for all s P S, so x P S K . 2. If x P S, then x K y for all y P S K ; therefore, x P S KK . 3. If y P T K , then y K x for all x P S since S 4. From (2), S K S K S KKK .

S KKK and S S KK.

T . Therefore, y P S K.

Combining the latter result with (3), S KKK

SK.

Therefore,

q

64 U

4. H ILBERT S PACE

We want to show that given any closed subspace S of a Hilbert space H, any vector x can be written as the sum of of a vector m P M and a vector n P M T . First we need the following definition. Definition 4.3.4 (direct sum). A vector space X is the direct sum of two vector spaces M and N , written M ` N , if every vector x P X has a unique representation of the form x m n, where m P M and n P N.

Example 4.3.5. Show that R3 is the sum of the xy-plane and the xz-plane, but not the direct sum. Solution: In R3 , the xy-plane is given by M tpx, y, 0q : x, y P Ru and the xz-plane is given by N tpx, 0, zq : x, z P Ru; their sum is tp2x, y, zq : x, y, z P Ru tpx, y, zq : x, y, z P Ru R3. However, R3 is not the direct sum of the two planes since we can write px, y, z q pw, y, 0q px w, 0, z q, so no x P R3 can be written uniquely in the form x m n with m P M and n P N . q T HEOREM 4.3.6 (([1], Theorem 1, p. 53)). If M is a closed linear subspace of a Hilbert space H, then H M ` M K and M M KK .

Proof: Let x P H. By the projection theorem, there is a unique vector m0 P M such that }xm0 } ¤ }xm} for all m P M , and we can write n0 bx m0 P M K . Therefore, we can write x m0 n0 with m0 P M and n0 P M K . To show that this representation is unique, suppose x m1 n1 with m1 P M and n1 P M K . Then 0 m1 m0 n1 n0 . m1 m0 and n1 n0 are orthogonal, so by the Pythagorean theorem, 0 }m1 m0 }2 }n1 n0 }2 , so by positive definiteness of the norm, m0 m1 and n0 n1 . Therefore, H M ` M K . Now we show that M M KK . By Part (2) of Proposition 4.3.3, M M KK . To show the other direction, let x P M KK . By the above result, x m n where m P M KK and n P M KKK M K , where M KKK M K by Part (4) of Proposition 4.3.3. Since x, m P M KK , x m n P M KK . But n P M K as well, so pn, nq }n}2 0, which by positive definiteness of the inner product implies n 0. Thus, x m P M , so M KK M and we conclude M M KK . q

4.4

The Gram-Schmidt Procedure

Definition 4.4.1 (orthogonal set). A set S of vectors in a pre-Hilbert space is an orthogonal set if x K y for all x, y P S such that x y. If }x} 1 for all x P S, then the set is orthonormal.

Proposition 4.4.2 (([1], Proposition 1, p. 53)). An orthogonal set of nonzero vectors is a linearly independent set. Proof: Suppose tx1 , . . . , xn u is a finite subset of the given orthogonal set and that there are scalars α1 , . . . , αn ° such that ni1 αi xi 0. Then

n ¸

i 1

α i xi , xj

αj pxj , xj q p0, xj q 0,

j

1, . . . , n.

4.4. The Gram-Schmidt Procedure U Since xj

0, αj 0 for j 1, . . . , n. Then by Theorem 2.3.3, x1, . . . , xn are linearly independent.

65 q

T HEOREM 4.4.3 (Gram-Schmidt Process). Let txi u be a countable sequence of linearly independent vectors in a pre-Hilbert space X. Consider the vectors tei u defined by zi

xi

i¸1

pxi, ej q ej

ei

}zzi} i

j 1

for all i. Then tei u is an orthonormal sequence and for each n, rtx1 , . . . , xn us rte1 , . . . , en us. Proof: The proof of the first part is by induction. For the case n 1, te1 u is an orthonormal sequence. For general n, assume that te1 , . . . , en u is an orthonormal sequence and prove that te1 , . . . , en 1 u is also an orthonormal sequence. Because te1 , . . . , en u is an orthonormal sequence, it suffices to show that en 1 is orthogonal to each of e1 , . . . , en and that }en 1 } 1. Taking inner products,

pz n 1 | z i q

xn

1

n ¸

j 1

pxn px n

1 , xi q

px n

i¸1

1,

xi q p x n

ej q ej , xi

pxi, ek q ek

k 1

px n

k 1

1,

i¸1

1,

1 , ek q pek , xi q

xi q p xn

1,

xi q

n ¸

p xn

j 1

pxn

1,

1 , ej q pej , xi q

xi q 0,

i¸1

p xn

1,

ek q pek , xi q

k 1

1, . . . , n, where the last line follows from Parseval’s equality (??). Since en 1 zn 1{}zn 1}, K ei for i 1, . . . , n and }en 1} 1. Therefore, teiu is an orthonormal sequence. To show that rtx1 , . . . , xn us rte1 , . . . , en us for all n, observe that en is a linear combination of

for i en 1

x1 , . . . , xn . Therefore, we can also write xn as a linear combination of e1 , . . . , en ; thus, a vector can be written as a linear combination of x1 , . . . , xn if and only if it can be written as a linear combination of e1 , . . . , en . q

Example 4.4.4. Let X be the space of polynomials p such that deg p ¤ 2. Define the inner product

pp, qq for any p, q

»1

1

pptqq ptq dt

P X. Starting with the sequence t1, t, tu, construct an orthonormal basis for X.

66 U

4. H ILBERT S PACE

Solution: We have

x1 1 z1 1 ? e1 }z1} 2 z2 x2 px2 , e1 q e1 t z1

c

e2

z3

x3 px3, e1q e1 px3, e2q e2 t2 13

e3

z2 }z2}

z3 }z3}

3 2

2 t 3

c

5 2 1 t 2 2

c

5 . 2 q

Suppose we have a vector x in a Hilbert space X and want to find the closest vector to x that lies in the subspace M rty1 , . . . , ym us. We can use Theorem 10 and the Gram-Schmidt process to solve this problem. If M is one-dimensional, we can set ei yi {}yi } for any i 1, . . . , n to find the solution x px, ei q ei P M K . Example 4.4.5. Let X

R2 and define the inner product pv, wq 4v1w1

for any v pv1 , v2 q, w to y1 p5{2, 0q.

v 2 w2

pw1, w2q P X. Let M rty1us rtp1, 4qus. Find the vector in M that is closest

Solution: First we set

e1

y1 }y1}

? , ?2 2 5 5 1

.

Then the minimizing vector is x px, e1 q e1

52 , 0

?6 5

? , ?2 2 5 5 1

4, ?2 5

.

q

This technique works equally well if M is generated by more than one vector, provided that we start by converting the set ty1 , . . . , yn u into an orthonormal basis te1 , . . . , em u. (Of course, if y1 , . . . , yn are not linearly independent, m n.) Note that we can write this problem as max x α1 ,...,αn

n ¸

αi yi

i 1 n ¸

x

α i yi , yj

0,

j

1, . . . , n.

i 1

Note that the objective function is quadratic in α1 , . . . , αn and that the constraints are linear in α1 , . . . , αn ,

4.5. Fourier Series U

67

so using Lagrange multipliers reduces this problem to solving a system of linear equations. More specifically, if we write

G G py 1 , . . . , y n q

py 1 , y 1 q p y 1 , y n q

.. . px n , x 1 q

..

.. . p xn , xn q

.

α1 .. α .

c

px, y1q

.. , . px, ynq

αn

then we can write the constraints as GT α c. These are called the normal equations. G is called the Gram matrix. The normal equations have a unique solution if and only if det G 0. Proposition 4.4.6 ([1], Proposition 1, p. 56). det G 0 if and only if y1 , . . . , yn are linearly independent. Proof: Equivalently, we show that det G 0 if and only if y1 , . . . , yn are linearly dependent. If y1 , . . . , yn ° are linearly dependent, then there exist β1 , . . . , βn not all equal to zero such that ni1 βi yi 0. Taking inner products, n ¸

β i yi , yj

i 1

n ¸

βi pyi , yj q 0,

j

1, . . . , n,

i 1

or equivalently, n ¸

β i Gi

0

GTi

p yi , y1 q p yi , yn q

.

i 1

Therefore, the rows of the Gram matrix are linearly dependent, so det G 0. If det G 0, then the rows of the Gram matrix are linearly dependent, so there exist β1 , . . . , βn not all ° equal to zero such that ni1 βi pyi , yj q 0 for j 1, . . . , n. Therefore,

n ¸

β i yi , yj

0,

j

1, . . . , n,

i 1

so

n ¸

°n

j 1

βi

n ¸

βi yi , yj

0 ùñ

i 1

2 n ¸ βi yi

0.

i 1

0, so y1, . . . , yn are linearly dependent. q Note that if det G 0, solving the normal equations is equivalent to solving fewer than n equations in n

Thus

i 1 β i yi

unknowns, so a solution exists, though it is not unique.

4.5

Fourier Series

Now suppose that X is infinite-dimensional and that M is generated by a countably infinite set of linearly independent vectors. (Recall from Example 2.3.7 that M does not necessarily equal X.) Assume that we have constructed an orthonormal basis e1 , e2 , . . . for M using the Gram-Schmidt process. Then x ˆ

8 ¸

i 1

px, eiq ei

68 U

4. H ILBERT S PACE

is the vector in M closest to x. However, this is an infinite series, so we need the following necessary and sufficient condition for an infinite series of orthogonal vectors to converge in Hilbert space. T HEOREM 4.5.1 ([1], Theorem 1, p. 59). Let tei u be an orthonormal sequence in a Hilbert space H. A series of °n ° 2 the form 8 i1 |ξ | 8. In that case, ξi px, ei q for i ¥ 1. i1 ξi ei converges to an element x P H if and only if Proof: Suppose

°n

2 |ξ | 8 and define the partial sums sn

°n

i 1 ξi ei .

i 1

}sn sm} 2

2 m ¸ ξ e i i

i n 1

m ¸

Then

|ξi|2 Ñ 0

i n 1

as n, m Ñ 0, where n m without loss of generality. Therefore, tsn u is a Cauchy sequence; since H is complete by definition of a Hilbert space, there exists an x P H such that sn Ñ x. ° °8 2 2 If s Ñ x P H, then tsn u is a Cauchy sequence so m in 1 |ξi | Ñ 0 as n, m Ñ 0. Thus, in 1 |ξi | Ñ 0, °8 n so i1 |ξi |2 8. q Clearly psn , ei q ξi for i ¥ 1. Since sn Ñ x, psn , ei q Ñ px, ei q by Proposition 4.1.4. We call tξi u tpx, ei qu the Fourier coefficients of x with respect to tei u.

Proposition 4.5.2 (Bessel’s inequality ([1], Lemma 1, p. 59)). Let x be an element in a Hilbert space H and suppose tei u is an orthonormal sequence in H. Then

8 ¸

| px, eiq |2 ¤ }x}2.

i 1

Proof: Let αi

px, eiq for i ¥ 1. For all n, 0¤

Then

x

n ¸

i 1

2 α i xi

n ¸

}x}2

n ¸

|αi|2.

i 1

|αi|2 ¤ }x}2

i 1

for all n; letting n Ñ 8 establishes the inequality.

q

Example 4.5.3. Show that Bessel’s inequality holds even if tei u does not generate H. Solution: Let X be the space of polynomials on r1, 1s and define the inner product

pp, qq

»1

1

pptqq ptq dt


69

for any p, q P X. Let tei u be an orthonormal set of even polynomials and let xptq t. Clearly tei u are even polynomials, so px, ei q 0 for i ¥ 0. Then 0

8 ¸

i 1

| px, eiq |2 ¤ }x}2 32 ,

so Bessel’s inequality holds.

q °

2 Since Bessel’s inequality guarantees that 8 i1 | px, ei q | ¤ 8, Theorem 4.5.1 guarantees that the series °8 i1 px, ei q ei converges to some element. Since any Hilbert space H is complete by definition, that element is in H. Let S tei u be an infinite subset S of H; the subspace rS s include all finite linear ¯ s of combinations of the vectors in S, but since we have an infinite sum, we are interested in the closure rS rS s. For example, if S t1, t2, t4, . . . u, then cos t is in rS¯ s.

T HEOREM 4.5.4 ([1], Theorem 2, p. 60). Let x be an element in a Hilbert space H and suppose tei u is an orthonormal sequence in H. Then the series

8 ¸

px, eiq ei

i 1

ˆPM converges to an element x

rteius. Furthermore, x xˆ P M T .

Proof:

q

1. Convergence of Fourier series Since any Hilbert space H is complete, the series converges to an element of H. The vectors ei are an infinite subset S. The subspace rS s includes all finite linear combinations, but since we have an infinite sum, we need the closure of this subspace. If S

t1, t2, t4, u, what transcendental function is in the closure of rS s?

Prove (Theorem 2 on page 60) that x ˆ

8 ¸

px|eiqei

i 1

converges to an element of the closed subspace M generated by the ei .

70 U

4. H ILBERT S PACE

If the closed subspace M is H itself, then the orthonormal sequence ei is called complete, and Bessel’s inequality becomes an equality. A useful criterion (proof left for homework): the orthonormal sequence ei is complete if and only if the only vector orthogonal to each of the ei is the null vector θ. 2. Completeness: an example The Hilbert space is H L2 r1, 1s. From the sequence t1, t, t2 , u it is straightforward to use Gram-Schmidt to construct an orthonormal basis. Luenberger, p. 61, has an explicit formula, but we do not need it. To prove completeness, we assume (for a contradiction) the existence of a function f ptq that is orthogonal to each tn . This element f ptq of H may be discontinuous or unbounded, but its antiderivative F ptq

»t

1

f pτ qdτ

is a continuous function of t. Prove continuity Prove that F p1q F p1q 0. Using integration by parts, prove that

³1

n 1 t F ptqdt 0.

Using Weierstrass, we construct a polynomial Qptq such that |³F ptq Qptq| on [-1,1]. Now prove (this is tricky, see Luenberger p. 62) that 1 2 2 1 |F ptq| dt ¤ 2 .

Thus the continuous function F ptq is the zero function and the Lebesgue integrable function f ptq is in the same equivalence class as the zero function, i.e. it is θ.

3. When codimension is smaller - a finite example The Hilbert space H E 3 . Let a p6, 0, 0q, let M be the subspace spanned by the vectors z1 p1, 0, 1q and z2 p1, 2, 1q, which are conveniently orthogonal to one another, and let V be the linear variety a N .

P N that is closest to a. Find the vector a0 P V of minimum norm.

Find the vector m0

State and prove the theorem (Theorem 1 on p. 64) that these two problems (or any two like them) are really the same. There is an easier way to do this problem, since the dimension of N K is just 1. Since y p1, 1, 1q is perpendicular to N , ry s N K , and V is the set of all vectors that satisfy the single equation

px|yq pa|yq 6. Of these, the one with minimum norm is the one that lies in N K. So set x βy and solve the problem.

Of course, this approach would have been spectacularly better if H and N were infinite-dimensional while N K was 1-dimensional.


71

4. The dual approximation problem Theorem 2 on p. 65 generalizes the preceding example. Let y1 , y2 , yn be a set of linearly independent vectors. Luenberger calls the space that they span M , but it is more consistent with the preceding example to call it N K . Consider all vectors that satisfy the equations

px|y1q c1 px|y2q c2 px|ynq cn If all the ci are zero, these vectors are in the subspace N . Otherwise they are in a linear variety V

pa|yiq ci for all i

N

a, where

How do we know that the vector in V with minimum norm is of the form x0

°ni1 βiyi?

Write down a set of equations that can be solved for the βi . How could we solve this problem by Gram-Schmidt? 5. A finite-dimensional example You are doing the term project for your course “Ethnic Sculpture.” You start with a log of length 21 meters. In the first week, you cut off u meters, carve a totem pole, and set it vertically in Harvard Yard. In the second week, you cut off v meters, carve a totem pole, and set it vertically in Harvard Yard. In the third week, you carve a totem pole from the remaining w meters, and set it vertically in Harvard Yard. At the end of the fourth week, you take down the poles, having gained artistic renown equal to 3u 2v w. You need renown of 46 to get an A in the course. The energy cost of erecting a totem pole is proportional to the square of its height, so you want to minimize ||x||2 u2 v 2 w2 . Express the constraints in the form py |xq c.

The vectors y1 and y2 generate N K . Find the linear combination β1 y1 straints.

β2 y2 that satisfies both con-

What is the solution to this optimization problem? 6. An easy economic example To keep the numbers small, measure money in trillions of dollars and time in decades. You are Treasury secretary in the new Obama administration, and your task is to build up a fund of 3 e 2 trillion dollars over the next decade to deal with the financial disaster that will be caused when the current crop of Harvard undergraduates gets busy on Wall Street. By selling off formerly toxic securities you already have a trillion dollars to start with. By making loans to banks, you could get your fund balance to grow at an annual rate of 5 percent compounded continuously, in accordance with the differential equation

72 U

4. H ILBERT S PACE

x9 ptq 0.5xptq. That will increase your money to from Congress at a rate uptq.

?e trillion in a decade, but it is not enough. You will need funds

Constraints: xp0q 1, xp1q e 2 . 3

Optimization problem: choose uptq to minimize the Hilbert space norm pu|uq solution lies in an infinite-dimensional vector space.

³1 0

u2 ptqdt. The

Secret: the codimension is 1. If we can express the constraint in the form pu|y q c, then the solution is that uptq is in the space generated by y ptq, and we just have to determine one coefficient. First you will have to learn something about linear differential equations that Luenberger takes for granted. What inhomogeneous differential equation does xptq satisfy? What is a solution to the corresponding homogeneous differential equation? Substitute xptq a9 ptq.

aptqx0ptq, where x0ptq is a solution to the homogeneous equation, and solve for

Find an expression for ap1q ap0q as an integral.

Express the constraint xp1q e 2 in terms of an inner product pu|y q c. 3

The optimal uptq lies in the space N K generated by y. Use the constraint to find the coefficient, and the problem is solved! 7. An example from optimal control. Let H L2 r0, 1s, which is infinite-dimensional. We have a door that is supposed, at the push of a button, to open through an angle of one radian and come back to rest in one second. The door is controlled by an electric motor. By supplying a current uptq P H, which produces a proportional torque, we can overcome a linear damping force and also cause some angular acceleration. With all physical constants set equal to 1 for simplicity, the angular velocity ω of the door satisfies the differential equation ω9 ptq

ω ptq uptq.

Now we insist that the vector uptq must lie in the linear variety V of functions that cause the motor to rotate through an angle of 1 radian in one second, starting at rest and ending at rest. What small change would make V be a subspace N ? What is the dimension of N and of V ? The secret to solving this problem is that the codimension of N is 2. To prove this, we have to find basis vectors y1 and y2 that are orthogonal to N , so that ³

py1|uq 01 y1ptquptqdt ωp1q 0 ³ py2|uq 01 y2ptquptqdt θp1q 1.

We want to find the vector of minimum norm in V , the function uptq that minimizes the energy ³1 consumption 0 uptq2 dt. This will lie in the two-dimensional space spanned by y1 and y2 .

4.5. Fourier Series U 8. More fun with differential equations

First solve the homogeneous first-order equation ω9 ptq

73

ω ptq 0.

Now multiply this solution by aptq (“variation of parameters”) to get a trial solution for the inhomogeneous equation ω9 ptq ω ptq uptq. Plug in the trial solution to get a formula for a9 ptq. Integrate once to express ap1q and ω p1q in terms of uptq. Integrate the original equation, in the form ω9 ptq ω p1q, θp1q, and uptq.

θ9ptq

uptq, from 0 to 1 to get a relation among

What are the vectors y1 and y2 ? 9. Finishing the solution

By the general theorem for the dual approximation problem, the vector uptq of minimum norm lies in the space generated by y1 ptq and y2 ptq. We could find the minimum-norm uptq β1 y1 ptq β2 y2 ptq by solving two linear equations for β1 and β2 . However, there is a simpler basis for the space: the functions 1 and et . So uptq α1

α2 e t .

Use the constraints ω p1q determine uptq.

0 and θp1q 1 to get two equations for α1 and α2, and solve them to

10. Minimum distance to a convex set Replace the linear variety V by a closed convex subset K of a Hilbert space H, and let x be an arbitrary vector outside of K.

We can prove (Theorem 1 of section 3.12) that there exists a unique vector k0 P K of minimum distance from x. Orthogonality is too much to ask for, but we can also prove that

px k0|k k0q ¤ 0 for all k P K.

Draw a diagram, where H is two-dimensional, to show how

px k0|k k0q 0 can fail to be achieved.

Existence proof: this is almost the same as in the case where we have a subspace M instead of a convex set K. We use the parallelogram-law trick. Let δ

kinf ||x k|| PK

and construct a sequence such that ||x ki || converges to δ and so is Cauchy. Show that ki is Cauchy. Why is ki also convergent? What does this prove?

To prove uniqueness, assume that ||x k0 || ||x k1 || δ. Luenberger just does a special case of the argument above, but for a little more fun, just repeat the parallelogram-law argument. Now let’s assume that k0 has the property that

px k0|k k0q ¤ 0 for all k P K.

It is then easy to prove that k0 is the vector that minimizes the distance ||x k ||. Just expand ||x k||2 ||x k0 k0 k||2 in terms of dot products. This proof doesn’t even use the convexity of k.

74 U

4. H ILBERT S PACE

11. A proof that requires convexity

Suppose that k0 is the unique minimizing vector. The task is to show that px k0 |k1 k0 q ¤ 0 for all k1 P K.

Write a formula for the point kα that interpolates between k0 and k1 , and draw a diagram to show why this point is in K for 0 ¤ α ¤ 1. Since α 0 minimizes ||x kα ||, the quantity

||x kα||2 is a nondecreasing function of α. Its derivative cannot be negative, even for α 0. Use

this fact to complete the proof. 12. Example where the convex set is a cone This is a three-dimensional version of Luenberger’s example 1 on page 71. You are piloting a helicopter, and your mission is to deliver a sports reporter to the playing field of the world outdoor curling championships, which are being held on a large frozen lake in Saskatchewan. At noon, two snowmobiles set out from the origin along the ice, moving with velocity y1 and y2 respectively. The convex cone α1 y1 α2 y2 between their tracks becomes the playing field.

Sketch the playing field, and describe the four different types of location for the position x ˆ α1 y1 α2 y2 on the playing field that is closest to the position x of your helicopter. We know that for all k on the playing field, px x ˆ|k x ˆq ¤ 0.

Whatever the solution, you can always move to a new point in the cone along a snowmobile velocity vector. What does this say about k x ˆ yi ? If αi ¡ 0, you can also move to a new point in the cone backwards along a snowmobile velocity vector. What does this say about k x ˆ yi ?

13. Curling concluded We now have Luenberger’s general solution to this problem: x ˆ α 1 y1

α2 y2 , where for all i

px xˆ|yiq ¤ 0, with equality if αi ¡ 0. Equivalently

px xˆ|yiq zi, with αizi 0 Show how to recast this equation, for the two-dimensional case, in terms of the Gram matrix G. Alas, there are four unknowns even in this simple case! Give an interpretation of z for each of the four types of solution to the curling-competition problem.

C HAPTER 5 DUAL SPACES AND THE HAHN - BANACH THEOREM

Reading: [1, Sections 5.1 – 5.4, 5.12]

5.1

Linear Functionals

Definition 5.1.1 (functional). A transformation or function f : X

ÑY

R.

is a functional if Y

Definition 5.1.2 (linear functional). A linear functional is a functional f : X af pxq bf pyq for all x, y P X and a, b P R.

Ñ R such that f pax

byq

In a Hilbert space H, any vector y determines a linear function by the rule fy pxq py, xq. Example 5.1.3. What linear functionals are determined by

p1, 1, 1q on E 3? y2 1, 12 , 13 , . . . on l2 ? y3 ptq et1 on L2 r0, 1s?

• y1 • •

Solution: We have fy1 pxq py1 , xq x1

x2

x3 .fy2 pxq

py2, xq

8 1 ¸

i 1

i

yi fy3 pxq py3 , xq

»1 0

et1 xptq dt.

q

¨ Recall that for the case p 2, q 2, the Holder inequality (Quiz Theorem 5) reduced to the CauchySchwarz inequality (Propositions 3.1.2, 4.1.2), which in the inner product form was px, yq2 ¤ px, xq py, yq. The following example will motivate some of the calculations in the duality proofs in this chapter. Example 5.1.4. If x tξi u P l4{3 does

°8

1{2 ξi converge?

i 1i

¨ Solution: Yes. By the Holder inequality,

8 1 ¸

1{2 i

i 1

ξi

¤

8 1 ¸

1{2 i

i 1

|ξi| ¤

8 ¸

i 1

1

4 1{4

i1{2 75

}x}4{3

π2 6

1{4

}x}4{3 8.

q

76 U

5. D UAL S PACES AND THE H AHN -B ANACH T HEOREM

Definition 5.1.5 (norm of a linear functional). A linear functional f on a normed space X is bounded if there is a constant M such that |f pxq| ¤ M }x} for all x P X. The smallest such constant M is the norm of f , denoted }f }; that is, }f } inf tM : |f pxq| ¤ M }x} for all x P X u.

Q UIZ T HEOREM 11 ([1], Proposition 2, p. 104). A linear functional on a normed space is continuous if and only if it is bounded. Proof: First we show that a linear functional f on a normed space X is continuous at a single point, then it is continuous throughout X. Assume f is continuous at x0 P X, and let txn u Ñ x be a sequence in X. By the linearity of f , |f pxnq f pxq| |f pxn x x0q f px0q|.

Since xn x x0 Ñ x0 and since f is continuous at x0 , f pxn x x0 q Ñ f px0 q. Thus, |f pxn q f pxq| Ñ 0, so f is continuous at x. Since this holds for any x P X, f is continuous throughout X. Suppose }f } ¤ M . Then if xn Ñ 0, |f pxn q| ¤ M }xn } Ñ 0, so f is continuous at x 0. By the above result, f is continuous everywhere in X. Now suppose that f is continuous anywhere in X, specifically at x 0. Then there exists a δ ¡ 0 such that |f pxq| 1 for }x} ¤ δ. For any x 0 P X, δx{}x} has norm δ, so by using the linearity of f ,

|f pxq| so M

δ f x

}}

}x} x }x} f δ x }x} , δ

δ

x

δ

1{δ is a bound for f .

q

Note that a linear functional on a finite-dimensional normed space must be bounded and is therefore continuous. To see why the extension to infinite-dimensional vector spaces poses a problem, consider the functional f px q

8 ¸

kξk

k 1

in l2 : clearly this function is discontinuous at x 0 since limxÑ0 f pxq 0 if we let ξk 1{kn and take the limit as n Ñ 8, but limxÑ0 f pxq is undefined if we let ξk 1{n. The linear functionals themselves may be regarded as elements of a vector space: Definition 5.1.6 (algebraic dual). Given a vector space X, not necessarily normed, the linear functionals on X form a vector space defined as follows: • Given two linear functionals f1 and f2 on X, we define their sum f1 X given by pf1 f2 qpxq f1 pxq f2 pxq.

f2 as the linear functional on

• Given a linear functional f and a scalar a, we define the scalar product af as the linear functional on X given by paf qpxq af pxq. • The null element in the space of linear functionals is the functional f pxq 0 for all x P X. This vector space is the algebraic dual of X.

5.2. Common Dual Spaces U

77

However, not all linear functionals in the algebraic dual will be bounded, so we restrict our attention to the subspace of the algebraic dual consisting of all bounded, or equivalently continuous, functionals on a normed space X. The norm of a functional f in this subspace is still Definition 5.1.5. In fact, we can write the norm of a functional in several ways:

}f } inf tM : |f pxq| ¤ M }x} for all x P X u M sup |f}pxx}q| x 0 sup |f pxq|

}x}¤1

sup |f pxq|.

}x}1

This leads to the following definition: Definition 5.1.7 (normed dual). Let X be a normed linear vector space. The space of all bounded linear functionals on X is the normed dual of X, denoted X . The norm of an element f P X is

}f }

sup |f pxq|.

}x}¤1

If we explicitly treat a functional f as an element of the dual space, we can write f x . Then f pxq x pxq xx, x y, where the symmetric notation x, y suggests that we are generalizing the inner product p, q from Hilbert space.

5.2

Common Dual Spaces

T HEOREM 5.2.1 ([1, Theorem 5.2.1, p. 106]). X is a Banach space. Proof: By definition of the algebraic dual (Definition 5.1.6), X already satisfies the axioms of a vector space. Therefore, we simply need to show that X is complete. Let tx u be a Cauchy sequence in X , so }xn xm} Ñ 0 as n, m Ñ 0. Then for any x P X, txnu is a Cauchy sequence since |xnpxq xmpxq| ¤ }xn xm}}x}. Therefore, since R is complete, there exists a scalar xpxq such that xnpxq Ñ xpxq. We must check that x P X ; that is, the functional x is linear and bounded. x is linear since x pax

byq lim xn pax

Ñ8

n

byq a lim xn pxq

Ñ8

n

b lim xn pyq ax pxq

Ñ8

n

bx pyq.

Furthermore, since txn u is a Cauchy sequence, for all ¡ 0, there exists an M such that |xn pxq xm pxq| }x} for n, m ¡ M and all x. Since xn Ñ x , |x pxq xm pxq| }bx} for m ¡ M . Thus,

|xpxq| |xpxq xmpxq so x is bounded.

xm pxq| ¤ |x pxq x pxq|

|xmpxq| ¤ p }xmq}x}, q

78 U


Guided by this theorem, we derive representations of the duals of some common dual spaces. These representations will allow us to apply the theoretical framework of dual spaces to practical problems. To find the dual of a normed space X, we start with a space whose elements are bounded linear functionals f on X. We can calculate }f } from Definition 5.1.5. The challenge is to find a “representation” for each functional by identifying it with an element }x } of a familiar normed space. For example, we can identify any linear functional on E 2 with a vector ? of coefficients pa, bq, combined with the evaluation formula ax by and the Euclidean norm }pa, bq} a2 b2 . By analogy with the inner product, we use xx, x y to denote application of the evaluation rule. Usually this is a “sum of products” or“integral of products” rule. The hard part is usually to show that the norm }f } is equal to the norm }x } of the representation, which was defined independently. The proofs in general have four steps: 1. evaluation: state the rule for using x to evaluate the bounded linear functional f . 2. construction: state the rule for converting the bounded linear functional f to x . 3. alignment: invent a vector x P X that is aligned with a given representation vector x . “Aligned” means that evaluating f pxq is essentially the same calculation as evaluating ||y ||||x||. If ||x|| 1, so much the better. Now we know that ||f || ¥ ||y ||. 4. The “inequality” step: Prove, usually by invoking something like the Hoelder inequality, that |f pxq| ¤ ||x||||y||. Applied to vectors with ||x|| 1, this guarantees that ||f || ¤ ||y||, so we conclude }f } }x}. 1. The dual of E n is E n This is easy, but it is a good introduction to what needs to be proved. • Evaluation: show how to use a vector y

pη1, η2, ηnq to define a bounded linear functional.

• Construction: give the rule for converting a bounded linear functional f to its representation y. • Alignment: Show that the choice x y guarantees that ||f || ¥ ||y || • Inequality: show that if y represents f , then ||f || ¤ ||y ||

2. The normed dual of cn is a subspace of l1 . The notation cn is not standard. It means Rn with the norm that is used in l8 . There are several spaces that use this maximum absolute value norm. l8 includes all bounded infinite sequences. c is the subspace of sequences that converge to some limit. c0 is the subspace of sequences that converge the limit 0. cn is the subspace of sequences with n components.

5.2. Common Dual Spaces U • Evaluation: show how to use a vector y tional.

79

pη1, η2, ηnq P l1 to define a bounded linear func-

• Construction: give the rule for converting a bounded linear functional f to its representation y. • Alignment: Choose x P cn with ξi sgn ηi . Show that this choice makes |f pxq| ||x||8 ||y ||1 . What goes wrong if we try to do this for x P l8 ? Show that ||f || ¥ ||y ||1 .

• Inequality: show that if y represents f , then f pxq x, y

¡¤ ||x||||y||1, so that ||f || ¤ ||y||1.

Conclusion: ||f || ||y ||1 3. The normed dual of lp is lq (Part of this is proof 2.5) As you would guess from the Hoelder inequality, • Evaluation: If y f is bounded?

1 p

1 q

1. First we do the proof for 1 p 8.

tηiu P lq represents f and x tξiu P lp, what is f pxq? How do we know that

• Construction: Given that f is linear and bounded, define y by ηi f pei q. Show that if a vector xN has only its first N components nonzero, then f pxN q xN , y ¡ On what basis can we extend this result to any x P lp ?

• Alignment: fix N and construct the vector xN whose first N components are given by

ξi |ηi | p sgn ηi and whose other components are all 0. Since there are only a finite number of nonzero components, we need not worry about convergence. q

Calculate ||xN || and |f pxN q|.

By the definition of the norm, |f pxN q| ¤ ||f ||||xN ||. Prove that ||f || ¥ ||y ||q . Given that f is bounded, what can we conclude about y?

tηiu P lq . What linear functional f does this define? Use the Hoelder inequality to show that |f pxq| ¤ ||x||p ||y ||q . Why can we conclude that ||f || ¤ ||y ||q ?

• Inequality: Start with the vector y

Conclusion: ||f || ||y ||q ? 4. The special case p 1 • Evaluation: same as for p ¡ 1

• Construction: same as for p ¡ 1 • Alignment: Choose a vector xN with at most one nonzero entry, xN = sgn ηN . What is the norm ||xN ||? Show that ||f || ¥ ||y ||8 .

Given that f is bounded, what can we conclude about y?

• Inequality: Start with y tηi u P l8 , define the obvious linear functional f , and use the Hoelder inequality to prove that ||f || ¤ ||y ||8 . You can then conclude that ||f || ||y ||8 .

80 U


5. The special case p 8 The obvious guess, that the dual space is l1 , is wrong. The problem is related to the fact that l8 is not separable. It is OK for a separable space (l1 ) to have a nonseparable dual, but not the other way around.

If you take the subspace c0 l8 of infinite sequences that converge to zero, which is separable, then the dual space is l1 . The proof is left to the homework. Consider the subspace c l8 of infinite sequences that converge to some value ξ. This space is also separable, and its dual space is l1 . The proof is left to the homework. The trick is to notice that once you evaluate f on the vector tξ, ξ, ξ, , you can use linearity to reduce this problem to the previous one. Here is the problem with l8 .

One element is the sequence p0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, q where ξi 1 if i is prime, 0 otherwise. Another is the sequence p1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, q. Since f is bounded, it has a finite value when evaluated on any such sequence. There are uncountably many such sequences. If the dual space of l8 were l1 , it would be possible to identify a countable subset S of these sequences, evaluate f on that subset, and then use those values to evaluate f on any sequence x. But f is bounded, hence continuous. To find a y P S such that |f pxq f py q| , we need to make ||x y|| δ. If that could be done, then S would be dense, and l8 would be separable, which we know is not the case. 6. Dual of Hilbert space The obvious guess, that any Hilbert space H is its own dual, is correct. This is theorem 2 on page 109. • Evaluation: any element y P H defines a bounded linear functional by the rule f pxq How do we know that this function is bounded?

px|yq.

• Construction: First assume a bounded linear functional f . We want to find a vector y such that f pxq px|y q. Consider the set N of all vectors n such that f pnq 0. Show that this set is a closed subspace. If N H, then f is the zero functional and we take y θ. Otherwise, write H N ` N K . Choose a vector z P N K with f pz q 1. Consider any x P H. Show that x f pxqz P N . Use the fact that (x f pxqz |z q 0 to construct y. Show that y is unique.

• Alignment: choose x

||y|| . Show that ||f || ¥ ||y||. y

• : Inequality: use Cauchy-Schwarz to show that ||f || ¤ ||y ||. We conclude, as usual, that ||f || ||y ||. 7. Dual of Lp r0, 1s for p ¡ 1 Luenberger just mentions “arguments similar to those in Theorem 1.”

The obvious guess, that the dual space is Lq r0, 1s, is correct. What is not entirely obvious is how to construct the representation y ptq from the functional.

5.2. Common Dual Spaces U • Evaluation: given y ptq P Lq r0, 1s and xptq P Lp r0, 1s, define a functional by f pxq How do we know that it is bounded?

³1 0

81

xptqy ptqdt.

• Construction: Define the function us ptq 1 for 0 ¤ t s, 0 otherwise. For future use, evaluate the norm of v ptq us h ptq us ptq. Given a bounded linear functional f , define F psq for some M .

f pusq Prove that F ps

hq F psq

Mh

1 p

We now know that F psq is continuous. We cannot prove that it is differentiable. However, we can prove that it is “absolutely continuous.” There are theorems, beyond the scope of our grasp of Lebesgue integration, that say that an absolutely continuous function is differentiable except on a set of measure zero. Now, wherever F is differentiable, take y ptq F 1 ptq as the function that represents f . The values of y on the set of measure zero where F is not differentiable are ³1 irrelevant to the value of the Lebesgue integral f pxq 0 xptqy ptqdt. So we have identified the bounded functional f with an equivalence class of functions.

• Alignment: choose

xptq |y ptq| p sgn y ptq. q

•

Calculate f pxq and ||x||p by evaluating integrals. Conclude that ||f || ¥ ||y||q . Since f is bounded, this will establish that yptq P Lq r0, 1s. Inequality: use Hoelder to show that, if y ptq represents f , then f pxq ¤ ||x||p ||y ||q . How does this establish that ||f || ¤ ||y ||q ?

We are done: we can convert between f and y, and we have proved that ||f || ||y ||q . 8. Minkowski functionals This topic comes much later in the chapter, but it provides a geometric motivation for the sublinear functionals that appear in the Hahn-Banach theorem. Consider a convex set K in a normed space X that has the origin θ in its interior. You can think of the boundary of K as the set of all points that can be reached from the origin for unit taxicab fare. Then the Minkowski functional ppxq is the cab fare from the origin to x or equivalently, the factor by which x must be scaled down to end up on the boundary of K. More formally, ppxq inf tr :

x r

P K, r ¡ 0u.

Draw a diagram showing curves where ppxq 1, 2, 12 . A Minkowski functional is “sublinear”: it has some but not all properties associated with norms. The first two are the defining properties of a sublinear functional. • ppαxq αppxq for α ¥ 0. Prove this from the definition. • p px 1

x 2 q ¤ p px 1 q

ppx2 q. Use the convexity of K to prove this. (This is proof 2.6)

• ppxq ¥ 0, and ppxq is finite for all x. This additional property gives us a “convex functional.” It is obvious from the definition.

• ppxq is continuous. Prove this first at θ, then use sublinearity to extend the proof to an arbitrary x. We will consider only continuous sublinear functionals.

82 U


9. Statement of the Geometric Hahn-Banach theorem This looks obvious in two or three dimensions, but we will need the more obscure “extension form” of the theorem in order to prove it. Let K be a convex set with nonempty interior in a real normed vector space X. There is no loss of generality in assuming that θ is in the interior. Suppose that V is a linear variety in X containing no interior points of K. Note that V can be as small as just a single point, and that it can include points in the boundary of K. Then there exists an element x

P X and a constant c such that

v, x ¡ c for all v P V (this defines a hyperplane H containing V ) k, x ¡ c for all k in the interior of K (the hyperplane does not intersect the interior of K).

Draw diagrams to illustrate the following: If K is not convex, the conclusion of the theorem may not hold. The hyperplane H may or may not be unique. Also illustrate the feature that if X is 3-dimensional, V may have dimension 0, 1, or 2. 10. The theorem in two dimensions. Suppose that V is a zero-dimensional variety (just a single point x). Construct the subspace M (a line through the origin) that contains x. It is easy to construct a linear function f pxq on M such that f pxq 1. Since x is not in the interior of K, ppxq ¥ 1. Draw a diagram to illustrate this setup.

Now consider α ¡ 0. Show that f pαxq ¤ ppαxq.

Suppose α 0. Show that f pαxq ¤ ppαxq

At this point we need a way to extend the definition of f pxq to the entire space X while preserving the property that f pαxq ¤ ppαxq. That is exactly what the extension form of the Hahn-Banach theorem will do for us. 11. Proof of the Hahn-Banach theorem(extension form) (proof 2.7)

On subspace M , we have a linear functional f pmq and a continuous sublinear functional p (defined on all of X) satisfying f pmq ¤ ppmq for all m. Choose a vector y that is not in M . Our immediate goal is to define a linear functional g on the space rM y s such that g pmq f pmq on M and g pxq ¤ ppxq on this space that is one dimension larger. Draw a diagram to illustrate the case where X is the plane, M is a line, and p is the Minkowski functional of a convex set. Show that setting g py q 0 will not necessarily work, but that it appears possible to choose a vector m0 P M such that setting g py m0 q 0 will work. Show an example where only one value for m0 is possible, and show other examples where many different choices will do the job. Now consider arbitrary m1 , m2 P M . For fixed y, we require g pm1 y q ¤ ppm1 y q and g pm2 y q ¤ ppm2 y q. Exploit the linearity of g to get a pair of inequalities involving the value g py q that we need to choose.

Starting with f pm1 m2 q ¤ ppm1 m2 q, show that there exists at least one acceptable value for g py q. Luenberger, on page 111, calls this value c.

5.2. Common Dual Spaces U Using the inequality g pm show that g pm

αy q ¤ ppm

y q ¤ ppm

83

y q, which holds for all m P M given our clever choice of g py q,

αy q for all α ¡ 0.

Using the inequality g pm y q ppm βy q for all β ¡ 0.

¤ ppm yq, which also holds for all m P M , show that gpm βyq ¤

What about α 0?

We have now succeeded for one more dimension, since we have chosen g py q in such a way that g pm αy q ¤ ppm αy q for all α and all m, and any vector in rM y s is of the form m αy. 12. Finishing the proof At this point, if X is finite dimensional, an extension to all of X follows easily by induction, since given f on a k-dimensional M , we have shown how to construct an extension g on a (k 1)dimensional rM y s. However, we want to use this theorem in the infinite-dimensional case, so there is still work to be done. Being wimps who are not “familiar with Zorn’s lemma” (Luenberger, p. 111), we assume that X is separable. From the countable dense subset x1 , x2 , , xn , we extract a subset y1 , y2 , , yn , of linearly independent vectors, none of them in M . Now M and these yi generate S, a dense subset of X. One y at a time, we extend functional f to a g that is defined on S. By construction, g continuous?

¤ p.

Furthermore, p is bounded and continuous. How do we know that g is

Now consider an x that is not in the dense subspace S. How do we define F pxq on all of X so that F px q ¤ p px q ?

If you are keen on Zorn’s lemma, see Bachman and Zarici, Functional Analysis, pp.179-180, for a proof that does not assume separability. 13. Back to the geometric version (proof 2.8) Review the situation with the aid of a diagram. We have a convex set K with θ in its interior. V is a linear variety in X that includes no interior points of K, although it may include boundary points. We let M be the subspace of X generated by V . Then there is a linear functional f , defined on M , which has the value 1 on the variety V . Since V contains no interior points of K, the Minkowski functional of K, ppxq, satisfies f pxq 1 ¤ ppxq on V . Prove that f pxq ¤ ppxq for all m P M .

Use Hahn-Banach to extend f pxq to an F pxq defined on all of X. How can we be sure that F is continuous?

tx : F pxq 1u is closed and contains no interior points of K. Restate the conclusion in terms of an element x P X .

Show that the hyperplane H

14. The norm as an all-purpose sublinear function - Corollary 1 on page 112. Show that any norm ||x|| is a sublinear function.

Given a bounded linear functional f defined on subspace M X, with norm ||f ||M , prove that it can be extended to a bounded linear functional F , with the same norm, defined on all of X. The trick is to choose ppxq ||f ||M ||x||.

84 U


A really simple example: In E 2 (Euclidean norm), let M be the line u M . What is the extension F pu, v q to all of E 2 ?

v and define f pt, tq 2t on

5.2. Common Dual Spaces U

85

15. Achieving alignment

Show (Corollary 2) for any x P X, how to construct a nonzero bounded linear functional F , defined on all of X, such that |F pxq| ||F ||||x||. In R2 , choose the Somali taxicab norm ||x|| ||pu, vq|| 12 |u| |v|. Show the set K for which ||x|| ¤ 1.

Geometric form: As V choose the point x does not intersect the interior of K.

p1, 21 q.

Find a hyperplane H that includes V but that

Extension form(corollary 1): Define f on the subspace generated by x as f p2t, tq 2t. Extend f to a function F pu, v q with the same norm. Extension form(corollary 2): Find a linear functional, defined on all of R2 , such that |F pxq| ||F ||||x||.

86 U


C HAPTER 6 APPLICATIONS OF THE HAHN - BANACH THEOREM

Reading: [1, Sections 5.5 – 5.9, 5.11 – 5.13]. Also see [7, Chapter 2] for more on the Riesz representation theorem.

6.1

The Dual of C ra, bs

Recall that C ra, bs is the Banach space of real-valued (uniformly) continuous functions x on ra, bs with norm }x} suptPra,bs |xptq|. The challenge is to find a way of representing all linear functionals on C ra, bs in a unified manner without using Dirac delta functions. This can be done elegantly by using the Stieltjes integral. Definition 6.1.1. Let f and g be real-valued, bounded functions on ra, bs that do not have a common point of discontinuity. Take a partition of the interval a t0 tn b. Then if I

n¸1

maxpt limt qÑ0 i 1

i

f pti1 qpg pti q g pti1 qq

i 0

exists, I is called the Stieltjes integral of f with respect to g, denoted »b a

f ptq dg ptq.

Example 6.1.2. Suppose that you use a logarithmic t-axis in presenting a graph of function f : i.e. you graph f plog sq. Now the area under the graph no longer represents the integral. Show that the Stieltjes ³b integral gives the right formula for approximating the integral a f psq ds and that this formula agrees with the change-of-variable formula.

Solution: The area under the graph is » exp b exp a

f plog sq dplog sq

where we have the partition a s0

p

max si

» exp b exp a

n¸1

lim 1

si qÑ0 i0

f plog si1 qplog si log si1 q

1 f plog sq ds s

» exp b exp a

f psq ds,

sn b, which agrees with the change-of-variable formula. 87

q

88 U

6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM

Example 6.1.3. What is

³1 0

f ptq dg ptq if:

• g ptq t? • g is differentiable? • g ptq 0 for t 1{2 and g ptq 1 for t ¥ 1{2? • g ptq t for t 1{2 and g ptq t • g pt q P p T

2 for t ¥ 1{2?

¤ tq for random variable T with support r0, 1s?

Solution: We have • • • • •

³1

³1

0

f ptq dg ptq

0

f ptq dg ptq

0

f ptq dg ptq f p1{2q

0

f ptq dg ptq

0

f ptq dg ptq E rT s

³1 ³1 ³1 ³1

0

f ptq dt

0

f ptqg 1 ptq dt

³1

³1 0

f ptq dt

f p1{2q

q

The Riesz representation theorem says that any bounded linear functional f on C ra, bs can be expressed as a Stieltjes integral f px q

»b a

xptq dv ptq.

However, there is one caveat: the function v must be of bounded variation. Loosely speaking, TVpv q a |dv ptq| must be finite. For example, we could choose xptq sgn v ptq so that f pxq TVpxq. However, such an x is clearly not in C ra, bs since it is not continuous, so we must introduce the following normed linear space.

³b

Definition 6.1.4. Let B ra, bs be the space of bounded functions x on ra, bs with norm }x} suptPra,bs |xptq|. Note that C ra, bs is a subspace of B ra, bs. Also recall that we defined the normed linear space BVra, bs of functions on ra, bs with norm TVpxq in Example 2.4.6. Now we can state the Riesz represetnation theorem. T HEOREM 6.1.5 (Riesz Representation Theorem ([1], Theorem 1, p. 113)). Let f be a bounded linear functional on C ra, bs. Then there is a function v P BVra, bs such that for all x P C ra, bs, f px q and }f } TVpv q. Conversely, every function v

»b a

xptq dv ptq

P BVra, bs defines a bounded linear functional in this way.

6.1. The Dual of C ra, bs U

89

Proof: Since C ra, bs is a subspace of B ra, bs, if f is a bounded linear functional on C ra, bs then, by the Hahn-Banach theorem, there exists a linear functional F on B ra, bs which is an extension of f and has the same norm. For any s P ra, bs, define the set of functions tus : s P ra, bsu, where for all t P ra, bs, us ptq 0 if s a and us ptq 1tPra,ss if s P pa, bs. Clearly, each us is in B ra, bs. Also define v psq F pus q, and let a t0 tn b be a finite partition of ra, bs. First we show that TVpv q ¤ }f }. Define i n ¸

sgnpvptiq vpti1qq for i 1, . . . , n. Then

|vptiq vpti1q|

i 1

n ¸

i pv pti q v pti1 qq

i 1 n ¸

i pF puti q F puti1 qq

i 1

n ¸

F

i puti

ut q i 1

,

i 1

where the last step follows from the linearity of F . By the Hahn-Banach theorem, }F } }f }. By definition, }uti uti1 } 1 for only one i P t1, . . . , nu, so n ¸ i uti

p

i 1

uti1

q 1.

Therefore, combining these results with Corollary ??, n ¸

n ¸ F i uti

|vptiq vpti1q| ¤ } }

i 1

p

i 1

uti1

q }f }.

°

Then since TVpv q is the maximum of ni1 |v pti q v pti1 q| over all partitions of ra, bs and the above inequality holds for all partitions, TVpv q ¤ }f }. Now we show that }f } ¤ TVpv q. For any x P C ra, bs, define z pτ q

n ¸

xpti1 qputi pτ q uti1 pτ qq.

i 1

Then, again since }uti

ut } 1 for only one i P t1, . . . , nu, i 1

}z x} i,τ Prmax |xpti1q xpτ q|. t ,t s

i 1 i

Since x is uniformly continuous, this quantity goes to zero as the partition is made arbitrarily fine. Then since F is continuous, F pz q Ñ F pxq, which in turn equals f pxq by the Hahn-Banach theorem. Finally, F pz q

n ¸

xpti1 qpv pti q v pti1 qq Ñ

i 1

so combined with the previous result, f px q

»b a

xptq dv ptq.

»b a

xptq dv ptq,

90 U


Finally,

» b x t dv t a

p q p q ¤ }x}TVpvq;

dividing by }x} gives }f } ¤ TVpv q. Therefore, }f } TVpv q. Conversely, if v P BVra, bs, then f px q

»b a

xptq dv ptq

is linear. Furthermore, f is bounded since |f pxq| ¤ }x}TVpv q.

q

Note that the Riesz representation theorem does not claim that the function v representing the functional f is unique. For example, the functional f pxq xp1{2q could be represented by a function v ptq that equals zero for t P r0, 1{2q, any value a P r0, 1s for t 1{2, and 1 for t P p1{2, 1s. In fact, we could even shift v upward or downward by an arbitrary amount. Therefore, we introduce the following normed linear space, which we can use in applications. Definition 6.1.6. Let NBVra, bs be the space of functions v on ra, bs of bounded variation such that v paq 0 and v is right-continuous on pa, bq. The norm is }v } TVpv q. Note that NBVra, bs is a subspace of BVra, bs.

6.2

The Second Dual Space

Let the functional g pxq be represented by some element x P X ; we will use the notation g pxq xx, x y. Conversely, any x P X defines a functional on X by f px q xx, x y. Example 6.2.1. Show that such an f is linear and bounded. Solution: f is linear because f pax1

bx2 q xx, ax1

bx2 y a xx, x1 y

b xx, x2 y af px1 q

bf px2 q.

Also, |f px q| | xx, x y | ¤ }x}}x }, so }f } ¤ }x}. Furthermore, by Corollary ??, there exists a nonzero x P X such that xx, x y }x}}x }, so in fact }f } }x}. q Definition 6.2.2 (second dual). The space of all bounded linear functionals on X is denoted X and is called the second dual of X. Definition 6.2.3 (natural mapping). The mapping φ : X Ñ X defined by φpxq x such that xx, x y xx, xy is called the natural mapping of X into X ; that is, φ maps elements of X into the functionals they generate X .

Definition 6.2.4 (reflexivity). A normed linear space is said to be reflexive if X natural mapping is surjective.

X ; that is, if the

6.3. Alignment and Orthogonal Components U

91

Example 6.2.5. Give examples of spaces that are reflexive and not reflexive. Solution: The lp spaces, with 1 p 8, are reflexive since lp lq with 1{p 1{q 1, as shown in Quiz Theorem ??, so lp lq lp . l1 is not reflexive since the dual of l1 is l8 but the dual of l8 is not l1 . By the Riesz-Fréchet theorem, any Hilbert space is reflexive. q

6.3

Alignment and Orthogonal Components

Recall that in a Hilbert space H, px, yq }x}}y} if and only if y this idea below. Definition 6.3.1 (alignment). A vector x }x}}x}. Example 6.3.2. Let x tξi u P lp and x condition are x and x aligned?

ax for some constant a. We generalize

P X is said to be aligned with a vector x P X if xx, xy

tηiu P lq , with 1 p 8 and 1{p

1{q

1.

Under what

¨ Solution: The condition for alignment is simply the condition for equality in the Holder inequality:

|ξi| |ηi|

}x}p }x}q 1 q

Example 6.3.3 ([1], Example 1, p. 117). Let x P Lp ra, bs and x Under what condition are x and x aligned?

1 p

q

.

P Lq ra, bs, with 1 p 8 and 1{p 1{q 1.

¨ Solution: As before, the condition for alignment is simply the condition for equality in the Holder inequality: xptq K psgn y ptqq|y ptq|q{p . q Example 6.3.4 ([1], Example 2, p. 117). Let x P C ra, bs and let Γ tt P ra, bs : |xptq| ³b conditions is a bounded linear functional x pxq a xptq dv ptq aligned with xptq?

}x}u. Under what

Solution: We require that v vary only on Γ and that v is nondecreasing at t if xptq ¡ 0 and nonincreasing at t if xptq 0. q Recall that in a Hilbert space H, we defined the orthogonal complement of a set S as S K ps, yq 0 for all s P S u. We generalize this idea below.

ty P H :

Definition 6.3.5 (orthogonal complement). Let S be a subset of a normed linear space X. The orthogonal complement or annihilator space of S is defined as S K tx P X : xs, x y 0 for all s P S u.

92 U


Given a subset U X , U K X . However, U K is not necessarily a subset of X if X is not reflexive, so we define the more useful concept below. Definition 6.3.6 (orthogonal complement of U in X). Let U be a subset of the dual space X . The orthogonal complement of U in X is defined as K U U K X X. T HEOREM 6.3.7 ([1], Theorem 1, p. 118). Let M be a closed subspace of a normed space X. Then K rK M s M . Proof: Clearly, M K rM K s, or equivalently, if x P M , then x PK rM K s. We show the converse: if x R M , then x RK rM K s. On the subspace rx M s, define the linear functional f pax mq a for m P M . Then

}f }

sup

P

m M

f p x mq }x m}

inf

1

P

m M

}x

m}

.

Since x R M and M is closed, inf mPM }x m} 0, so }f } 8. Thus, by the Hahn-Banach theorem, we can extend f to an x P X . Since f pmq 0 for m P M , x P M K . But xx, x y 1, so x RK rM K s. Then since M K rM K s and M C pK rM K sqC , M K rM K s. q

6.4

Minimum Norm Problems

In general, the solutions to minimum norm problems are not unique. Example 6.4.1 (equivalent to [1], Example 1, p. 119). Cab drivers are charging only for the larger dimension traveled in R2 . Show that from a point pu, v q with v 0, there are many points pu1 , 0q that can be reached for the same fare. Solution: All the points pu1 , 0q with u1

P ru v, u

v s can be reached for fare v.

q

Example 6.4.2. Now cab drivers are charging for the total distance driven. Show that from a point pu, v q with u v, there are many points pu1 , u1 q that can be reached for the same fare. Solution: Without loss of generality, assume u reached for fare v u.

v.

Then all the points pu1 , u1 q with u1

P ru, vs can be q

The following theorem guarantees neither existence nor uniqueness, but it is more general than the projection theorem in that it specifies a dual problem for any minimum norm problem. T HEOREM 6.4.3. Let x be an element of a real normed linear space X. Then d inf

P

m M

where the maximum is achieved for some x0 with x m0 .

}x m} }x}¤max x x, x y , K 1,x PM P M K. If the infimum is achieved for some m0 P M , then x0 is aligned

6.4. Minimum Norm Problems U Proof: For ¡ 0, let m

P M satisfy }x m} ¤ d

. Then for any x

93

P M K such that }x} ¤ 1,

xx, xy xx, xy xloooooomoooooon m , x y xx m , x y ¤ }x m }}x } ¤ d

.

0 Since was arbitrary, we conclude xx, x y ¤ d. To complete the proof of the first part of the theorem, we must find a x such that xx, x y d. Let N rx M s. We can represent an element n P N uniquely as 0

ax

0

m, where m P M . Define the linear functional f on N by f pnq ad. Then

}f } sup |f}pnn}q| sup }ax|a|d n} inf

d 1. } x m{a} nPN nPN nPN By the Hahn-Banach theorem, extend f on N to x0 on X. Then }x0 } }f } 1 and x0 f on N , so by construction, x0 P M K and xx, x0 y f px mq d. To prove the second part of the theorem, assume there exists an m0 P M such that }x m0 } d. Let x0 be any element such that x0 P M K , }x0 } 1, and xx, x0 y d (the x0 constructed above is one possibility). Then so x0 is aligned with x m0 .

xx m0, x0 y xx, x0 y d }x m0}}x0 }, q

We can generalize the projection theorem as a corollary of Theorem 6.4.3. C OROLLARY 6.4.4. Let x be an element of a real normed linear space X and let M be a subspace of X. A vector m0 P M satisfies }x m0 } ¤ }x m} for all m P M if and only if there is a nonzero vector x P M K aligned with x m0 . Proof: If }x m0 } ¤ }x m} for all m P M , then we have achieved the infimum in Theorem 6.4.3, specifically at m0 , so the given x0 is aligned with x m0 . If there is a nonzero vector x P M K aligned with x m0 , take }x } 1 without loss of generality. Then for all m P M , xx, x y xx m, x0 y ¤ }x m} while xx, x y xx m0 , x0 y }x m0 }. Therefore, }x m0} ¤ }x m} for all m P M . q Example 6.4.5. Let X and x .

R2 with the l1 norm, M tpu, vq : u 2v 0u, and x p0, 3q. Determine d, m0,

Solution: We have M K rp1, 2qs, so K tx : }x } 1u tp1{2, 1qu. Then x0 arg maxx PK xx, x y p1{2, 1q and d xx, x0 y 3. Then x m0 p0, 3q since x m0 and x0 must be aligned, so m0 p0, 0q. q Example 6.4.6. Let X R2 with the l8 norm, M tpu, v q : u 2v 0u, and x p0, 3q. Determine d, m0 , and x . Note that the norm in X is the l1 norm since difficulties concerning the dual of l8 occur only because of infinite dimensionality. Solution: We have M K rp1, 2qs, so K tx : }x } 1u tp1{3, 2{3qu. Then x0 arg maxx PK xx, x y p1{3, 2{3q and d xx, x0 y 2. Then x m0 p2, 2q since x m0 and x0 must be aligned, so m0 p2, 1q. q

94 U


Example 6.4.7. Let X and x .

R2 with the l1 norm, M tpu, vq : 2u v 0u, and x p1, 2q. Determine d, m0,

Solution: We have M K rp2, 1qs, so K tx : }x } 1u tp1, 1{2qu. Then x0 arg maxx PK xx, x y p1, 1{2q and d xx, x0 y 2. Then x m0 p2, 0q since x m0 and x0 must be aligned, so m0 p1, 2q. q Theorem 6.4.3 simply proved the existence of a solution to the dual problem in X : the theorem below proves the existence of a solution to the minimum norm problem in X. T HEOREM 6.4.8. Let M be a subspace in a real normed space X. Then d min }x m } sup xx, x y , m PM K xPM,}x}1 where the minimum is achieved for some m0 is aligned with x0 . Proof: Define

P M K. If the supremum is achieved for some x0 P M , then x0 m0

}x}M

sup

P } }1

x M x

xx, xy

since it is the norm of the functional x restricted to the subspace M . Then for any m

}x m} ¥

sup xpx m q sup

}x}¤1

sup

P } }¤1

x M, x

}x}¤1

P M K,

pxx, xy xx, bmyq

pxx, xy xx, bmyq

sup

P } }¤1

x M, x

xx, xy ¥ }x}M .

Since m was arbitrary, we conclude }x m } ¤ }x }M . To complete the proof of the first part of the theorem, we must find a m0 such that }x m } }x }M . Consider x restricted to M . The norm of x so restricted is }x }M . By the Hahn-Banach theorem, extend x restricted on M to y on X. Then }y} }x}M and y x on M so y x 0 on M . Set m0 x y; then m0 P M K and }x m0 } }y} }x}M . To prove the second part of the theorem, assume there exists an x0 P M such that xx0 , x y d. Since the supremum is achieved, clearly }x0 } 1 and }x m0 } xx0 , x y xx0 , x m0 y, so x0 is aligned with x m0 . q Example 6.4.9. Let X and x0 .

R2 with the l1 norm, M tpu, vq : 2u v 0u, and x p4, 1q. Determine d, m0 ,

Solution: We have M rp1, 2qs, so K tx : }x} 1u tp1{3, 2{3qu. Then x0 arg supxPK xx, x y p1{3, 2{3q and d xx, x0 y 2. Then x m0 p2, 2q since x m0 and x0 must be aligned, so m0 p2, 1q. q We can illustrate the duality between Theorems 6.4.3 and 6.4.8 more explicitly as follows: given a normed linear space X and a subspace M , we have two ways of making a new vector space:

6.5. Applications U

95

• the quotient space X {M . An element of X {M is a coset rxs. Recall that the norm of such a coset is }rxs} inf mPM }x m}. Motivated by this definition, let }x}M }rxs}. This notation is reasonable if you think of it as answering the question: if I am allowed to add any element of M to x, how small can I make the norm? • the annihilator space M K , a subspace of X . If we view x as an element of X , it has norm

}x}M K

sup xx, xy K x PM ,}x }1

since this is the norm of x considered as a functional on M . Using this notation, Theorem 6.4.3 is }x}M

}x}M K and Theorem 6.4.8 is }x}M K }x}M .

Example 6.4.10. Let X R3 with the l1 norm, M tpu, v, wq : u 2v 3w that M rtp2, 1, 0q, p3, 0, 1qus. Determine d, m0 , and x0 .

0u, and x p7, 2, 3q. Note

Solution: We have M K rp1, 2, 3qs, so K tx : }x } 1u tp1{3, 2{3, 1qu. Then x0 arg maxx PK xx, x y p1{3, 2{3, 1q and d xx, bx0 y }x}M K 2. Then x m0 p0, 0, 2q since x m0 and x0 must be aligned, so m0 p7, 2, 1q. q We will solve minimum norm problems using roughly the following three steps: 1. In characterizing optimal solutions, use the alignment properties of the normed linear space and its dual. (Refer back to Examples 6.3.2, 6.3.3, and 6.3.4.) 2. Try to guarantee the existence of a solution by formulating the minimum norm problem in a dual space. 3. Examine the dual problem to see if it is easier – that is, if it is a lower-dimensional problem or is more transparent – than the original problem.

6.5

Applications

Q UIZ T HEOREM 12 (Tonelli ([1], Example 1, p. 122)). If f is continuous on ra, bs and p0 is the polynomial with deg p0 ¤ n minimizing maxtPra,bs |f ptq pptq|, then |f ptq p0 ptq| achieves its maximum at at least n 2 points on ra, bs. Proof: First, we reformulate the problem as follows: in the Banach space C ra, bs, we want to find p P N tpolynomials p : deg p ¤ nu such that }f p} is minimized. N is finite-dimensional (dimension n 1), so an optimal solution p0 must exist. Suppose d }f p0 } ¡ 0 and let Γ tt P ra, bs : |f ptq p0 ptq| du. By Theorem 6.4.3, the optimal solution p0 must be such that f p0 is aligned with an element v P N K with }v } 1. Furthermore, by Theorem 6.1.5, N K is a subspace of NBVra, bs. Therefore, recall from Example 6.3.4 that if f p0 is aligned with v, then v varies only on Γ; that is, v consists of jump discontinuities at the points in Γ. Now suppose that Γ contains m n 2 points a ¤ t1 tm ¤ b. Then the polynomial ± q ptq ik pt ti q with k P t1, . . . , mu has degree m 1 ¤ n and is therefore in N . The Stieltjes integral

96 U


³b

a q ptq dv ptq is a linear combination of q pt1 q, . . . , q ptm q with no zero coefficients since v consists of jump discontinuities at the points in Γ. However, the integral cannot equal zero since q pti q 0 for i k and q pti q 0 for i k, so v R N K . Therefore, f p0 is not aligned with any nonzero element of N K , so our assumption that Γ contains less than n 2 points was false. q

Example 6.5.1. Let ra, bs r1, 1s, f ptq t2 , and n 1. Find p0 ptq. Solution: Let p0 ptq a bt. By Theorem 12, |t2 a bt| achieves its maximum at at least 3 points on r1, 1s. The maximum of |t2 a bt| could occur at t 1 (d |1 a b|), t 1 (d |1 a b|), or t b{2 (d | b2 {4 a|). Because the coefficient on the t2 term (1) is positive, we know that 1 a b, 1 a b, and b2 {4 a will be positive for optimal p0 ptq. Setting these quantities equal to d gives a 1{2, b 0, and d 1{2. q Example 6.5.2. Let ra, bs r1, 1s, f ptq tk

1,

and n k. Find p0 ptq.

Solution: Notice that the function cosppn 1qθq achieves its maximum absolute value of 1 precisely n 2 times on the interval r0, π s, specifically at the points tkθ{pn 1q : k 0, . . . , n 1u. Now cos θ P r1, 1s, so set t cos θ and express cosppn 1qθq as a function of t to find p0 ptq. For example, cos 2θ

2 cos2 θ 1 2t2 1,

so the best approximation to f ptq t2 on r1, 1s is p0 ptq 1{2. As another example, cos 4θ

a

cos2 2θ sin2 2θ p2 cos2 θ 1q2 p2t 1 t2q2 p2t2 1q2 4t2p1 t2q 8t4 8t2 1,

so the best approximation to f ptq t4 on r1, 1s is p0 ptq t2 81 . Let Tn pcos θq cos nθ by the Chebyshev polynomials of the first kind. Note that putting t gives T0 ptq cos 0t 1 and T1 ptq cos θ t. We have the recurrence relation Tn

1

cos θ

ptq cosppn 1qθq cos θ cos nθ sin θ sin nθ cos θ cos nθ sin θpsin θ cosppn 1qθq cos θ sinppn 1qθqq cos θ cos nθ sin2 θ cosppn 1qθ cos2 θ cosppn 1qθq cos2 θ cosppn 1qθq sin θ cos θ sinppn 1qθq cos θ cos nθ cosppn 1qθq cosθ cos nθ 2t Tnptq Tn1ptq.

Then to determine the best approximation p0 ptq to f ptq tn on r1, 1s, find the Chebyshev polynomial Tn ptq, divide by the leading coefficient, the tn term, and multiply by 1. Then |f ptq p0 ptq| ! subtract )

reaches its maximum at the points t cos

k n 1π

:k

0, . . . , n

1 .

q

In the previous examples, we wanted to find the vector of minimum norm in a given linear variety instead of finding the vector closest to a given subspace. Remember that these problems are equivalent by Theorems 6.4.3 and 6.4.8. A standard problem is to find an element of minimum norm satisfying a finite

6.5. Applications U

97

number of linear constraints. Treat the solution to the minimum norm problem as a vector x in the dual space X . Theorem 6.4.8 then guarantees the existence of a solution to the minimum norm problem. We can express the constraints in the form xy1 , x y c1 , . . . , xyn , x y cn with yi P X for i 1, . . . , n. Then ¯ is a vector satisfying the constraints, we have if M rtyi us and x d

min

xyi , x yci

}x} mmin }x¯ m} PM K

sup

P } }¤1

x M x

xx, x¯ y ,

where the second equality follows from Theorem 6.4.8. Then because M is finite-dimensional (dimenson n), so we can change the supremum to a maximum. Any vector in M can be written as a linear combina° tion x ni1 ai yi or equivalently, if we set Y py1 , , yn q and a1 pa1 , , an q, as x Ya. Then if we set c1 pc1 , , cn q, d

min

xyi , x yci

¯ y max c1 a. }x} }Ya max xYa, x }¤1 }Ya}¤1 °

Then by Theorem 6.4.8, the optimal x is aligned with x ni1 ai yi (compare to the equivalent result in ° Hilbert space, where the optimal solution was proportional to x ni1 ai yi ). C OROLLARY 6.5.3 ([1], Corollary 1, p. 123). Let y1 , . . . , yn

xyi, xy ci for i 1, . . . , n is consistent; that is D

P X and suppose the system of linear equalities

tx P X : xyi, xy ci for i 1, . . . , nu ∅.

Then

min }x } max c1 a

x D

P

}Ya}¤1

and the optimal x is aligned with the optimal Ya. Proof: See the discussion above.

q

Example 6.5.4. You are taking the new Gen Ed course “Applied Cubism,” whose goal is to teach you how to apply art to everyday life. As your term project, you are producing three large concrete cubes. Initially these will serve as a podium for children who are receiving medals for their athletic prowess, but they will live on as modern sculpture. The gold-medal cube has side u and provides artistic renown 9u. The silver-medal cube has side v and provides artistic renown 4v. The gold-medal cube has side w and provides artistic renown w. You must achieve artistic renown of 72 to earn an A in the course. How do you accomplish this while minimizing the total volume d3 of concrete? Solution: The solution x pu, v, wq lies in the space X l3 . We want to minimize }x }3 such that xy, xy c, where y p9, 4, 1q P X l3{2 and c 72. We have

}ay}3{2 a

3

92

3

42

3

2

12

3

a 64{3 ¤ 1.

Then by Corollary 6.5.3, d

min

xy, x yc

}x} }Ya max c1 a }¤1

max 72a 2 62{3

a 64{3

¤

98 U


and d3 288. Then because x and y must be aligned (there is only one constraint, so Ya in Corollary 6.5.3 reduces to a constant times y), we have x p6, 4, 2q. We can check this result by setting up a Lagrangian: Lpu, v, w, λq u3 v 3 w3 λp9u dL 3u2 9λ 0 du dL 3v2 4λ 0 dv dL 3w2 λ 0 dw

4v

w 72q

We see that pu, v, w, λq p6, 4, 2, 12q satisfies the equations above.

q

We will use the following scenario for the next two examples: You are setting up a company that will operate for four weeks to produce snacks for the inaugural festivities. At the start of each week, and at the end of the fourth week, you will hire or fire staff (1 unit of labor equals 100 hours per week). Each unit of labor produces a ton of snacks per week. Since you are using overworked government personnel to do your hiring and firing, your requirement is to minimize the maximum absolute value of the weekly change in the labor force. In accordance with standard practice, you look for the optimal solution in the space X l8 . It has the form x tη1 , η2 , η3 , η4 , η5 u, and it will be the vector of minimum norm in a linear variety determined by whatever constraints the candidates provide. These constraints will be vectors in X l1 . Example 6.5.5. First you go to the McCain-Palin staff. They require 14 tons of moose jerky. Since they favor small government, they want only 1 unit of labor left over to serve the snacks. °

Solution: Note that at the start of Week k, we have ki1 ξi units of labor, for i 1, . . . , 5 (the start of Week 5 is the inauguration). Then the functional that gives the tons of snacks produced is y1 p4, 3, 2, 1, 0q and the functional that gives the number of units of labor left for the inauguration is y2 p1, 1, 1, 1, 1q. The optimal solution x must satisfy xy1 , x y c1 14 and xy2 , x y c2 1. Since the optimal x is aligned with a vector in l1 , its entries must almost all be d. The exception is that if the vector in l1 has an entry of zero in any position, then x can have any entry between d and d in that position without destroying alignment. The vector Ya can be written p4a1 a2 , 3a1 a2 , 2a1 a2 , a1 a2 , a2 q. Since we want a unit of labor left over, the first three entries of x will be nonnegative, while the last two will be negative. Therefore, in the aligned Ya, the first three entries will also be nonnegative, while the last two will be nonpositive. Therefore, we can discard the absolute values, so }Ya}1 8a1 a2 . From Corollary 6.5.3, d min }x} }Ya max c1 a 14a1 x PD }¤1

a2

Now η3 2a1 a2 0 and we require }Ya} 8a1 a2 1, so a1 1{6, a2 1{3, Ya p1{3, 1{6, 0, 1{6, 1{3q, and d 2. Then since x must be aligned with Ya, x p2, 2, 1, 2, 2q. q Example 6.5.6. Next you go to the Obama-Biden staff. They require 17 tons of brioche stuffed with brie. Since they favor job creation, they want 3 units of labor left over to serve the snacks.

6.5. Applications U

99

Solution: Again we have y1 p4, 3, 2, 1, 0q P l1 and y2 p1, 1, 1, 1, 1q P l1 , though the optimal solution x must now satisfy xy1 , x y c1 17 and xy2 , x y c2 3. We repeat all of the steps as before, though now since we want 3 units of labor left over, either the first three entries of x will be nonnegative while the last two will be negative, or the first four entries of x will be nonnegative while the last one will be negative. Therefore, in the aligned Ya, either the first three entries will be nonnegative while the last two will be nonpositive, or the first four entries will be nonnegative while the last one will be nonpositive. In either case, we can discard the absolute values, so }Ya}1 8a1 a2 or }Ya}1 10a1 3a2. From Corollary 6.5.3, d min }x} }Ya max c1 a 17a1 x PD }¤1

3a2

Now either η3 2a1 a2 0 or η4 a1 a2 0, which combined with the constraint }Ya}1 1 above, yields either a1 1{6, a2 1{3, Ya p1{3, 1{6, 0, 1{6, 1{3q, and d 11{6 or a1 1{7, a2 1{7, Ya p3{7, 2{7, 1{7, 0, 1{7q, and d 2. d is larger in the second case, so that case is optimal. Then since x must be aligned with Ya, x p2, 2, 2, 1, 2q. q Example 6.5.7 ([1], Example 2, p. 124). Consider selecting the field current uptq on r0, 1s to drive a motor governed by θ:ptq θ9ptq uptq with initial conditions θp0q 0, θ9p0q 0, θp1q 1, and θ9p1q 0 such that we minimize maxtPr0,1s |uptq|. Proof: We could think of this problem as being formulated in C r0, 1s, but since this is not the dual of any normed space, we are not guaranteed that a solution exists in C r0, 1s. Therefore, we instead take X L1 r0, 1s, X L8 r0, 1s, and seek u P X of minimum norm. As in Example ??, the constraints are

xy1, uy xy2, uy

»1 0 »1 0

y1 ptquptq dt θ9p1q 0 y2 ptquptq dt θp1q 1,

where y1 ptq et1 and y2 ptq 1 et1 . From Corollary 6.5.3, min }u}8

}max a2 , Ya¤1

so we want to maximize a2 subject to the constraint »1 0

a1

p a2qet1

a2 dt ¤ 1.

We require that uptq is aligned with this function: »1 0

uptqpa1 y1 ptq

a2 y2 ptqq dt }u}8 }a1 y1

a2 y2 } d.

This can happen only if uptq is equal to d everywhere and always has the same sign as a1 y1 ptq a2 y2 ptq. Furthermore, the function a1 y1 ptq a2 y2 ptq is the sum of a constant and an exponential term, so it changes sign on r0, 1s at most once; the same must be true of uptq.

100 U


From the first constraint, we have » t

xy1, uy

»1

e dt t 1

0

ùñ t 1

et1 dt et 1 e1 1 et 1 0

t

log

1 2

1 e

1

,

and from the second constraint, we have

xy2, uy

» t

dp1 e q dt

»1

t 1

0

d

t e t 1

ùñ d

log

dp1 et1 q dt

t

e1

1 e0

t et 1 1

1

p1 eq2 . 4e

Then uptq d for t P r0, t s and uptq d for t P pt , 1s.

q

Example 6.5.8 ([1], Example 3, p. 125). Consider the problem of selecting the thrust program uptq for a vertically ascending rocket-propelled vehicle, subject only to the forces of gravity and rocket thrust, in order to reach a given altitude with minimum fuel expenditure. Solution: Assuming fixed unit mass, unit gravity, amd zero initial conditions, the altitude xptq is governed by the differential equation

:ptq uptq 1, x Integrate to give x9 pT q (Note this gives x9 p0q 0.) Now integrate again: x pT q

»T 0 »T 0 »T 0

x9 psq ds » s 0

xp0q x9 p0q 0.

»T 0

uptq dt T.

» T » s 0

uptq dt

uptq dt s

0

ds

T2 2

ds

» T » T 0

t

uptq ds

dt

2

pT tquptq dt T2 .

(Note this gives xp0q 0.) Then for xpT q 1 we have the constraint

xy, uy

»T 0

y ptquptq dt 1

³T

T2 , 2

where y ptq T t, and we want to minimize 0 |uptq| dt. We could think of this problem as being formulated in L1 r0, T s, but since this is not the dual of any normed space, we are not guaranteed that a solution exists in L1 r0, T s. Therefore, we instead take X

6.6. Hyperplanes and Linear Functionals U C0 r0, T s, X

101

NBVr0, T s, and seek v P X minimizing »T 0

subject to

»T 0

|dvptq| TVpvq }v}

pT tq dvptq 1

T2 . 2

Because we cannot expend negative fuel, the physical meaning of v ptq is the total fuel consumption over the interval r0, ts, while }v } is the total fuel consumption over the whole interval r0, T s. From Corollary 6.5.3,

T2 . d min }v } max a 1 2 }pT tqa}¤1 Then

}pT tqa} tmax |pT tqa| T |a|, Pr0,T s

so the optimal choice is a 1{T and d 1{T T {2. The optimal v must be aligned with pT tqa and, from Example 6.3.4, can vary only at t 0. Thus, v is a step function. Differentiating d with respect to T ? ? ? gives T 2 and d 2, so v 2 1rtPp0,?2ss . q

6.6

Hyperplanes and Linear Functionals

Definition 6.6.1 (hyperplane). A hyperplane H in a linear vector space X is a maximal proper linear variety; that is, a linear variety H such that H X (proper), and if V is any linear variety containing H, then either V X or V H (maximal). The following proposition, stated without proof, gives us a more practical way to work with hyperplanes. Proposition 6.6.2 ([1], Proposition 1, p. 129). Let H be a hyperplane in a linear vector space X. Then there is a linear functional f on X and a constant c such that H tx : f pxq cu. Conversely, if f is a nonzero linear functional on X, then the set tx : f pxq cu is a hyperplane in X.

Definition 6.6.3 (half-space). The four half-spaces associated with a hyperplane H are

tx : f pxq cu tx : f pxq ¡ cu

tx : f pxq ¤ cu tx : f pxq ¥ cu

Definition 6.6.4 (supporting hyperplane). A closed hyperplane H in a normed space X is said to be a support or supporting hyperplane for the convex set K if K is contained in one of the closed half-spaces ¯ determined by H and H contains a point in K

102 U


T HEOREM 6.6.5 (Support Theorem ([1], Theorem 2, p. 133)). Given x in the boundary of a convex set K, there exists a supporting hyperplane containing x. ¯ C such that xk Ñ x. Now for Proof: If x is in the boundary of K, then there exists a sequence txk u in K ¯ there exists a hyperplane Hk strictly separating xk from K. Let Hk be defined by the any k, since xk R K, functional fk ; that is, fk pxk q ¡ fk pyq for all y P K. Now we extract a subsequence so that fk Ñ f ; then in the limit, f pxq ¡ f pyq for all y P K. f defines the desired separating hyperplane. q T HEOREM 6.6.6 (Eidelheit Separation Theorem ([1], Theorem 3, p. 133)). Let K1 and K2 be convex sets in X ˚1 ∅ and K2 X K ˚1 ∅. Then there is a closed hyperplane H separating K1 and K2 , or equivalently, such that K there exists an x P X such that supxPK1 xx, x y ¤ inf xPK2 xx, x y. In other words, K1 and K2 lie in opposite half-spaces determined by H. Proof: Let K K1 K2 . Then K contains an interior point and 0 is not one of them. Recall from Proposition 2.2.3 that K1 K2 is convex. Then by Theorem 6.6.5, there exists an x P X with x 0 such that xx, x y ¤ 0 for x P K. Setting x x1 x2 with x1 P K1 and x2 P K2 gives xx1 , x y ¤ xx2 , x y, so there exists a real number c such that sup xx, x y ¤ c ¤ inf

P

x K1

The desired hyperplane is then H

tx : xx, xy cu.

P

x K2

xx, xy q

C HAPTER 7 CALCULUS OF VARIATIONS

Reading: [1, Sections 7.1 – 7.5, 7.7]

7.1

Review

Example 7.1.1. Let f pu, v q with f p0q 0.

u2 v u2 v 2

• Calculate B f {B u and B f {B v at 0. • Calculate the directional derivative at 0 along the vector h p1, mq. • Introducing the Euclidean norm, show that the derivative does not exist.

Solution: We have

Bf δf p0; p1, 0qq lim 1 pf pα, 0q f p0, 0qq 0 αÑ 0 α Bu Bf δf p0; p0, 1qq lim 1 pf p0, αq f p0, 0qq 0 αÑ 0 α Bv Furthermore, δf p0; hq lim α

1

Ñ0 α

pf pα, αmq f p0, 0qq 3

1 α m αlim Ñ0 α α 2 p 1 m 2 q 1

m . m2

Note that the directional derivative is not a linear functional of h. In fact, the only linear function δf p0; hq that could possibly work in this case, given that both partial derivatives evaluated at 0 equal 0, is δf p0; hq 0. We now show that the derivative does not exist; that is, we show that there is no linear function δf p0; hq of h such that 1 lim pf phq f p0q δf p0; hqq 0. }h}Ñ0 }h} First we convert to polar coordinates: h

ph1, h2q pr cos θ, r sin θq. 103

Then f phq

r cos2 θ sin θ and

104 U

7. C ALCULUS OF VARIATIONS

}h} r. We rewrite the above equation as 1 r cos2 θ sin θ par cos θ r Ñ0 r lim

br sin θq

cos2 θ sin θ a cos θ b sin θ.

for constants a, b. There exist no constants a, b such that the above quantity is zero for all θ, so the derivative does not exist. Intuitively, we can see from Figure 7.1.1 that f has a cusp at 0. q

Figure 7.1.1: On the left, we see that the function f in Example 7.1.1 has a cusp at 0. Furthermore, the partial derivatives B f {B u (middle) and B f {B v (right) are clearly discontinuous at 0, so Proposition 7.1.2 does not apply. The following proposition will be useful later. Note that it did not apply above since the partial derivatives were not continuous at 0, as shown in Figure 7.1.1.

Proposition 7.1.2 ([1, Example 7.2.4, p. 172]). Let X partial derivatives with respect to each xi . Then

E n and let f pxq be a functional on E n having continuous

δf px; hq

7.2

Bf h . B xi i i1 n ¸

´ Gateaux and Frechet Differentials

The Gateaux differential generalizes the concept of a directional derivative: Definition 7.2.1 (Gateaux differential). Let X be a vector space, Y be a normed space, and T : D Ñ Y a transformation, with D X. Let x be a fixed element of D and h an arbitrary element of X. If the limit δT px; hq lim

1 pT p x αÑ 0 α

αhq T pxqq

exists, it is called the Gateaux differential of T at x with increment h. If the limit exists for all h P X, T is said to be Gateaux differentiable at x. Note that while we do require Y to be normed since we need to take limits in Y , we do not require X to

7.2. Gateaux and Fréchet Differentials U

105

be normed. Furthermore, note that if T is Gateaux differentiable at x, then δT px; hq is linear in h since δT px; λhq lim

1 p T px αÑ 0 α

αλhq T pxqq λ lim

1 pT p x γ Ñ0 γ

γhq T pxqq λδT px; hq.

The Fréchet differential generalizes the concept of a derivative: Definition 7.2.2 (Fréchet differential). Let X and Y be normed spaces, and T : D Ñ Y a transformation, with D X. If for fixed x P D and arbitrary h P X there exists a linear, continuous function δT px; hq of h such that 1 lim pT px hq T pxq δT px; hqq 0, }h}Ñ0 }h}

then T is said to be Fréchet differentiable at x and δT px; hq is called the Fréchet differential of T at x with increment h. In Example 7.1.1, we showed that the given function was not Fréchet differentiable at the origin. Example 7.2.3. Calculate the Gateaux differential of f px q from the definition. Then let T

»T 0

xptq2 dt

π{2, xptq sin t, and hptq cos t. Use the differential to approximate »

π 2

0

psin t

0.01 cos tq2 dt.

Solution: We have »

1 T αhq f pxqq lim p xptq αhptqq2 xptq dt αÑ 0 α 0 » »T 1 T αlim p 2αxptqhptq α2 h2 ptqq dt 2xptqhptq dt. Ñ0 α 0 0

1 lim pf px αÑ 0 α

Furthermore, »

π 2

0

psin t

0.01 cos tq dt

»

π 2

2

» »

» 2

sin t dt

0 π 2

» 2

sin t dt

0 π 2

0

π4

π 2

0.02 sin t cos t dt

0 π 2

0.01 sin 2t dt

0 »π

sin2 t dt

0.005 sin s ds 0

0.005 0.7954. q

106 U


Example 7.2.4 ([1, Examples 7.2.2, 7.2.5, p. 172 – 174]). Let X of » 1

f p xq

0

C r0, 1s. Calculate the Gateaux differential

g pxptq, tq dt,

assuming that the partial derivative gx px, tq exists and is continuous with respect to x and t. Show that this differential is also a Fréchet differential. Solution: We write the Gateaux differential as δf px; hq

d dα

»1 0

g pxptq

αh t , t dt

pq q

.

α 0

By our assumptions on g, we can interchange the order of integration and differentiation: δf px; hq

»1 0

gx pxptq, tqhptq dt.

Now we show that the Gateaux differential is also a Fréchet differential. We have

|f px

hq f pxq δf px; hq|

For fixed t, we have

g px pt q

» 1 g xt 0

p p pq

hptq, tq g pxptq, tq gx pxptq, tqhptqq

dt .

hptq, tq g pxptq, tq gx px ¯ptq, tqhptq,

where |xptq x ¯ptq| ¤ |hptq|. Because r0, 1s is compact, gx px, tq is not just continuous but also uniformly continuous with respect to x and t. Therefore, for every ¡ 0, there exists a δ ¡ 0 such that for }h} δ, |gxpx h, tq gxpx, tq| . Therefore,

|f px Therefore,

hq f pxq δf px; hq| 1 |f px }h}Ñ0 }h} lim

» 1 gx x ¯ t ,t 0

gx x t , t h t dt

p p p q q p p q qq p q

¤ }h}.

hq f pxq δf px; hq| 0,

so δf px; hq is indeed a Fréchet differential.

q

Definition 7.2.5 (relative minimum and maximum). Let f be a real-valued functional defined on a subset Ω of a normed space X. A point x0 P Ω is a relative or local minimum of f on Ω if there is an open ball B px0 q such that f px0 q ¤ f pxq for all x P Ω X B px0 q. The relative or local maximum is defined similarly.

Q UIZ T HEOREM 13 ([1, Theorem 7.4.1, p. 178]). Let a real-valued functional f have a Gateaux differential on a vector space X. A necessary condition for f to have an extremum at x0 P X is that δf px0 , hq 0 for all h P X. Proof: For every h P X, the function g pαq f px0

αhq must achieve a extremum at α

0. By Fermat’s

7.3. Euler-Lagrange Equations U

107

theorem from single-variable calculus, if α 0 is an extremum, then g 1 p0q 0; that is, g 1 p0q lim α

1

Ñ0 α

1 pgpαq gp0qq αlim pf px 0 Ñ0 α

αhq f px0 qq δf px0 , hq 0.

q

Note that the theorem is necessary, not sufficient. For example, the function f pu, v q uv has partial derivatives B f {B u v and B f {B v u, so the Gateaux differential of f at 0 with increment h ph1 , h2 q is 0h1 0h2 0. However, 0 is not an extremum, but a saddle point.

7.3

Euler-Lagrange Equations

Definition 7.3.1 (Dra, bs). The normed linear space Dra, bs consists of all continuous and continuously differentiable functions on ra, bs, with norm max |x9 ptq|.

}x} tmax |xptq| Pra,bs

Pr s

t a,b

Q UIZ T HEOREM 14 (Euler-Lagrange ([1, Section 7.5, p. 179 – 180])). Consider finding a function x P Drt1 , t2 s minimizing a functional of the form J

» t2 t1

f pxptq, x9 ptq, tq dt,

where f is continuous with respect to x, x, 9 and t and has continuous partial derivatives with respect to x and x.9 Assume that xpt1 q c1 and xpt2 q c2 for constants c1 , c2 . Show that x0 arg minx J is a solution to the Euler-Lagrange equation d fx pxptq, x9 ptq, tq fx9 pxptq, x9 ptq, tq 0. dt Proof: Our admissible set is the subset of Drt1 , t2 s with xpt1 q and xpt2 q fixed. Given an admissible vector x, we also consider vectors of the form x h that are admissible. The set of all h such that x h is admissible is called the class of admissible variations. Clearly, hpt1 q hpt2 q 0 for such h. The Gateaux differential of J is δJ px; hq

d dα

» t2 t1

f px

αh, x9 αh9 , tq dt

.

α 0

By our assumptions on f , we can interchange the order of integration and differentiation: δJ px; hq

» t2 t1

» t2

9 tqhptq dt fx px, x,

t1

9 tqh9 ptq dt. fx9 px, x,

If we assume that fx9 has a continuous derivative with respect to t, then we can rewrite the second integral using integration by parts: » t2

9 tqh9 ptq dt fxpx, x,9 tqhptq|tt fx px, x, 9

t1

9

2 1

» t2 t1

d 9 tqhptq dt, fx9 px, x, dt

108 U so


δJ px; hq

» t2

9 tq fx px, x,

t1

d 9 tq hptq dt fx9 px, x, dt

9 tqhptq|tt21 . fx9 px, x,

For all admissible variations h, hpt1 q hpt2 q 0, so the second term equals zero. By Quiz Theorem 13, a necessary condition on x0 arg minx J is that it must satisfy » t2

δJ px; hq

fx px, x, 9 tq

t1

d 9 tq hptq dt 0 fx9 px, x, dt

for all h P Drt1 , t2 s with hpt1 q hpt2 q 0. Now we want to show that x0 must satisfy the Euler-Lagrange equation fx px, x, 9 tq

d 9 tq 0. fx9 px, x, dt ³t

We prove that if g ptq is continuous on rt1 , t2 s and t12 g ptqhptq dt 0 for all h P Drt1 , t2 s with hpt1 q hpt2 q 0, then g ptq 0 on rt1 , t2 s. Assume g ptq is nonzero, say positive, for some t P rt1 , t2 s. Then since g is continuous, it is positive on some interval rs1 , s2 s rt1 , t2 s. Let hptq But then

#

p t s 1 q2 ps 2 t q2

s1

0

otherwise » t2 t1

¤ t ¤ s2 .

aptqhptq dt ¡ 0,

contradicting our claim that this integral evaluates to zero. Therefore, g ptq 0 on rt1 , t2 s. Noting that

9 tq fx px, x,

d 9 tq fx9 px, x, dt

is continuous, we apply this result to obtain the Euler-Lagrange equation.

q

In fact, we do not need to assume that the derivative of fx9 with respect to t is continuous. See [1, Lemmas 7.5.1 – 7.5.3] for an alternative derivation. Example 7.3.2. For a mass on a spring, if we set all coefficients equal to 1, the difference between kinetic 9 tq 21 px9 2 x2q. Find the function x that minimizes energy and potential energy (the Lagrangian) is f px, x, J

»

π 2

0

9 tq dt f px, x,

»

π 2

0

1 px9 ptq2 xptq2q dt 2

with xp0q 1 and xpπ {2q 0. Solution: x must satisfy the Euler-Lagrange equation fx px, x, 9 tq

d fx9 px, x, 9 tq x dt

The general solution to this equation is xptq A cos t

x : 0.

B sin t. From the initial conditions, we have A 1

7.3. Euler-Lagrange Equations U and B

109

0, so xptq cos t.

q

Example 7.3.3. The balance in your campaign chest is xptq, and it grows at a rate equal to xptq by accumulating interest. If you want a different growth rate x9 ptq, you must make contributions at a rate uptq x9 ptq xptq. Find the function x that minimizes »1

J

puptqq 0

2

dt

»1 0

9 tq dt f px, x,

»1 0

px9 ptq xptqq2 dt

with xp0q 1 and xp1q e2 . Solution: x must satisfy the Euler-Lagrange equation

9 tq fx px, x,

d 9 tq 2px9 xq 2p:x x9 q 2px x:q 0. fx9 px, x, dt

The general solution to this equation is xptq 1 e2 {p1 eq and B e2 {p1 eq, so xptq

e2

1

Bet . From the initial conditions, we have A

Aet 1

t

e

e

e2 1

e

et .

We make contributions at a rate uptq x9 ptq xptq pAet Bet q pAet

Bet q 2Bet

12e e et. 2

q

Example 7.3.4 ([1, Example 7.5.2, p. 182]). You want to determine your lifetime plan of investment and expenditure that maximizes total enjoyment. Assume you have fixed savings S and know you will die in T years. Further assume you have no descendants and are not a charitable person, so you plan to have no savings left at time T . Your total capital xptq at time t grows at rate x9 ptq αxptq rptq, where α is the interest rate you earn on your investments and rptq is your rate of expenditure. Your enjoyment is a function U of your rate of expenditure rptq, so you want to maximize J

»T 0

eβt U prptqq dt

»T 0

9 tq dt f px, x,

»T 0

eβt U pαxptq x9 ptqq dt.

Find the function x that maximizes J with xp0q S and xpT q 0. Solution: x must satisfy the Euler-Lagrange equation fx px, x, 9 tq

d d βt 1 fx9 px, x, 9 tq αeβt U 1 pαxptq x9 ptqq e U pαxptq x9 ptqq 0 dt dt d 1 ùñ dt U pαxptq x9 ptqq pβ αqU 1pαxptq x9 ptqq.

Integrating this equation gives U 1 prptqq U 1 prp0qq expppβ αqtq.

110 U


?

Consider U prq 2 r. Then U 1 prq r1{2 and pU 1 q1 prq r2 . From the previous equation, we have rptq rp0q expp2pα β qq (note pU 1 q1 prsq pU 1 q1 prqpU 1 q1 psq). We can now solve for xptq: the calculations are messy and not worth repeating here. q

7.4

Problems with Constraints

In several optimization problems, we require the optimal vector to satisfy some constraints. In the previous section, we considered functions that vanish on the endpoints of an interval. More generally, we optimize a functional f subject to n nonlinear constraints given in the implicit form g1 pxq 0, . . . , gn pxq 0. We will assume throughout this section that f and g1 , . . . , gn are continuous and Fréchet differentiable on the normed space X. Definition 7.4.1 (regular point). A point x0 satisfying the constraints g1 pxq 0, . . . , gn pxq 0 is a regular point of these constraints if the n linear functionals g11 px0 q, . . . , gn1 px0 q are linearly independent. Recall that even though g1 , . . . , gn are nonlinear functionals, their Gateaux differentials are linear in the increment h. T HEOREM 7.4.2 ([1, Theorem 7.7.1, p. 187]). If x0 is an extremum of the functional f subject to the constraints gi pxq 0 for i 1, . . . , n and if x0 is a regular point of these constraints, then δf px0 ; hq 0 for all h satisfying δgi px0 ; hq 0 for i 1, . . . , n. Proof (single constraint): Choose h P X such that δg px0 ; hq 0. Choose any vector y P X such that δg px0 ; y q 0 (note that if x0 were not a regular point, we would not be able to do this). Now set g px0 h φy q 0. This is a nonlinear function of and φ since x0 , H, and y are fixed. The derivative of this function with respect to φ at , φ 0 is just δg px0 ; y q, which by assumption is nonzero. Therefore, by the implicit function theorem, we can solve for φ as a function of in some neighborhood of 0. Then 0 g px 0

h

φy q lo gopmo x0oqn looomooon δg px0 ; hq φδg px0 ; y q 0

opq

op}φy }q.

(7.4.1)

0

Since δg px0 ; y q 0, there exist constants constants c1 , c2 ¡ 0 such that c1 |φ| ¤ |δg px0 ; y qφ| ¤ c2 |φ|, and since y 0, there exist constants d1 , d2 ¡ 0 such that d1 }φy } ¤ |φ| ¤ d2 }φy }. Therefore, c1 d1 }φy } ¤ |δgpx0; yqφ| ¤ c2d2}φy}, so from (7.4.1), }φy} opq. Now x0 h φy satisfies the constraint and is a function of . By Quiz Theorem 13, we must have d f p x0 d which implies δf px0 ; hq 0 since }φy } opq.

h

φpqy q|0

0, q

7.4. Problems with Constraints U

111

Proposition 7.4.3 ([1, Lemma 7.7.1, p. 188]). Let f0 , f1 , . . . , fn be linear functionals on a vector space X and suppose that f0 pxq 0 for every x P X satisfying fi pxq 0 for i 1, . . . , n. Then there exist scalars λ1 , . . . , λn such that n ¸

f0

λi fi

0.

i 1

Proof: In E n , consider the subspace M formed by all points of the form pf1 pxq, . . . , fn pxqq. Define the linear functional g on rM s by g pmq f0 pxq. By the Hahn-Banach theorem, we can extend g to λ defined on all of E n . Furthermore, since pE n q E n , we can represent λ as pλ1 , . . . , λn . Thus, we have found λ1 , . . . , λn such that n ¸

λi f1 pxq f0 pxq;

i 1

adding f0 pxq to both sides gives the desired result.

q

T HEOREM 7.4.4 (Lagrange Multipliers ([1, Theorem 7.7.2, p. 188 – 189])). If x0 is an extremum of the functional f subject to the constraints gi pxq 0 for i 1, . . . , n and x0 is a regular point of these constraints, then there exist scalars λi for i 1, . . . , n such that the functional Lpxq f pxq

n ¸

λ i g i px q

i 1

is stationary at x0 ; that is, δLpx0 ; hq δf px0 ; hq

n ¸

λi δg px0 ; hq 0

i 1

for all h. Proof: By Theorem 7.4.2, δf px0 ; hq 7.4.3 gives the desired result.

0 whenever δgipx0; hq 0 for i 1, . . . , n.

Applying Proposition q

Example 7.4.5 (Dido’s Problem ([1, Example 7.7.1, p. 189])). Find the function x that maximizes »1

1 subject to the arc length constraint

»1 a

1

with xp1q xp1q 0.

xptq dt

x9 2

1 dt L

Solution: We want to find a stationary point of »

J

1 f px, x,9 tq dt 1

»1

1

x

a

λ x9 2

1 dt.

112 U


x must satisfy the Euler-Lagrange equation fx px, x, 9 tq

d 9 tq 1 λ dtd ? 2x9 fx9 px, x, dt x9 1

1 λ p1 x:x9 2q3{2 κ 0. looooomooooon

Note that κ is the curvature of a plane curve (see [8, Example 4.2.2, p. 90]). the above equation says that κ is the constant 1{λ, so x must be the arc of a circle

px c1q2 pt c2q2 r2. Recall that the curvature of a circle is the inverse of its radius, so r boundary conditions, we have

λ is the radius of the circle. From the

p0 c1q2 p1 c2q2 r2 p0 c1q2 p1 c2q2 r2,

?

so c2 0 and c1 r2 1. Then the arc length of the circle from t 1 to t 1 is 2rφ, where φ tan1 p1{c1 q is the angle between the line segments pp0, c1 q, p0, 0qq and pp0, c1 q, p1, 0qq. q We can also use the calculus of variations to find aligned vectors, as long as absolute values are not an issue. Example 7.4.6. A linear functional on L4 r0, 1s is represented by a function y ptq that is non-negative on r0, 1s. Find the function xptq that maximizes xx, yy subject to the constraint }x}4{3 1.

{

Solution: Note that the constraint is equivalent to }x}4{3 point of

4 3

J

»1 0

9 tq dt f px, x,

»1 0

1.

Therefore, we want to find a stationary

λpxptqq4{3 dt.

xptqy ptq

x must satisfy the Euler-Lagrange equation fx px, x, 9 tq

4 1{3 λx 3

d fx9 px, x, 9 tq y dt

{

so xptq K py ptqq3 , where K is a constant chosen so that }x}4{3

4 3

»1 0

4 e 3 3t dt

and xptq

»1 0

1 4

1. For example, if yptq et, then

1 e4t dt 1

4

1 4e4

3{4

0,

e3t .

4e4

q

Example 7.4.7. In many of the problems we solved in Hilbert space, the goal was to minimize the norm of a vector x P L2 r0, 1s subject to the constraints pyi , xq ci for i 1, . . . , n. Using the calculus of variations, show that x is a linear combination of the constraints.

7.4. Problems with Constraints U

113

Solution: We want to find a stationary point of J

»1 0

f px, x, 9 tq dt

»1 0

pxptqq

2

n ¸

λi yi ptqxptq

dt.

i 1

Then x must satisfy the Euler-Lagrange equation

9 tq fx px, x,

d 9 tq 2x fx9 px, x, dt

n ¸

λi yi

0,

i 1

so x is a linear combination of the constraints, as we saw before.

q

In Lp r0, 1s, x is still aligned with the constraints, but “aligned” is no longer equivalent to “proportional”. Example 7.4.8. Find the function x P L4 r0, 1s of minimum norm such that »1 0

t3 xptq dt 1.

Solution: We want to find a stationary point of J

»1 0

9 tq dt f px, x,

»1 0

ppxptqq4

λt3 xptqq dt.

Then x must satisfy the Euler-Lagrange equation fx px, x, 9 tq so xptq 5t4 .

d fx9 px, x, 9 tq 4x3 dt

λt3

0, q

114 U


C HAPTER 8 CONVEX FUNCTIONALS

Reading: [1, Sections 7.8, 7.10 – 7.12]

8.1

Local to Global

Recall from single variable calculus that if a function(al) f satisfies f 2 pxq ¥ 0 on ra, bs, then any point x where f 1 pxq 0 is not merely a local minimum but a global minimum. The following theorem helps us generalize the second derivative test by relating the second derivative to convexity. T HEOREM 8.1.1. If f pxq is twice differentiable on ra, bs, then it is convex on ra, bs if and only if f 2 pxq ¥ 0 on C¿ Proof: First we show the “if” direction: specifically, we show that if f is a convex function, then f 1 ptq is increasing or constant for all x P C. Pick x1 , x2 P C such that x1 x2 . Then f 1 px2 q f 1 px1 q lim

Ñ0

f px 2

h

so we simply need to show that f px2 x1 λ x2x2 h p1 λqx1 x1 h so that x1 f p x1

hq

f px2 q f pp1 λqx1

hq f px 2 q f px 1 h

hq f p x 1 q ¥ f px 1 hq λpx2 hq and x2 λx1 λ px 2

hqq

f pλx1

hq

f px 1 q

,

f px2 q for h ¡ 0 if f is convex. Let p1 λqpx2 hq. Then

p1 λqpx2

hqq ¤ f px1 q

f px 2

hq,

where the inequality follows since f is convex. Next we show the “only if” direction: specifically, we show that if f 2 pxq ¥ 0 for x P C, then f pλx1 p1 λqx2q ¤ λf px1q p1 λqf px2q for any x1, x2 P C. The proof is by contradiction: suppose that there exists some λ0 P p0, 1q such that the previous inequality was false. Then by the mean value theorem, there exists some z1 P px1 , λx1 p1 λqx2 q such that

p1 λqx2q f px1q ¡ p1 λqf px1q p1 λqf px2q f px2q f px1q , p1 λqpx1 x2q p1 λqpx1 x2q x2 x1 where the inequality follows from our assumption. Similarly, there exists some z2 P pλx1 p1 λqx2 , x2 q such that f px2 q f pλx1 p1 λqx2 q λf px1 q λf px2 q f px 2 q f px 1 q f 1 pz 2 q , λpx1 x2 q λ px 1 x 2 q x2 x1 but this is a contradiction since we chose z2 ¡ z1 and f 1 is increasing. Therefore, f must be convex. q f 1 pz 1 q

f pλx1

115

116 U

8. C ONVEX F UNCTIONALS

Definition 8.1.2 (convex functional). A real-valued functional f defined on a convex subset C of a linear vector space is convex if f pλx1 p1 λqx2 q ¤ λf px1 q p1 λqf px2 q for all x1 , x2

P C and all λ P p0, 1q.

Example 8.1.3. Check that on L2 r0, 1s, the functional f pxq

»1 0

x2 ptq dt

is convex. Solution: We have f pλx1

»1

p1 λqx2q pλx1ptq p1 λqx2ptqq2 dt 0

λ

»1

2

x21 0

ptq dt

p 1 λq

»1 2

x22 0

ptq dt

λ p1 λ q

»1

p pq

p qq

x1 t x2 t 2 dt . 0 loooooooooooooooooomoooooooooooooooooon

q

¥0

Q UIZ T HEOREM 15 ([1, Proposition 7.8.1, p. 191]). Let f be a convex functional defined on a convex set C in a normed space. Let µ inf xPC f pxq. Then • The subset Ω C where f pxq µ is convex. • If x0 is a local minimum of f , then f px0 q µ and x0 is therefore a global minimum of f .

Proof: First, suppose x1 , x2 P Ω. Then for x λx1 p1 λqx2 with λ P p0, 1q, we have f pxq ¤ λf px1 q p1 λqf px2q µ. Furthermore, for any x P C, f pxq ¥ µ by definition. Therefore, f pxq µ. Next, suppose N is a neighborhood about x0 in which x0 minimizes f . For any x P C with x x0 , there exists some λ P p0, 1q such that x1 λx0 p1 λqx P N . Then f px 0 q

paq

pbq

¤ f px1q ¤ λf px0q p1 λqf pxq,

where (a) follows from the fact that x0 is a local minimum and (b) follows from the convexity of f . Rearranging terms, we have f px0 q ¤ f pxq for all x P C, so x0 is a global minimum. q

Definition 8.1.4 (epigraph). The epigraph of a convex functional f defined on a convex set C in a vector space X is the set rf, C s tpr, xq P R X : x P C, f pxq ¤ ru.

8.2. Conjugate Convex Functionals U

117

Example 8.1.5. Show that the epigraph is a convex set. Solution: Pick px1 , y1 q and px2 , y2 q such that f px1 q ¤ y1 and f px2 q ¤ y2 . Then λy1

p1 λqy2 ¥ λf px1q p1 λqf px2q ¥ f pλx1 p1 λqx2q,

where the last step follows since f is convex. Therefore the point pλx1 p1 λqx2 , λy1 or above the graph of f , so the region on or above the graph of f is convex.

8.2

p1 λqy2q is on q

Conjugate Convex Functionals

Definition 8.2.1 (conjugate convex). Let f be a convex functional defined on a convex set C in a normed space X. The conjugate set C is defined as "

*

C x P X : suppxx, x y f pxqq 8 xPC and the functional f conjugate to f is defined on C as f px q suppxx, x y f pxqq.

P

x C

Example 8.2.2. Let C

r0, 1s and f pxq x2. Find C , f pxq, C , and f .

Figure 8.2.1: The epigraphs rf, C s and rf , C s in Example 8.2.2. The linear functionals mx for m 0, 1, 2 are shown on rf, C s, and the linear functionals xm for x 0, 1{2, 1 are shown on rf , C s. Solution: Any element of the dual space can be represented by a real number m in the form xx, x y mx. Figure 8.2.1 shows the convex sets rf, C s and rf , C s. From the graph and using elementary calculus, we

118 U


see that f px q f pmq

$ ' ' &0

m¤0

m2 ' 4

' %m

1

0 m 2.

m¥2

Furthermore, f pmq is the vertical coordinate of the point where the hyperplane (line) tangent to the graph of f and parallel to the subspace mx intersects the vertical axis. For m ¤ 0, clearly the tangent line and mx are equal, so 0 f pmq. For m P p0, 2q, the tangent line is given by y 2x0 px x0 q x20 : for m 2x0 , the vertical intercept of this line is m2 {4 f pmq. For m ¥ 2, the tangent line is given by y mpx 1q 1: the vertical intercept of this line is pm 1q f pmq. Is this simple case, X X, and any element of X can be represented by a real number x in the form xx , x y xm. Clearly f pxq is undefined except on r0, 1s, so C r0, 1s. Using elementary calculus, we see that f pxq x2 for x P p0, 1q. q Example 8.2.3. Let C

r1, 2s and f pxq 0. Find C , f pxq, C , and f .

Figure 8.2.2: The epigraphs rf, C s and rf , C s in Example 8.2.3. The linear functionals mx for m 1, 0, 1 are shown on rf, C s, and the linear functionals xm for x 1, 3{2, 2 are shown on rf , C s. Solution: We construct C and f using the “support functional” approach suggested in the previous example: construct the hyperplane, parallel to the subspace mx, that touches f at one or more points in C and lies below the graph elsewhere. Then f pmq is the vertical intercept of the hyperplane. If no such hyperplane exists, then m R C . Any element of the dual space can be represented by a real number m in the form xx, x y mx. Figure 8.2.2 shows the convex sets rf, C s and rf , C s. Clearly, f px q f pmq

#

m 2m

m 0 m¥0

and C R. Again, we use the support functional approach to find f and C : from Figure 8.2.2, clearly f f and C C. q

8.2. Conjugate Convex Functionals U 1 p

For the next example, note that if

1 q

1, then

pq p q p p 1q p

Example 8.2.4. Let C

119

p1 pqp1 qq 1 q ppq 1q

q

E n and f pxq α p1 }x}pp for 1 p 8 and α ¡ 0. Find C and f .

Solution: Let x pξ1 , . . . , ξn q P C and represent an element x

P C C as pη1, . . . , ηnq. Then

¸ 1 ξi ηi α|ξi |p . f px q suppxx, bx y f pxqq sup p xPC xPC i1 n

The supremum exists since the sum inside the supremum is finite. Differentiating with respect to each ξi gives ηi α|ξi |p1 sgn ξi for i 1, . . . , n, (8.2.1) so

¸ 1 ξi ηi α|ξi |p f px q p i1 n

¸ α 1q |ξi|p. i1 n

Taking absolute values, exponentiating both sides of (8.2.1) by q, and using the identity p q pp 1q gives |ηi|q αq |ξi|p; substituting this above gives f px q α1q

Q UIZ T HEOREM 16. Let C

1¸ |ηi|q q i1 n

1q α1q }x}q .

Lpp0, 1q for 1 p 8 and f puq

1 p

»1 0

puptqqp dt.

Find C and f .

Proof: We can represent an element y

P C Lq p0, 1q, where p1 xy, uy

Then

»1 0

P

1, as

y ptquptq dt.

f py q suppxy, uy f puqq sup

P

u C

1 q

u C

»1 0

pyptquptq puptqqpq dt;

that is, we want to find a stationary point of J

»1 0

f pu, u, 9 tq dt

»1 0

pyptquptq puptqqpq dt.

q

120 U


Then u must satisfy the Euler-Lagrange equation fu pu, u, 9 tq so y

d fu9 pu, u, 9 tq y pup1 dt

0,

pup1. We check that y P Lq p0, 1q: »1 0

pyptqq

q

dt

»1

p puptqq p q dt q

0

»1

q p 1

0

pq puptqqp dt 8

where boundedness follows since u P Lp p0, 1q.

q

Q UIZ T HEOREM 17 (first part of [1, Proposition 7.10.1, p. 196]). The conjugate set C and the conjugate functional f are convex. Proof: For any x1 , x2

P C and any λ P p0, 1q,

f pλx1

p1 λqx2 q suppxx, λx1 p1 λqx2 y f pxqq xPC suppλpxx, x1 y f pxqq p1 λqpxx, x2 y f pxqqq xPC ¤ λ suppxx, x1 y f pxqq p1 λq suppxx, x2 y f pxqq xPC xPC λf px1 q p1 λqf px2 q.

Therefore, f is convex. Since C is the set of all x where f px q is defined, C is also convex.

q

Proposition 8.2.5 (second part of [1, Proposition 7.10.1, p. 196]). rf , C s is a closed convex subset of R X. Proof: From Quiz Theorem 17, rf , C s is convex. To show that it is closed, let tpsi , xi qu be a convergent sequence in rf , C s with psi , xi q Ñ ps, x q. We must show that ps, x q P rf , C s. For every i and every x P C, si

paq ¥ f px q ¥ xx, xy f pxq,

where (a) follows by definition of the epigraph and (b) follows by definition of f . Taking the limit as i Ñ 8 gives s ¥ xx, x y f pxq ¥ f px q for all x P C, so ps, x q P rf , C s. q

The following proposition confirms our intuition about recovering C and f from C and f , as we did in the previous examples:

Proposition 8.2.6 ([1, Proposition 7.10.2, p. 197]). Let f be a convex functional on a convex set C in a normed space X. If rf, C s is closed, then rf, C s rrf, C s s.

Example 8.2.7. Let C f .

D¯ 2 (the closed unit disk in E 2) and let f pxq f pu, vq 0. Find C , f , C , and

8.3. Conjugate Concave Functionals U

?

Solution: We actually found C R2 and f pa, bq ¯ 2 X R3 is closed. C C and f f since rf, C s D

8.3

a2

121

b2 in Example 1.3.2. By Proposition 8.2.6, q

Conjugate Concave Functionals

Definition 8.3.1 (concave functional). A functional g defined on a convex set D of a linear vector space is concave if g is convex. Note that the set D is still convex, not concave. Furthermore, a functional is linear if and only if it is both convex and concave. Just as we defined the epigraph for convex functionals, we define the hypograph for concave functionals: Definition 8.3.2 (hypograph). The hypograph of a concave functional g defined on a convex set D in a vector space X is the set rg, Ds tpr, xq P R X : x P D, f pxq ¥ ru. The hypograph is a convex set: the proof is analogous to that in Example 8.1.5. We can also define conjugate functionals and sets for concave functionals, just as we did for convex functionals: Definition 8.3.3 (conjugate concave). Let g be a concave functional defined on a convex set D in a normed space X. The conjugate set D is defined as "

*

C x P X : inf pxx, x y f pxqq ¡ 8 xPD and the functional g conjugate to g is defined on D as g px q suppxx, x y g pxqq.

P

x D

Example 8.3.4. Let D

r0, 1s and gpxq x2. Find D and g.

Solution: Any element of the dual space can be represented by a real number m in the form xx, x y mx. Figure 8.3.1 shows the convex sets rg, Ds and rg , D s. From the graph and using elementary calculus, we see that $ ' m¥0 ' &0 g px q g pmq

and C 8.2.2.

m4

2

' ' %m

1

2 m 0 . m ¤ 2

R. From Figure 8.3.1, clearly g g and D D. Note that gpmq f pmq from Example q

122 U


Figure 8.3.1: The hypographs rg, Ds and rg , D s in Example 8.3.4. The linear functionals mx for m 0, 1, 2 are shown on rg, Ds, and the linear functionals xm for x 0, 1{2, 1 are shown on rg , D s.


r1, 2s and gpxq 0. Find D and g.

Figure 8.3.2: The hypographs rg, Ds and rg , D s in Example 8.3.5. The linear functionals mx for m 1, 0, 1 are shown on rg, Ds, and the linear functionals xm for x 1, 3{2, 2 are shown on rg, Ds.

Solution: Any element of the dual space can be represented by a real number m in the form xx, x y mx. Figure 8.3.2 shows the convex sets rg, Ds and rg , D s. From the graph, #

2m m 0 g px q g pmq m

and D 8.2.3.

m¥0

R. From Figure 8.3.2, clearly g g and D D. Note that gpmq f pmq from Example q

8.4. Fenchel Duality U


123

p0, 8sn equipped with the Euclidean norm and g px q α

1¸ p ξ p i1 i n

for x tξ1 , . . . , ξn u, p 1 and α ¡ 0. Find C and f . Solution: The proof is nearly the same as in Example 8.2.4. If we represent an element x tη1, . . . , ηnu, we have n ¸ 1q 1 g px q α ηiq . q i1

8.4

P D D as q

Fenchel Duality

Suppose that f is convex over convex domain C and g is concave over convex domain D. A standard minimization problem is to find the minimum vertical distance µ

inf

P X

x C D

pf pxq gpxqq.

While g pxq usually equals zero, introducing g is helpful because it allows us to exploit duality: the minimum vertical separation between the convex sets rf, C s and rg, Ds equals the maximum vertical separation between two parallel hyperplanes separating rf, C s and rg, Ds, as shown in Figure 8.4.1.

Figure 8.4.1: The minimum vertical separation between the convex sets rf, C s and rg, Ds equals the maximum vertical separation between two parallel hyperplanes separating rf, C s and rg, Ds.

Q UIZ T HEOREM 18 (Fenchel Duality Theorem ([1, Theorem 7.12.1, p. 201])). Suppose that f is convex over C and g is concave over D, where C, D are convex sets in a normed space X. Assume that C X D contains points

124 U


in the relative interiors of C and D and that either rf, C s or rg, Ds has nonempty interior. Further assume that µ Then

inf

P X

x C D

pf pxq gpxqq 8.

µ max pgpxq f pxqq, x PC XD

where the maximum on the right is achieved by some x0 P C X D . If the infimum on the left is achieved by some x0 P C X D, then

maxpxx, x0 y f pxqq xx0 , x0 y f px0 q

P

x C

minpxx, x0 y g pxqq xx0 , x0 y g px0 q.

P

x D

Proof: By definition of f and g , for all x P C X D and x

P C X D ,

f px q ¥ xx, x y f px q

g px q ¤ xx, x y g px q,

so f pxq g pxq ¥ g px q f px q. Since this holds for all x P C X D and x µ

inf

P X

x C D

pf pxq gpxqq ¥

To prove equality, we must find an x0

P C X D ,

sup pg px q f px qq. x PC XD

P C X D such that µ gpx0 q f px0 q.

By definition of µ, the sets rf µ, C s and rg, Ds are arbitrarily close but have disjoint relative interiors. Since one of the sets rf µ, C s and rg, Ds has nonempty relative interior, there is a closed hyperplane in R X separating the sets. The hyperplane cannot be vertical, otherwise its vertical projection onto X would separate C and D. Therefore, we can represent the hyperplane as

tpr, xq P R X : xx, x0 y r cu for some x0

P X and some c P R. Now rg, Ds lies below this hyperplane but is arbitrarily close to it, so c inf pxx, x0 y g pxqq g px0 q

(8.4.1)

P

x D

and similarly, rf

µ, C s lies below the hyperplane but is also arbitrarily close to it, so c suppxx, x0 y pf pxq µqq f px0 q

P

µ,

(8.4.2)

x C

so µ g px0 q f px0 q. If the infimum µ is achieved by some x0 P C X D, then the sets rf µ, C s and rg, Ds have the point pgpx0q, x0q in common and this point is in the separating hyperplane, so by (8.4.1)


125

and (8.4.2), maxpxx, x0 y f pxqq xx0 , x0 y f px0 q

P

x C

minpxx, x0 y g pxqq xx0 , x0 y g px0 q.

q

P

x D

The general strategy is to use the sets C and D to impose constraints. • If we want to minimize a convex functional f , we choose g pxq 0. Then we want to minimize f g, which is equivalent to maximizing g f . Since we chose g pxq 0, µ

inf

P X

x C D

f pxq max pgpxq f pxqq x PC XD

is the desired minimum of f . • If we want to maximize a concave functional g, we choose f pxq 0. Then we want to minimize f g, which is equivalent to maximizing g f . Since we chose f pxq 0, µ sup g pxq min pf pxq gpxqq x P C X D xPC XD is the desired maximum of g.

Example 8.4.1 (generalization [1, Example 7.12.1, p. 202]). Suppose you have w to spend on n activities, each of which costs wi per unit of work for i 1, . . . , n. Assume that the return associated with activity i is gi pxi q, xi is the number of units of work allocated to the activity, and gi is increasing and gi1 is decreasing due to diminishing marginal returns (so g is concave).

Solution: We write the problem formally as max

x1 ,...,xn

s.t.

g px q

n ¸

g i px i q

(returns)

wi x i

w

(budget constraint)

x1 , . . . , x n

¥0

(positivity constraints)

i 1 n ¸

i 1

where x px1 , . . . , xn q. To solve the problem using Fenchel duality, set C

#

x:

n ¸

wi x i

w

+

i 1

to capture the budget constraint, and set D r0, 8qn to capture the positivity constraints. Set f pxq since we are maximizing a concave functional g. Then for x px1 , . . . , xn q, f px q sup

n ¸

P

x Ci 1

xi xi ,

0

126 U


which is finite only if x Furthermore, is defined for all xi

λpw1, . . . , wnq, in which case f pxq λw. Therefore, C tλpw1, . . . , wnqu. gi pxi q

pxixi gipxiqq

P r0, 8q since gipxiq is concave, so g px q inf xPD

is defined for all x

inf

Pr 8q

xi 0,

n ¸

x i x g p xq

i

n ¸

gi pxi q;

i 1

i 1

P D D. Therefore, the dual problem is min

x C D

P X

pf pxq gpxqq min λ

λw

n ¸

gi pλwi q .

i 1

Given the optimal λ, xi is the value for which the infimum in (8.4) is achieved, with xi 1, . . . , n.

λwi, for i q

Example 8.4.2. As a concrete application of Example 8.4.1, say you have w 5 to spend on two presents: x1 on the first and x2 on the second. Maximize the total gratitude you generate, where total gratitude is ? ? given by g pxq g1 px1 q g2 px2 q, with g1 px1 q 2 x1 and g2 px2 q x2 . Solution: By elementary calculus, we have g px q 1

1

g2 px2 q

inf

Pr 8q

x1 0,

2 ? px x 2 x q 1 x 2

? p x2 x2 x2 q x Pr0,8q inf

1

where the infima are achieved at x1

x

1

1 1

1

1 2x2

d

1

2

x2

d

1 x1

1 2x2

2

2

x1 1

4x1 . 2

p1{x1 q2 and x2 p1{2x2 q2. The dual problem is

min λw λ

1 λ

1 4λ

;

taking the derivative with respect to λ, we have x0 Then x1

5 1 4 λ2

0 ùñ λ

1 2

c

5 w

12 .

? ? x2 1{2, so x1 4 and x2 1. The maximum gratitude is gp4, 1q 2 4 1 5.

q

Example 8.4.3. As another concrete application of Example 8.4.1, say you have $w million to invest in oil exploration and wind power. The government gives you a 50% subsidy for your wind power investment, 1 so your budget constraint is x1 2 x2 w. The total return on your investment is given by g pxq

8.4. Fenchel Duality U g 1 px 1 q

127

g2 px2 q, where g1 px1 q

#

0 ¤ x1

4x1 9

x1

x1

¡3

¤3

g 2 px 2 q

#

0 ¤ x2

3x2 4

x2

x2

¥2

¤2.

Figure 8.4.2: The hypographs rg1 , Ds and rg2 , Ds in Example 8.4.3. Solution: From the graphs in 8.4.2, we see that g1 px1 q

#

3x1 12 0

1 ¤ x1

x1

¡4

¤4

g2 px2 q

#

2x2 6 0

1 ¤ x2 x2

¡3

¤3,

where the infima are achieved at x1 3 if 1 ¤ x1 ¤ 4 and x1 0 if x1 ¡ 4, and x2 2 if 1 ¤ x2 x2 0 if x2 ¡ 3. Then D r1, 8q2 , and C X D tλp1, 1{2q : λ ¥ 2u. The dual problem is

¤ 3 and

1 µ min λa g1 pλq g2 λ λ¥2 2

.

The table below shows values of the optimal x1 and x2 for various values of a, which we can determine using the graphs in Figure 8.4.3. a

1{2 2 5

λ

x1

x2

x1

x2

µ

6 4 2

6 4 2

3 2 1

0 1 3

1 2 4

3 10 20 q

Example 8.4.4 ([1, Example 7.12.2, p. 203]). As yet another concrete application of Example 8.4.1, say you have $w to bet on a horse race with n horses. Assume that we know that the public has bet si on horse i

128 U


Figure 8.4.3: The function λa g1 pλq g2

in Example 8.4.3 for a 1{2, 2, 5.

1 2λ

and that horse i has probability pi of winning for i 1, . . . , n. We bet xi on horse i, and we must bet all of our money on the n horses. Let P denote the amount the track pays out to bettors on the winning horse (typically 80% of the total amount bet). Maximize your expected fraction of the total amount P paid out.

Proof: Our expected net return is

n ¸ C w si loooomoooon i1

n ¸

i1

pi

w.

xi xi s i loomoon amount won if horse i wins

total amount bet

Maximizing this quantity is equivalent to maximizing g px q

n ¸

g i px i q ,

gi pxi q

i 1

pi xi . xi s i

Remember that the value of the conjugate functional gi at xi is obtained by finding the lowest line of slope xi which lies above gi . Therefore, gi pxi q is undefined for xi ¤ 0, and since gi1 p0q pi {si , gi pxi q 0 for xi ¥ pi {si . By elementary calculus, we have for 0 λ pi {si

d pi xi xi xi dxi xi s i Then

x i

pi si p x i s i q2

pi xi gi pxi q inf xi xi x si xi Pr0,8q i

0 ùñ xi

#

px p s xq

2 i i 2 i i

0

c

pi si xi

0 xi xi

¥ ps

i i

si .

ps

i i

,

(8.4.3)

(8.4.4)

where xi is determined from (8.4.3). Now we know from Example 8.4.1 that xi λ for i 1, . . . , n. Rearrange the indices so that p1 {s1 ¥ ¥ pn{sn. For the given λ, define m as the largest index for which pi{si ¥ λ. From (8.4.3) and (8.4.4), our solution is $b & pi si s i 1, . . . , m i λ . q xi %0 i m 1, . . . , n


129

Example 8.4.5 ([1, Example 7.12.3, p. 205]). Find u P L2 r0, 1s minimizing f puq

1 2

»1 0

puptqq2 dt

subject to the linear constraints

yu

py 1 , u q

.. . py n , u q

Solution: Let C

c1 .. . c. cn

L2r0, 1s, D tu : yu cu, and gptq 0. To find f puq, we want to maximize J

pu, uq f puq

»1 0

uptqu ptq

1 ppuptqq2 2

dt

»1 0

9 tq dt, v pu, u,

so u must satisfy the Euler-Lagrange equation

9 tq vu pu, u, Therefore, and C

d 9 tq u u 0. vu9 pu, u, dt

1 f pu q supppu, u q f puqq 2 uPC

L2r0, 1s. Furthermore,

»1 0

puptqq2 dt

g pu q inf pu, u q ;

uPD that is, we want to minimize pu, u q subject to the constraints py1 , uq c1 , . . . , pyn , uq cn . Equivalently,

we want to find a stationary point of J

»1 0

n ¸

uptqu ptq

λi yi ptquptq

dt

i 1

»1 0

9 tq dt, v pu, u,

so v must satisfy the Euler-Lagrange equation vu pu, u, 9 tq Therefore, g pu q is defined only if u

λi yi

0.

i 1

°ni1 λiyi yT λ for λT pλ1, . . . , λnq, in which case

¸ g pu q inf u, λ i yi uPD i1

and D

n ¸

d 9 tq u vu9 pu, u, dt

n

uinf PD

n ¸

λi ci

pλ, cq cT λ

i 1

tu : u yλu. By Quiz Theorem 18, the dual maximization problem is then

max pg pu q f pu qq max cλT λ u PC XD

1 T T T p y λ q py λ q , 2

130 U


where by elementary matrix calculus, we achieve the maximum at λ the original minimization problem is u yT λ.

pyyT q1c.

Then the solution to q

B IBLIOGRAPHY

[1] D. G. Luenberger, Optimization by Vector Space Methods. Wiley Interscience, 1969 [1997]. [2] W. Rudin, Principles of Mathematical Analysis. McGraw-Hill Science/Engineering/Math, third ed., 1976. [3] M. J. D. Powell, Approximation Theory and Methods. Cambridge University Press, 1981. [4] M. A. Khamsi, “Gibbs phenomenon,” S.O.S. MATHematics, 2008. [5] S. Axler, Linear Algebra Done Right. Springer, 2nd ed., 2004. [6] J. Munkres, Topology. Prentice Hall, 2nd ed., 2000. [7] G. Bachmann and L. Narici, Functional Analysis. Dover Publications, 2nd ed., 1998. [8] B. V. Brunt, Calculus of Variations. Springer, 1st ed., 2003.

131

Notes on Luenberger's Vector Space Optimization

Recommend Documents