Homework 1
Ana Huaman
March 9, 2011

1. In class it was claimed that given a function h :
By the properties of the scalar (dot) product:
$$\lim_{\alpha \to 0} \frac{h(u + \alpha v) - h(u)}{\alpha} = \frac{\partial h}{\partial u} v = \left\| \frac{\partial h}{\partial u}^T \right\| \|v\| \cos\theta \tag{1}$$
where $\theta$ is the angle between $\frac{\partial h}{\partial u}^T$ and $v$. To find the direction of maximum rate of change of $h$ at $u$ (that is, the direction $v$, of fixed length $\|v\|$, that gives the biggest $h(u + \alpha v) - h(u)$ for small $\alpha > 0$), we need the right side of Eq. 1 to be as large as possible. The maximum value $\cos\theta$ can take is 1, which happens when $\theta = 0$.

$\theta = 0$ means that $\frac{\partial h}{\partial u}^T$ points in the same direction as $v$ (they are parallel, assuming $\frac{\partial h}{\partial u} \neq 0$). Since the maximizing $v$ is, by definition, the direction of maximum growth, and $\frac{\partial h}{\partial u}^T$ is parallel to it, $\frac{\partial h}{\partial u}^T$ also points in the direction of maximum growth.
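A small numerical sketch can illustrate this claim. It assumes a hypothetical function `h(u) = u[0]**2 + 3*u[0]*u[1] + sin(u[1])` (not part of the original problem), scans unit directions $v$, approximates the rate in Eq. 1 with a forward difference, and compares the best sampled direction with the normalized gradient.

```python
import math

# Hypothetical test function h: R^2 -> R (illustration only).
def h(u):
    return u[0] ** 2 + 3 * u[0] * u[1] + math.sin(u[1])

def grad_h(u):
    # Analytic gradient (dh/du)^T of the function above.
    return [2 * u[0] + 3 * u[1], 3 * u[0] + math.cos(u[1])]

def growth_rate(u, v, alpha=1e-6):
    # Forward-difference estimate of lim_{alpha->0} (h(u + alpha v) - h(u)) / alpha.
    shifted = [ui + alpha * vi for ui, vi in zip(u, v)]
    return (h(shifted) - h(u)) / alpha

def best_unit_direction(u, samples=3600):
    # Scan unit vectors v = (cos t, sin t); keep the one with the largest rate.
    best_v, best_rate = None, -math.inf
    for k in range(samples):
        t = 2 * math.pi * k / samples
        v = [math.cos(t), math.sin(t)]
        rate = growth_rate(u, v)
        if rate > best_rate:
            best_v, best_rate = v, rate
    return best_v, best_rate

u0 = [0.5, -1.0]
g = grad_h(u0)
g_norm = math.hypot(g[0], g[1])
g_unit = [g[0] / g_norm, g[1] / g_norm]
v_best, rate_best = best_unit_direction(u0)

print("normalized gradient:   ", g_unit)
print("best sampled direction:", v_best)
print("best rate:", rate_best, " ||grad||:", g_norm)
```

With unit $v$, the best sampled rate should come out close to $\|\partial h/\partial u\|$ (the $\cos\theta = 1$ case), up to the angular resolution of the scan and the finite-difference error.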
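2. Solve the following minimization problem:
$$\min_{u} \; \frac{1}{2} u^T Q u - b^T u \quad \text{subject to} \quad Au = c,$$
where $Q = Q^T$ is positive definite.

First, we find $\frac{\partial L}{\partial u}$, where
$$L = \frac{1}{2} u^T Q u - b^T u.$$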
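By the definition of the directional derivative, for any direction $v$,
$$\frac{\partial L}{\partial u} v = \lim_{\epsilon \to 0} \frac{L(u + \epsilon v) - L(u)}{\epsilon}.$$
We first find $L(u + \epsilon v) - L(u)$:
$$L(u + \epsilon v) - L(u) = \frac{1}{2}(u + \epsilon v)^T Q (u + \epsilon v) - b^T (u + \epsilon v) - \frac{1}{2} u^T Q u + b^T u$$
$$= \frac{1}{2} u^T Q u + \frac{\epsilon}{2} u^T Q v + \frac{\epsilon}{2} v^T Q u + \frac{\epsilon^2}{2} v^T Q v - b^T u - \epsilon b^T v - \frac{1}{2} u^T Q u + b^T u$$
$$= \frac{\epsilon}{2} u^T Q v + \frac{\epsilon}{2} v^T Q u + \frac{\epsilon^2}{2} v^T Q v - \epsilon b^T v.$$
The third term carries a factor $\epsilon^2$, so after dividing by $\epsilon$ it goes to zero as $\epsilon \to 0$. Using $v^T Q u = u^T Q^T v$ (a scalar equals its own transpose) and grouping:
$$L(u + \epsilon v) - L(u) = \epsilon \left( \frac{1}{2} u^T Q + \frac{1}{2} u^T Q^T - b^T \right) v + \frac{\epsilon^2}{2} v^T Q v.$$
Replacing in the definition:
$$\frac{\partial L}{\partial u} v = \lim_{\epsilon \to 0} \left[ \left( \frac{1}{2} u^T Q + \frac{1}{2} u^T Q^T - b^T \right) v + \frac{\epsilon}{2} v^T Q v \right] = \left( \frac{1}{2} u^T Q + \frac{1}{2} u^T Q^T - b^T \right) v.$$
Since this holds for every $v$,
$$\frac{\partial L}{\partial u} = \frac{1}{2} u^T Q + \frac{1}{2} u^T Q^T - b^T.$$
As stated in the initial conditions, $Q = Q^T$, so we finally have:
$$\frac{\partial L}{\partial u} = \frac{1}{2} u^T Q + \frac{1}{2} u^T Q - b^T = u^T Q - b^T \tag{2}$$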
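As a sanity check of Eq. 2, the sketch below compares $u^T Q - b^T$ with central finite differences of $L$. The matrix `Q`, vector `b` and point `u` are made-up illustration values; `Q` is chosen symmetric to match the premise $Q = Q^T$.

```python
# Hypothetical illustration data; Q is symmetric, matching the premise Q = Q^T.
Q = [[4.0, 1.0, 0.5],
     [1.0, 3.0, -0.2],
     [0.5, -0.2, 2.0]]
b = [1.0, -2.0, 0.5]
n = len(b)

def L(u):
    # L(u) = 1/2 u^T Q u - b^T u
    Qu = [sum(Q[i][j] * u[j] for j in range(n)) for i in range(n)]
    return 0.5 * sum(u[i] * Qu[i] for i in range(n)) - sum(b[i] * u[i] for i in range(n))

def grad_eq2(u):
    # Eq. 2: entries of the row vector u^T Q - b^T
    return [sum(u[i] * Q[i][j] for i in range(n)) - b[j] for j in range(n)]

def grad_fd(u, eps=1e-5):
    # Central differences (L(u + eps e_j) - L(u - eps e_j)) / (2 eps)
    out = []
    for j in range(n):
        up, dn = list(u), list(u)
        up[j] += eps
        dn[j] -= eps
        out.append((L(up) - L(dn)) / (2 * eps))
    return out

u = [0.3, -1.1, 2.0]
analytic = grad_eq2(u)
numeric = grad_fd(u)
max_err = max(abs(a - m) for a, m in zip(analytic, numeric))
print("Eq. 2:             ", analytic)
print("finite differences:", numeric)
print("max abs difference:", max_err)
```

Central differences are exact for a quadratic apart from rounding, so the two gradients should agree to many digits.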
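Second, we find $\frac{\partial h}{\partial u}$ for the constraint
$$h(u) = Au - c = 0.$$
Since $h$ is affine in $u$,
$$\frac{\partial h}{\partial u} = A. \tag{3}$$
The constraint has one equation per row of $A$, so the multiplier $\lambda$ is a vector with one entry per row of $A$.

Third, substituting Eq. 2 and Eq. 3 into the first-order condition for the Lagrangian:
$$\frac{\partial L}{\partial u} + \lambda^T \frac{\partial h}{\partial u} = 0 \quad \Longrightarrow \quad u^T Q - b^T + \lambda^T A = 0.$$
Transposing and using $Q = Q^T$ gives $Q u = b - A^T \lambda$, so
$$u^*(\lambda) = Q^{-1} (b - A^T \lambda). \tag{4}$$
Now we plug this $u^*$ into $h$ to find $\lambda^*$. Replacing Eq. 4 into $Au = c$:
$$A Q^{-1} (b - A^T \lambda^*) = c \quad \Longrightarrow \quad A Q^{-1} A^T \lambda^* = A Q^{-1} b - c.$$
Assuming $A$ has full row rank, $A Q^{-1} A^T$ is positive definite and hence invertible, so
$$\lambda^* = (A Q^{-1} A^T)^{-1} (A Q^{-1} b - c). \tag{5}$$
So our minimizer is
$$u^* = Q^{-1} (b - A^T \lambda^*),$$
where $\lambda^*$ is the value obtained in Equation 5.

Note: to make sure that $u^*$ is a minimizer, we can analyze $\frac{\partial^2 L}{\partial u^2}$, which is $Q$. Since $Q$ is positive definite by the premise of the problem, $L$ is strictly convex, so on the affine set $Au = c$ the stationary point $u^*$ is the unique minimum.

3.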
• Let $F(M)$ be the matrix function $F(M) = M^T M M^T$, where $M$ is an $n \times m$ matrix. What is the directional derivative of $F$?

Solution
The directional derivative is defined by:
$$\partial f(x, y) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon y) - f(x)}{\epsilon}$$
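For the function $F$ in this problem, we expand $F(M + \delta N)$:
$$F(M + \delta N) = (M + \delta N)^T (M + \delta N)(M + \delta N)^T$$
$$= M^T M M^T + \delta M^T N M^T + \delta N^T M M^T + \delta^2 N^T N M^T + \delta M^T M N^T + \delta^2 M^T N N^T + \delta^2 N^T M N^T + \delta^3 N^T N N^T.$$
Next we form $F(M + \delta N) - F(M)$. The terms with $\delta^k$, $k \ge 2$, still carry a factor $\delta$ after dividing by $\delta$, so they vanish in the limit and we drop them:
$$F(M + \delta N) - F(M) = \delta \left( M^T N M^T + N^T M M^T + M^T M N^T \right) + O(\delta^2)$$
$$\lim_{\delta \to 0} \frac{F(M + \delta N) - F(M)}{\delta} = M^T N M^T + N^T M M^T + M^T M N^T$$
$$\partial F(M, N) = M^T N M^T + N^T M M^T + M^T M N^T \tag{6}$$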
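Eq. 6 can be checked numerically with a central difference on random matrices. This is a plain-Python sketch with illustrative helper names (`transpose`, `matmul`, `add`, `scale`, `dF_numeric`) and arbitrary sizes $n = 4$, $m = 3$; a matrix library would do the same job.

```python
import random

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def add(*mats):
    # Entry-wise sum of equally shaped matrices.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]

def scale(s, A):
    return [[s * x for x in row] for row in A]

def F(M):
    # F(M) = M^T M M^T
    Mt = transpose(M)
    return matmul(matmul(Mt, M), Mt)

def dF(M, N):
    # Eq. 6: M^T N M^T + N^T M M^T + M^T M N^T
    Mt, Nt = transpose(M), transpose(N)
    return add(matmul(matmul(Mt, N), Mt),
               matmul(matmul(Nt, M), Mt),
               matmul(matmul(Mt, M), Nt))

def dF_numeric(M, N, delta=1e-6):
    # Central difference (F(M + delta N) - F(M - delta N)) / (2 delta)
    plus = F(add(M, scale(delta, N)))
    minus = F(add(M, scale(-delta, N)))
    return scale(1 / (2 * delta), add(plus, scale(-1.0, minus)))

random.seed(0)
n, m = 4, 3
M = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
N = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
exact = dF(M, N)
approx = dF_numeric(M, N)
err = max(abs(e - a) for re, ra in zip(exact, approx) for e, a in zip(re, ra))
print("max abs error:", err)
```

The result is $m \times n$, like $F(M)$ itself.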
The directional derivative of $F$ is given by Equation 6.

• If $f$ is a continuously differentiable function, $f$ :
$$\partial f(x, y) = \frac{\partial f(x)}{\partial x} y$$
Solution
The directional derivative is defined by:
$$\partial f(x, y) = \lim_{h \to 0} \frac{f(x + h y) - f(x)}{h} \tag{7}$$
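As $f$ is $C^1$, it is differentiable at $x$, so its first-order Taylor expansion (as $h \to 0$) is
$$f(x + h y) = f(x) + \frac{\partial f(x)}{\partial x} h y + o(h),$$
where the remainder satisfies $o(h)/h \to 0$. Subtracting $f(x)$ and dividing by $h$:
$$\frac{f(x + h y) - f(x)}{h} = \frac{\partial f(x)}{\partial x} y + \frac{o(h)}{h} \tag{8}$$
Letting $h \to 0$, the left side of Eq. 8 becomes the limit in Eq. 7 and the last term vanishes:
$$\partial f(x, y) = \frac{\partial f(x)}{\partial x} y$$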
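To see the limit in Eq. 7 at work, the sketch below uses a hypothetical $C^1$ map $f(x) = (x_0 x_1,\ \sin x_0 + x_1^2)$ (an assumption for illustration) and prints how far the difference quotient is from $\frac{\partial f(x)}{\partial x} y$ as $h$ shrinks. For this particular smooth $f$ the gap shrinks roughly in proportion to $h$.

```python
import math

def f(x):
    # Hypothetical C^1 map f: R^2 -> R^2 (illustration only).
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2]

def jacobian(x):
    # df/dx: rows are components of f, columns are components of x.
    return [[x[1], x[0]],
            [math.cos(x[0]), 2 * x[1]]]

def jvp(x, y):
    # (df(x)/dx) y
    return [sum(Jij * yj for Jij, yj in zip(row, y)) for row in jacobian(x)]

def difference_quotient(x, y, h):
    # (f(x + h y) - f(x)) / h, the quantity inside the limit of Eq. 7
    fx = f(x)
    fxh = f([xi + h * yi for xi, yi in zip(x, y)])
    return [(a - b) / h for a, b in zip(fxh, fx)]

x = [0.3, -1.2]
y = [1.0, 2.0]
target = jvp(x, y)
errors = []
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    q = difference_quotient(x, y, h)
    errors.append(max(abs(qi - ti) for qi, ti in zip(q, target)))
    print(h, errors[-1])
```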
So the claim of this problem is true whenever its hypothesis holds ($f$ continuously differentiable).

4. Consider the volume maximization problem: construct a box (closed on all sides) of maximal volume with sides $x, y, z$, given an upper bound $c$ on the area of the boundary of the box.

Solution
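Formally, we want to maximize the volume of a box with sides $x, y, z$:
$$\max_{x,y,z} V(x, y, z) = \max_{x,y,z} xyz.$$
We can rewrite the maximization as the negative of a minimization of $L = -V$:
$$\max_{x,y,z} V(x, y, z) = -\min_{x,y,z} L(x, y, z), \qquad L(x, y, z) = -xyz,$$
such that the area of the boundary is at most $c$ (a closed box has two faces of each size):
$$g(x, y, z) = 2(xy + xz + yz) - c \le 0.$$
Applying the Kuhn-Tucker conditions to find a candidate minimum of $L$ (that is, a candidate maximum of $V$), with $u = (x, y, z)$:
$$\frac{\partial L}{\partial u} + \mu \frac{\partial g}{\partial u} = 0 \tag{9}$$
where $\mu \ge 0$. We now calculate each term of the KT condition:
$$\frac{\partial L}{\partial u} = (-yz, -xz, -xy), \qquad \frac{\partial g}{\partial u} = 2\,(y + z,\ x + z,\ x + y).$$
Substituting both into Eq. 9 we get the following equations:
$$-yz + 2\mu y + 2\mu z = 0$$
$$-xz + 2\mu x + 2\mu z = 0$$
$$-xy + 2\mu x + 2\mu y = 0$$
Subtracting the second equation from the first gives $(x - y)(z - 2\mu) = 0$. If $z = 2\mu$, the first equation reduces to $4\mu^2 = 0$, so $\mu = 0$ and $z = 0$, a degenerate box; hence $x = y$, and by the same argument $y = z$. With $x = y = z$, the first equation becomes $-x^2 + 4\mu x = 0$, so, in terms of $\mu$:
$$x = y = z = 4\mu.$$
Applying this to the inequality constraint $g$:
$$2xy + 2xz + 2yz = 96\mu^2 \le c \quad \Longrightarrow \quad \mu \le \frac{\sqrt{c}}{4\sqrt{6}}.$$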
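The stationarity conditions point to a cube, and if the constraint is active the bound above holds with equality, giving side $4\mu = \sqrt{c/6}$. A brute-force sketch can sanity-check that for one value of $c$: it assumes $c = 6$ (so the expected side is 1), keeps the area constraint active by solving $2(xy + xz + yz) = c$ for $z$, and searches a grid over $x$ and $y$. The search range and grid resolution are arbitrary choices of this sketch.

```python
import math

def best_box(c, steps=400):
    # Grid search over x, y in (0, sqrt(c/2)); the range is an arbitrary choice
    # that contains the expected optimum sqrt(c/6).
    side_max = math.sqrt(c / 2)
    best_volume, best_sides = 0.0, None
    for i in range(1, steps):
        x = side_max * i / steps
        for j in range(1, steps):
            y = side_max * j / steps
            # Active area constraint 2(xy + xz + yz) = c, solved for z.
            z = (c / 2 - x * y) / (x + y)
            if z <= 0:
                continue
            volume = x * y * z
            if volume > best_volume:
                best_volume, best_sides = volume, (x, y, z)
    return best_volume, best_sides

c = 6.0
volume, sides = best_box(c)
print("best volume on grid:", volume)
print("best sides on grid: ", sides)
print("sqrt(c/6):", math.sqrt(c / 6))
```

A grid search only approximates the optimum; it supports, but does not replace, the KT argument.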
We take $\mu^* = \frac{\sqrt{c}}{4\sqrt{6}}$, which makes the constraint active, so the candidate minimum is
$$x^* = y^* = z^* = 4\mu^* = \frac{\sqrt{c}}{\sqrt{6}}.$$
Checking the last condition, $\mu\, g(x^*, y^*, z^*) = 0$:
$$\mu \left( 2\frac{c}{6} + 2\frac{c}{6} + 2\frac{c}{6} - c \right) = \mu (c - c) = 0,$$
we see that it holds. Hence this is a candidate minimum for $L$, the negative of the volume, and the maximum for the original problem is:
$$x^* = y^* = z^* = \frac{\sqrt{c}}{\sqrt{6}}.$$

5. Given an unconstrained optimization problem $\min_u L(u)$, the "normal" FONC for optimality is
$$\frac{\partial L}{\partial u} = 0.$$
But what if we only have directional derivatives instead of "normal" derivatives? What is the FONC in this case?

Solution
In some problems we do not have "normal" derivatives. This is because not all directions of movement are feasible: moving in some directions takes us out of the region where the problem is defined. Points on the boundary are an example: any direction pointing outside the region is infeasible, whereas any direction pointing inside is feasible. For problems where we only have directional derivatives, we evaluate them as
$$\frac{\partial f}{\partial d}(x) = \lim_{\alpha \to 0^+} \frac{f(x + \alpha d) - f(x)}{\alpha},$$
where $d$ is a feasible direction and $\alpha > 0$.

Back to the problem: we compute the Taylor expansion of $L(u^* + \alpha d)$, where $u^*$ is a minimum with respect to the cost $L$. To make things simpler, we define the function
$$u(\alpha) = u^* + \alpha d.$$
Note that $u(0) = u^*$. We define another function:
$$\psi(\alpha) = L(u(\alpha)).$$
Note that $\psi(0) = L(u(0)) = L(u^*)$, the minimum cost we are considering.
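Applying the Taylor expansion to $L(u^* + \alpha d) = \psi(\alpha)$:
$$\psi(\alpha) = \psi(0) + \frac{\partial \psi(0)}{\partial \alpha} \alpha + o(\alpha).$$
By the chain rule, $\frac{\partial \psi(0)}{\partial \alpha} = d^T \nabla L(u(0)) = d^T \nabla L(u^*)$, so
$$\psi(\alpha) = \psi(0) + d^T \nabla L(u^*)\, \alpha + o(\alpha).$$
Putting everything back in terms of $L$ and $u^*$:
$$L(u^* + \alpha d) = L(u^*) + d^T \nabla L(u^*)\, \alpha + o(\alpha) \tag{10}$$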
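To make Eq. 10 concrete at a boundary point, the sketch below uses a hypothetical problem: minimize $L(u) = (u_0 + 1)^2 + (u_1 - 2)^2$ on the half-plane $u_0 \ge 0$. Its minimizer $u^* = (0, 2)$ lies on the boundary, where $\nabla L(u^*) = (2, 0) \neq 0$, so the "normal" FONC fails. The code evaluates the first-order term $d^T \nabla L(u^*)$ and the actual change $L(u^* + \alpha d) - L(u^*)$ over unit directions $d$, separating feasible ($d_0 \ge 0$) from infeasible ones.

```python
import math

# Hypothetical problem: minimize L(u) = (u0 + 1)^2 + (u1 - 2)^2 subject to u0 >= 0.
def L(u):
    return (u[0] + 1) ** 2 + (u[1] - 2) ** 2

def grad_L(u):
    return [2 * (u[0] + 1), 2 * (u[1] - 2)]

u_star = [0.0, 2.0]   # minimizer on the boundary u0 = 0
g = grad_L(u_star)    # [2.0, 0.0]: the "normal" FONC grad = 0 does not hold

def is_feasible(d):
    # At u0 = 0 a direction keeps u0 >= 0 for small alpha > 0 iff d0 >= 0.
    return d[0] >= 0

alpha = 1e-4
rows = []
for k in range(360):
    t = 2 * math.pi * k / 360
    d = [math.cos(t), math.sin(t)]
    slope = d[0] * g[0] + d[1] * g[1]  # first-order term d^T grad L(u*) of Eq. 10
    change = L([u_star[0] + alpha * d[0], u_star[1] + alpha * d[1]]) - L(u_star)
    rows.append((is_feasible(d), slope, change))

feasible_slopes = [s for ok, s, _ in rows if ok]
infeasible_slopes = [s for ok, s, _ in rows if not ok]
print("min slope over feasible d:  ", min(feasible_slopes))
print("min slope over infeasible d:", min(infeasible_slopes))
```

Only infeasible directions produce a negative first-order term here, which is the situation a directional FONC has to capture.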
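Now, if $L(u^*)$ is the minimum cost, then for every feasible direction $d$ and small enough $\alpha > 0$, $L(u^* + \alpha d)$ cannot be smaller than $L(u^*)$:
$$L(u^* + \alpha d) \ge L(u^*).$$
Replacing the left side with Eq. 10:
$$L(u^*) + d^T \nabla L(u^*)\, \alpha + o(\alpha) \ge L(u^*) \quad \Longrightarrow \quad d^T \nabla L(u^*)\, \alpha + o(\alpha) \ge 0.$$
Since $\alpha > 0$ (from the definition of directional derivatives), we can divide by it and let $\alpha \to 0^+$, which removes the $o(\alpha)/\alpha$ term:
$$d^T \nabla L(u^*) \ge 0 \tag{11}$$
This is a necessary condition for $L(u^*)$ to be a minimum when only directional derivatives are available. So the FONC for this case is given by Eq. 11, namely $d^T \nabla L(u^*) \ge 0$ for every feasible direction $d$.

6. Let $L \in C^1$ be convex, i.e.,
$$L(\alpha u_1 + (1 - \alpha) u_2) \le \alpha L(u_1) + (1 - \alpha) L(u_2)$$
for all $u_1, u_2$ in the domain of $L$ and $\alpha \in [0, 1]$, and let $u^*$ satisfy
$$\frac{\partial L}{\partial u}(u^*) = 0.$$
Show that $u^*$ is a global minimum of $L$.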
Solution
From the definition of a convex function, for all $u_1, u_2$ and $\alpha \in (0, 1]$:
$$L(\alpha u_1 + (1 - \alpha) u_2) \le \alpha L(u_1) + (1 - \alpha) L(u_2).$$
Writing the argument as $u_2 + \alpha(u_1 - u_2)$ and subtracting $L(u_2)$ from both sides:
$$L(u_2 + \alpha(u_1 - u_2)) - L(u_2) \le \alpha \left( L(u_1) - L(u_2) \right)$$
$$\frac{L(u_2 + \alpha(u_1 - u_2)) - L(u_2)}{\alpha} \le L(u_1) - L(u_2).$$
As $\alpha \to 0^+$, the left side is the directional derivative of $L$ at $u_2$ in the direction $u_1 - u_2$. Because $L \in C^1$, this directional derivative equals $\frac{\partial L}{\partial u}(u_2)(u_1 - u_2)$ (the second part of Problem 3), so
$$\frac{\partial L}{\partial u}(u_2)\,(u_1 - u_2) \le L(u_1) - L(u_2) \tag{12}$$
Now, from the hypothesis, $u^*$ satisfies $\frac{\partial L}{\partial u}(u^*) = 0$. Letting $u_2 = u^*$ in Eq. 12:
$$\frac{\partial L}{\partial u}(u^*)\,(u_1 - u^*) \le L(u_1) - L(u^*)$$
$$0 \le L(u_1) - L(u^*)$$
$$L(u^*) \le L(u_1).$$
Since $u_1$ was arbitrary, we rename it $u$ and finally have:
$$L(u^*) \le L(u) \tag{13}$$
From Eq. 13, $L(u)$ is always greater than or equal to $L(u^*)$; hence $u^*$ is a global minimum of $L$ over its domain.
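A numerical sketch of Eqs. 12 and 13 on a hypothetical convex function $L(u) = e^{u_0} - u_0 + u_0^2 + (u_1 - 1)^2$, whose gradient vanishes at $u^* = (0, 1)$: random pairs of points check the gradient inequality, and random points check that none has a lower cost than $u^*$. Random sampling only illustrates the result; it does not prove it.

```python
import math
import random

def L(u):
    # Hypothetical convex C^1 function: exp(u0) - u0 + u0^2 + (u1 - 1)^2.
    return math.exp(u[0]) - u[0] + u[0] ** 2 + (u[1] - 1) ** 2

def grad_L(u):
    return [math.exp(u[0]) - 1 + 2 * u[0], 2 * (u[1] - 1)]

u_star = [0.0, 1.0]  # stationary point: grad_L(u_star) == [0.0, 0.0]

random.seed(1)
min_gap_12 = math.inf  # smallest value of L(u1) - L(u2) - dL/du(u2)(u1 - u2)
min_gap_13 = math.inf  # smallest value of L(u1) - L(u*)
for _ in range(10000):
    u1 = [random.uniform(-5, 5), random.uniform(-5, 5)]
    u2 = [random.uniform(-5, 5), random.uniform(-5, 5)]
    g2 = grad_L(u2)
    lhs = g2[0] * (u1[0] - u2[0]) + g2[1] * (u1[1] - u2[1])
    min_gap_12 = min(min_gap_12, L(u1) - L(u2) - lhs)
    min_gap_13 = min(min_gap_13, L(u1) - L(u_star))

print("grad at u*:", grad_L(u_star))
print("min gap in Eq. 12:", min_gap_12)
print("min gap in Eq. 13:", min_gap_13)
```

Both gaps should be nonnegative (up to rounding) for a convex $L$; for a nonconvex function the first one can turn negative.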