Numerical Methods for Unconstrained Optimization
Cheng-Liang Chen
PSE
LABORATORY
Department of Chemical Engineering, National Taiwan University
Chen CL
1
Analytical vs. Numerical?

In Analytical Methods, we write the necessary conditions and solve them (analytically or numerically?) for candidate local minimum designs.
Some Difficulties:
The number of design variables in constraints can be large
Functions for the design problem can be highly nonlinear
In many applications, cost and/or constraint functions can be implicit in terms of design variables

Numerical Methods: estimate an initial design and improve it until the optimality conditions are satisfied
Chen CL
2
Unconstrained Optimization
Chen CL
3
General Concepts Related to Numerical Algorithms
A General Algorithm

Current estimate: x^(k)
Subproblem 1: d^(k) : feasible search direction
Subproblem 2: α_k : (positive scalar) step size

⇒ New estimate: x^(k+1) = x^(k) + α_k d^(k) = x^(k) + Δx^(k),   k = 0, 1, 2, ···
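A minimal sketch of this general loop in Python (the helpers `search_direction` and `step_size` are hypothetical placeholders for Subproblems 1 and 2; any of the methods on later slides can be plugged in):

```python
import numpy as np

def general_descent(f, grad, x0, search_direction, step_size,
                    eps=1e-6, max_iter=100):
    """Generic iterative scheme: x(k+1) = x(k) + alpha_k * d(k)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        c = grad(x)                      # gradient c(k) at the current estimate
        if np.linalg.norm(c) < eps:      # optimality (stationarity) check
            break
        d = search_direction(x, c)       # Subproblem 1: search direction d(k)
        alpha = step_size(f, x, d)       # Subproblem 2: step size alpha_k
        x = x + alpha * d                # new estimate
    return x
```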
Chen CL
4
Chen CL
5
Descent Step Idea

current estimate x^(k), new estimate x^(k+1) = x^(k) + α_k d^(k):

f(x^(k)) > f(x^(k+1)) = f(x^(k) + α_k d^(k)) ≈ f(x^(k)) + α_k ∇f(x^(k))^T d^(k) = f(x^(k)) + α_k c^(k) · d^(k)

⇒ α_k c^(k) · d^(k) < 0

∇f(x^(k))^T d^(k) = c^(k) · d^(k) < 0 : descent condition

Angle between c^(k) and d^(k) must be between 90° and 270°
Chen CL
6
Example: check the descent condition

f(x) = x1² − x1x2 + 2x2² − 2x1 + e^(x1+x2)

Verify whether d1 = (1, 2) and d2 = (1, 0) at (0, 0) are descent directions:

c = ∇f = ( 2x1 − x2 − 2 + e^(x1+x2),  −x1 + 4x2 + e^(x1+x2) ) |_(0,0) = (−1, 1)

c · d1 = −1 + 2 = 1 > 0   (not a descent direction)
c · d2 = −1 + 0 = −1 < 0  (a descent direction)
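A quick numerical check of this example (a sketch; the gradient is hand-coded from the expression above):

```python
import numpy as np

def grad_f(x1, x2):
    # gradient of f = x1^2 - x1*x2 + 2*x2^2 - 2*x1 + exp(x1 + x2)
    e = np.exp(x1 + x2)
    return np.array([2*x1 - x2 - 2 + e, -x1 + 4*x2 + e])

c = grad_f(0.0, 0.0)                                   # c = (-1, 1)
for d in (np.array([1.0, 2.0]), np.array([1.0, 0.0])):
    print(d, c @ d, "descent" if c @ d < 0 else "not descent")
```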
Chen CL
7
One-Dimensional Minimization: Reduction to a Function of a Single Variable

Assume: a descent direction d^(k) has been found. Then

f(x) = f(x^(k) + α d^(k)) ≈ f(x^(k)) + α ∇f^T(x^(k)) d^(k) = f̄(α),   with ∇f^T(x^(k)) d^(k) = c · d < 0

f(α) < f(0) = f(x^(k)) : a small move along d^(k) reduces f
f′(0) = c^(k) · d^(k) < 0 ⇒ d should be a descent direction
Chen CL
8
Analytical Method to Compute Step Size

If d^(k) is a descent direction ⇒ α_k > 0 with

df(α_k)/dα = 0,   d²f(α_k)/dα² > 0

0 = df(x^(k+1))/dα = [df(x^(k+1))/dx]^T dx^(k+1)/dα = ∇f^T(x^(k+1)) d^(k) = c^(k+1) · d^(k)

The gradient of the cost function at the NEW point, c^(k+1), is orthogonal to the current search direction d^(k)
Chen CL
9
Example: analytical step size determination

f(x) = 3x1² + 2x1x2 + 2x2² + 7,   d^(k) = (−1, −1) at x^(k) = (1, 2)

c^(k) = ∇f(x^(k)) = ( 6x1 + 2x2, 2x1 + 4x2 ) = (10, 10)
c^(k) · d^(k) = −10 − 10 = −20 < 0   (descent direction)

x^(k+1) = x^(k) + α d^(k) = (1 − α, 2 − α)
f(x^(k+1)) = 3(1 − α)² + 2(1 − α)(2 − α) + 2(2 − α)² + 7 = 7α² − 20α + 22 ≡ f(α)
Chen CL
10
NC: df/dα = 14α_k − 20 = 0 ⇒ α_k = 10/7
d²f/dα² = 14 > 0

x^(k+1) = (1, 2) + (10/7)(−1, −1) = (−3/7, 4/7)

f(x^(k+1)) = 54/7 < 22 = f(x^(k))
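A small numerical check of this result (a sketch; it evaluates f along the given direction and confirms the analytic step size α = 10/7):

```python
import numpy as np

f = lambda x: 3*x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + 7
x_k = np.array([1.0, 2.0])
d_k = np.array([-1.0, -1.0])

alpha = 10.0 / 7.0                  # analytic step size from the necessary condition
x_next = x_k + alpha * d_k          # (-3/7, 4/7)
print(f(x_k), f(x_next))            # 22.0 and 54/7 = 7.714...
```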
Chen CL
11
Numerical Methods to Compute Step Size

Most one-dimensional search methods work only for unimodal functions, searching over α_ℓ = 0 ≤ α ≤ ᾱ = α_u (α_u − α_ℓ ≡ the interval of uncertainty)
Chen CL
12
Unimodal Function
Unimodal function: f(x) is a unimodal function (with minimum at x*) if
x1 < x2 < x* implies f(x1) > f(x2), and x* < x3 < x4 implies f(x3) < f(x4)
Chen CL
13
Unimodal Function
Outcome of two experiments at 0 < x1 < x2 < 1, with x* ∈ [0, 1]:

f1 < f2 ⇒ x* ∈ [0, x2]
f1 > f2 ⇒ x* ∈ [x1, 1]
f1 = f2 ⇒ x* ∈ [x1, x2]
Chen CL
14
Equal Interval Search
To successively reduce the interval of uncertainty, I = α_u − α_ℓ, to a small acceptable value.

Evaluate the function at α = 0, δ, 2δ, 3δ, ···
If f((q + 1)δ) < f(qδ), continue
If f((q + 1)δ) > f(qδ), then α_ℓ = (q − 1)δ, α_u = (q + 1)δ, and α* ∈ [α_ℓ, α_u]
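A minimal sketch of this bracketing phase (assuming f(α) is unimodal along the search direction); repeating it with a smaller δ from the left end of the bracket reproduces the refinement phases in the example that follows:

```python
def equal_interval_bracket(f, delta, alpha0=0.0, max_steps=10000):
    """March forward by delta until f increases; return [alpha_l, alpha_u]."""
    f_prev = f(alpha0)
    for q in range(1, max_steps + 1):
        a = alpha0 + q * delta
        fa = f(a)
        if fa > f_prev:                            # first increase of f detected
            return max(a - 2 * delta, alpha0), a   # interval containing alpha*
        f_prev = fa
    raise RuntimeError("minimum was not bracketed")
```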
Chen CL
15
Chen CL
16
Equal Interval Search: Example
f(x) = x(x − 1.5), x* ∈ [0, 1] ⇒ x* ∈ [x7, x8] = [0.7, 0.8]

 i    xi    f(xi)
 1    0.1   −0.14
 2    0.2   −0.26
 3    0.3   −0.36
 4    0.4   −0.44
 5    0.5   −0.50
 6    0.6   −0.54
 7    0.7   −0.56
 8    0.8   −0.56
 9    0.9   −0.54
Chen CL
17
Equal Interval Search: Example

f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001

Phase 1 (δ = 0.5, start from α = 0):
  trial α:  0.000000  0.500000  1.000000  1.500000  2.000000
  f(α):     3.000000  1.648721  0.718282  0.481689  1.389056
  ⇒ α_ℓ = 1.0, α_u = 2.0

Phase 2 (δ = 0.05, start from α = 1.0):
  trial α:  1.050000  1.100000  1.150000  1.200000  1.250000  1.300000  1.350000  1.400000  1.450000
  f(α):     0.657651  0.604166  0.558193  0.520117  0.490343  0.469297  0.457426  0.455200  0.463115
  ⇒ α_ℓ = 1.35, α_u = 1.45

Phase 3 (δ = 0.005, start from α = 1.35):
  trial α:  1.355000  1.360000  1.365000  1.370000  1.375000  1.380000  1.385000  1.390000
  f(α):     0.456761  0.456193  0.455723  0.455351  0.455077  0.454902  0.454826  0.454850
  ⇒ α_ℓ = 1.380, α_u = 1.390

Phase 4 (δ = 0.0005, start from α = 1.38):
  trial α:  1.380500  1.381000  1.381500  1.382000  1.382500  1.383000  1.383500  1.384000  1.384500  1.385000  1.385500  1.386000  1.386500  1.387000
  f(α):     0.454890  0.454879  0.454868  0.454859  0.454851  0.454844  0.454838  0.454833  0.454829  0.454826  0.454824  0.454823  0.454823  0.454824
  ⇒ α* ≈ 1.386
Chen CL
18
Equal Interval Search: 3 Interior Points

x* ∈ [a, b]; three tests x1, x0, x2 ⇒ three possibilities
Chen CL
19
Equal Interval Search: 2 Interior Points
α_a = α_ℓ + (1/3)(α_u − α_ℓ) = (1/3)(α_u + 2α_ℓ)
α_b = α_ℓ + (2/3)(α_u − α_ℓ) = (1/3)(2α_u + α_ℓ)

Case 1: f(α_a) < f(α_b) ⇒ α_ℓ < α* < α_b
Case 2: f(α_a) > f(α_b) ⇒ α_a < α* < α_u

I′ = (2/3)I : reduced interval of uncertainty
Chen CL
20
Golden Section Search
Problem with Equal Interval Search (n = 2): the known midpoint is not used in the next iteration

Solution: Golden Section Search

Fibonacci sequence: F0 = 1;  F1 = 1;  Fn = Fn−1 + Fn−2, n = 2, 3, ···
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ···
Fn/Fn−1 → 1.618,   Fn−1/Fn → 0.618   as n → ∞
Chen CL
21
Golden Section Search: Initial Bracketing of Minimum

Starting at α = 0, evaluate

α_q = δ Σ_{j=0}^{q} (1.618)^j = α_{q−1} + (1.618)^q δ,   q = 0, 1, 2, ···

q = 0:  α_0 = δ
q = 1:  α_1 = δ + 1.618δ = 2.618δ;          α_1 − α_0 = 1.618(α_0 − 0)
q = 2:  α_2 = 2.618δ + (1.618)²δ = 5.236δ;  α_2 − α_1 = 1.618(α_1 − α_0)
q = 3:  α_3 = 5.236δ + (1.618)³δ = 9.472δ;  α_3 − α_2 = 1.618(α_2 − α_1)
...
Chen CL
22
Golden Section Search: Initial Bracketing of Minimum

If f(α_{q−2}) > f(α_{q−1}) and f(α_{q−1}) < f(α_q), then α_{q−2} < α* < α_q

α_u = α_q = δ Σ_{j=0}^{q} (1.618)^j
α_ℓ = α_{q−2} = δ Σ_{j=0}^{q−2} (1.618)^j

I = α_u − α_ℓ = (1.618)^q δ + (1.618)^{q−1} δ = 2.618 (1.618)^{q−1} δ
Chen CL
23
Golden Section Search: Reduction of Interval of Uncertainty

Given α_u, α_ℓ, select α_a, α_b such that

α_u − α_a = α_b − α_ℓ = τI,   α_a − α_ℓ = α_u − α_b = (1 − τ)I,   I = α_u − α_ℓ

Suppose f(α_b) > f(α_a) ⇒ α* ∈ [α_ℓ, α_b]; delete [α_b, α_u]:
α_ℓ′ = α_ℓ,  α_u′ = α_b,  α_b′ = α_a
I′ = α_u′ − α_ℓ′ = τI,   α_b′ − α_ℓ′ = τI′,   α_u′ − α_b′ = (1 − τ)I′
Chen CL
24
Golden Section Search: Reduction of Interval of Uncertainty

Require I′ = τI and α_b′ − α_ℓ′ = τI′ = τ(τI), i.e. the same proportions on the reduced interval:

τ² + τ − 1 = 0 ⇒ τ = (−1 + √5)/2 = 0.618 = 1/1.618

α_a − α_ℓ = (1 − τ)I = 0.382I,   α_u − α_a = 0.618I = 1.618 × 0.382I

(α_u − α_a)/(α_a − α_ℓ) = (α_q − α_{q−1})/(α_{q−1} − α_{q−2}) = 0.618I/0.382I = 1.618

⇒ the ratio of increase of the trial step size is 1.618
Chen CL
25
Golden Section Search: Algorithm

Step 1: choose δ; bracket the minimum ⇒ q, α_ℓ = α_{q−2}, α_u = α_q, I = α_u − α_ℓ
Step 2: α_a = α_ℓ + 0.382I, α_b = α_ℓ + 0.618I; compute f(α_a), f(α_b)
Step 3: compare f(α_a), f(α_b); go to Step 4, 5, or 6
Step 4: if f(α_a) < f(α_b) ⇒ α_ℓ < α* < α_b:
        α_ℓ = α_ℓ, α_u = α_b, α_b = α_a, α_a = α_ℓ + 0.382(α_u − α_ℓ); go to Step 7
Step 5: if f(α_a) > f(α_b) ⇒ α_a < α* < α_u:
        α_ℓ = α_a, α_u = α_u, α_a = α_b, α_b = α_ℓ + 0.618(α_u − α_ℓ); go to Step 7
Step 6: if f(α_a) = f(α_b) ⇒ α_a < α* < α_b:
        α_ℓ = α_a, α_u = α_b; return to Step 2
Step 7: if I′ = α_u − α_ℓ < ε, set α* = (α_u + α_ℓ)/2 and Stop; otherwise return to Step 3
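A compact sketch of Steps 2-7 (it assumes the bracket [α_ℓ, α_u] has already been found, e.g. by the δ(1.618)^q marching above); for f(α) = 2 − 4α + e^α it converges to α* ≈ 1.386, matching the example on the next slide:

```python
import math

def golden_section(f, a_l, a_u, eps=1e-3):
    """Shrink [a_l, a_u] by the factor 0.618 per iteration (Steps 2-7)."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0          # 0.618...
    a_a = a_l + (1.0 - tau) * (a_u - a_l)       # alpha_a = alpha_l + 0.382 I
    a_b = a_l + tau * (a_u - a_l)               # alpha_b = alpha_l + 0.618 I
    f_a, f_b = f(a_a), f(a_b)
    while (a_u - a_l) > eps:
        if f_a < f_b:                           # Step 4: minimum in [a_l, a_b]
            a_u, a_b, f_b = a_b, a_a, f_a
            a_a = a_l + (1.0 - tau) * (a_u - a_l)
            f_a = f(a_a)
        else:                                   # Step 5: minimum in [a_a, a_u]
            a_l, a_a, f_a = a_a, a_b, f_b
            a_b = a_l + tau * (a_u - a_l)
            f_b = f(a_b)
    return 0.5 * (a_l + a_u)                    # Step 7
```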
Chen CL
26
Golden Section Search: Example

f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001
Chen CL
27
Golden Section Search: Example

f(x) = x(x − 1.5)

Initial Bracketing of Minimum:
 No.  Trial x     f(x)
 1    0.000000    0.000000
 2    0.250000   −0.312500
 3    0.500000   −0.500000   ← X_ℓ = 0.5
 4    0.750000   −0.562500   ← f_min
 5    1.000000   −0.500000   ← X_u = 1.0

Reducing Interval of Uncertainty (each entry: x value, with f(x) in parentheses):
 No. 1: X_ℓ = 0.5000000 (−0.5000000),  X_a = 0.6910000 (−0.5590190),  X_b = 0.8090000 (−0.5590190),  X_u = 1.0000000 (−0.5000000),  I = 0.50000000
 No. 2: X_ℓ = 0.6910000 (−0.5590190),  X_a = 0.7360760 (−0.5623061),  X_b = 0.7639240 (−0.5623061),  X_u = 0.8090000 (−0.5590190),  I = 0.11800000
 No. 3: X_ℓ = 0.7360760 (−0.5623061),  X_a = 0.7469139 (−0.5624892),  X_b = 0.7532861 (−0.5624892),  X_u = 0.7639240 (−0.5623061),  I = 0.02784800
 No. 4: X_ℓ = 0.74691393 (−0.562489202),  X_a = 0.74922448 (−0.562499399),  X_b = 0.75077551 (−0.562499399),  X_u = 0.75328606 (−0.562489202),  I = 0.00657212
 No. 5: X_ℓ = 0.7492244890 (−0.562493399),  X_a = 0.7498169790 (−0.562499967),  X_b = 0.7501830210 (−0.562499967),  X_u = 0.7507755110 (−0.562499399),  I = 0.001551022
 No. 6: X_ℓ = 0.7498467900 (−0.562499966),  X_a = 0.7499566900 (−0.562499998),  X_b = 0.7500431210 (−0.562499998),  X_u = 0.7501830210 (−0.562499967),  I = 0.000366231
Chen CL
28
Polynomial Interpolation: Quadratic Curve Fitting

q(α) = a0 + a1 α + a2 α²   (approximating quadratic function)

f(α_ℓ) = q(α_ℓ) = a0 + a1 α_ℓ + a2 α_ℓ²
f(α_i) = q(α_i) = a0 + a1 α_i + a2 α_i²
f(α_u) = q(α_u) = a0 + a1 α_u + a2 α_u²

⇒ a2 = [ (f(α_u) − f(α_ℓ))/(α_u − α_ℓ) − (f(α_i) − f(α_ℓ))/(α_i − α_ℓ) ] / (α_u − α_i)
   a1 = (f(α_i) − f(α_ℓ))/(α_i − α_ℓ) − a2 (α_ℓ + α_i)
   a0 = f(α_ℓ) − a1 α_ℓ − a2 α_ℓ²

dq(ᾱ)/dα = a1 + 2 a2 ᾱ = 0 ⇒ ᾱ = −a1/(2 a2),   if d²q/dα² = 2 a2 > 0
Chen CL
29
Computational Algorithm:

Step 1: locate the initial interval of uncertainty (α_ℓ, α_u)
Step 2: select α_ℓ < α_i < α_u and compute f(α_i)
Step 3: compute a0, a1, a2, ᾱ, f(ᾱ)
Step 4: compare f(α_i) and f(ᾱ):
  if f(α_i) < f(ᾱ) and α_i < ᾱ ⇒ α* ∈ [α_ℓ, ᾱ];  new points α_ℓ, α_i, ᾱ
  if f(α_i) < f(ᾱ) and ᾱ < α_i ⇒ α* ∈ [ᾱ, α_u];  new points ᾱ, α_i, α_u
  if f(α_i) > f(ᾱ) and α_i < ᾱ ⇒ α* ∈ [α_i, α_u]; new points α_i, ᾱ, α_u
  if f(α_i) > f(ᾱ) and ᾱ < α_i ⇒ α* ∈ [α_ℓ, α_i]; new points α_ℓ, ᾱ, α_i
Step 5: Stop if two successive estimates of the minimum point of f(α) are sufficiently close. Otherwise relabel the new points as α_ℓ, α_i, α_u and return to Step 2
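A sketch of one quadratic-fit step (Steps 2-3), using the a0, a1, a2 formulas from the previous slide; applied to f(α) = 2 − 4α + e^α with (α_ℓ, α_i, α_u) = (0.5, 1.309017, 2.618034) it returns ᾱ ≈ 1.2077, as in the example that follows:

```python
def quadratic_fit_step(f, a_l, a_i, a_u):
    """Fit q(a) = a0 + a1*a + a2*a^2 through three points and return the
    minimizer of the fitted parabola (valid when a2 > 0)."""
    f_l, f_i, f_u = f(a_l), f(a_i), f(a_u)
    a2 = ((f_u - f_l) / (a_u - a_l) - (f_i - f_l) / (a_i - a_l)) / (a_u - a_i)
    a1 = (f_i - f_l) / (a_i - a_l) - a2 * (a_l + a_i)
    return -a1 / (2.0 * a2)
```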
Chen CL
30
Example: f(α) = 2 − 4α + e^α,   δ = 0.5

Iteration 1:  α_ℓ = 0.5, α_i = 1.309017, α_u = 2.618034
f(α_ℓ) = 1.648721,  f(α_i) = 0.466464,  f(α_u) = 5.236610

a2 = (1/1.30902) [ 3.5879/2.1180 − (−1.1823/0.80902) ] = 2.410
a1 = −1.1823/0.80902 − (2.41)(1.80902) = −5.821
a0 = 1.648721 − (−5.821)(0.50) − 2.41(0.25) = 3.957

ᾱ = −a1/(2a2) = 1.2077 < α_i,   f(ᾱ) = 0.5149 > f(α_i)
⇒ α_ℓ′ = ᾱ = 1.2077,  α_i′ = α_i = 1.309017,  α_u′ = α_u = 2.618034

Iteration 2:  α_ℓ = 1.2077, α_i = 1.309017, α_u = 2.618034
f(α_ℓ) = 0.5149,  f(α_i) = 0.466464,  f(α_u) = 5.236610

a2 = 5.3807,  a1 = −7.30547,  a0 = 2.713
ᾱ = 1.3464,  f(ᾱ) = 0.4579
Chen CL
31
Multi-Dimensional Minimization: Powell’s Conjugate Directions Method
Conjugate Directions

Let A be an n × n symmetric matrix. A set of n vectors (directions) {S_i} is said to be A-conjugate if

S_i^T A S_j = 0   for i, j = 1, ···, n;  i ≠ j

Note: orthogonal directions are a special case of conjugate directions (A = I)
Chen CL
32
Multi-Dimensional Minimization: Powell’s Conjugate Directions Method
Quadratically Convergent Method: if a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method.

Theorem: Given a quadratic function of n variables and two parallel hyperplanes 1 and 2 of dimension k < n, let the constrained stationary points of the quadratic function in the hyperplanes be X_1 and X_2, respectively. Then the line joining X_1 and X_2 is conjugate to any line parallel to the hyperplanes.
Chen CL
33
Multi-Dimensional Minimization: Powell’s Conjugate Directions Method
Proof:

Q(X) = ½ X^T A X + B^T X + C
∇Q(X) = A X + B   (n × 1)

Search from X_a along S ⇒ constrained stationary point X_1
Search from X_b along S ⇒ constrained stationary point X_2
S is orthogonal to ∇Q(X_1) and ∇Q(X_2):

S^T ∇Q(X_1) = S^T A X_1 + S^T B = 0
S^T ∇Q(X_2) = S^T A X_2 + S^T B = 0
⇒ S^T [∇Q(X_1) − ∇Q(X_2)] = S^T A (X_1 − X_2) = 0
Chen CL
34
Multi-Dimensional Minimization: Powell’s Conjugate Directions Method
Meaning: If X_1 and X_2 are the minima of Q obtained by searching along the direction S from two different starting points X_a and X_b, respectively, then the line (X_1 − X_2) will be conjugate to S
Chen CL
35
Multi-Dimensional Minimization: Powell’s Conjugate Directions Method

Theorem: If a quadratic function

Q(X) = ½ X^T A X + B^T X + C

is minimized sequentially, once along each direction of a set of n mutually conjugate directions, the minimum of the function Q will be found at or before the nth step irrespective of the starting point.

Proof: Let X* = X_1 + Σ_{j=1}^{n} β_j S_j, where the S_j are conjugate directions with respect to A.

∇Q(X*) = B + A X* = 0
0 = B + A X_1 + A Σ_{j=1}^{n} β_j S_j

Premultiplying by S_i^T:
0 = S_i^T (B + A X_1) + S_i^T A Σ_{j=1}^{n} β_j S_j = S_i^T (B + A X_1) + β_i S_i^T A S_i

⇒ β_i = − (B + A X_1)^T S_i / (S_i^T A S_i)
Chen CL
36
Multi-Dimensional Minimization: Powell’s Conjugate Directions Method

Note: X_{i+1} = X_i + λ_i* S_i, i = 1, ···, n, where λ_i* is found by minimizing Q(X_i + λ_i S_i), so that

0 = S_i^T ∇Q(X_{i+1})
∇Q(X_{i+1}) = B + A X_{i+1} = B + A (X_i + λ_i* S_i)
0 = S_i^T {B + A (X_i + λ_i* S_i)} = (B + A X_i)^T S_i + λ_i* S_i^T A S_i
⇒ λ_i* = − (B + A X_i)^T S_i / (S_i^T A S_i)

Since X_i = X_1 + Σ_{j=1}^{i−1} λ_j* S_j,
X_i^T A S_i = X_1^T A S_i + Σ_{j=1}^{i−1} λ_j* S_j^T A S_i = X_1^T A S_i

⇒ λ_i* = − (B + A X_i)^T S_i / (S_i^T A S_i) = − (B + A X_1)^T S_i / (S_i^T A S_i) = β_i
Chen CL
37
Powell’s Conjugate Directions: Example

f(x1, x2) = 6x1² + 2x2² − 6x1x2 − x1 − 2x2
          = ½ [x1 x2] [ 12 −6; −6 4 ] [x1; x2] + [−1 −2] [x1; x2]

If S_1 = (1, 2), find S_2 = (s1, s2) such that S_1^T A S_2 = 0:

[1 2] [ 12 −6; −6 4 ] [s1; s2] = [0 2] [s1; s2] = 2 s2 = 0 ⇒ S_2 = (1, 0)

Starting from X_1 = (0, 0):

λ_1* = − (B + A X_1)^T S_1 / (S_1^T A S_1) = − [−1 −2][1; 2] / ( [1 2][ 12 −6; −6 4 ][1; 2] ) = 5/4

X_2 = X_1 + λ_1* S_1 = (0, 0) + (5/4)(1, 2) = (5/4, 5/2)
Chen CL
38
λ_2* = β_2 = − (B + A X_1)^T S_2 / (S_2^T A S_2) = − [−1 −2][1; 0] / ( [1 0][ 12 −6; −6 4 ][1; 0] ) = 1/12

X_3 = X_2 + λ_2* S_2 = (5/4, 5/2) + (1/12)(1, 0) = (4/3, 5/2) = X* (?)
Chen CL
39
Powell’s Algorithm
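A minimal sketch of one common form of the basic Powell cycle (univariate searches along the current direction set, a pattern move, then replacement of the oldest direction by the pattern direction); `minimize_along` is a hypothetical line-search helper returning the optimal step, and this is an interpretation of the algorithm rather than a transcription of the flowchart:

```python
import numpy as np

def powell(f, x0, minimize_along, n_cycles=20, eps=1e-6):
    """Basic Powell cycle with direction replacement."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    dirs = [np.eye(n)[i] for i in range(n)]        # start with coordinate directions
    for _ in range(n_cycles):
        x_start = x.copy()
        for d in dirs:                              # univariate searches
            x = x + minimize_along(f, x, d) * d
        s = x - x_start                             # pattern (conjugate) direction
        if np.linalg.norm(s) < eps:
            break
        x = x + minimize_along(f, x, s) * s         # pattern move
        dirs = dirs[1:] + [s / np.linalg.norm(s)]   # drop oldest, append S
    return x
```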
Chen CL
40
Progress of Powell’s Method
Cycle 1 directions:  u_1, u_2, ···, u_{n−1}, u_n          ⇒ pattern direction S^(1)
Cycle 2 directions:  u_2, u_3, ···, u_n, S^(1)            ⇒ pattern direction S^(2)
Cycle 3 directions:  u_3, ···, u_n, S^(1), S^(2)          ⇒ ···
···
Cycle n directions:  u_n, S^(1), S^(2), ···, S^(n−1)

⇒ (u_n, S^(1)), (S^(1), S^(2)), ···  are A-conjugate
Chen CL
41
Powell’s Conjugate Directions: Example

Min: f(x1, x2) = x1 − x2 + 2x1² + 2x1x2 + x2²,   X_1 = [0 0]^T
Chen CL
42
Cycle 1: Univariate Search

along u_2:  f(X_1 + λ u_2) = f(0, λ) = λ² − λ
  df/dλ = 0 ⇒ λ* = 1/2,   X_2 = X_1 + λ* u_2 = (0, 0.5)

along −u_1:  f(X_2 − λ u_1) = f(−λ, 0.5) = 2λ² − 2λ − 0.25
  df/dλ = 0 ⇒ λ* = 1/2,   X_3 = X_2 − λ* u_1 = (−0.5, 0.5)

along u_2:  f(X_3 + λ u_2) = f(−0.5, 0.5 + λ) = λ² − λ − 0.75
  df/dλ = 0 ⇒ λ* = 1/2,   X_4 = X_3 + λ* u_2 = (−0.5, 1)
Chen CL
43
Cycle 2: Pattern Search

S^(1) = X_4 − X_2 = (−0.5, 1) − (0, 0.5) = (−0.5, 0.5)

f(X_4 + λ S^(1)) = f(−0.5 − 0.5λ, 1 + 0.5λ) = 0.25λ² − 0.5λ − 1
  df/dλ = 0 ⇒ λ* = 1.0

X_5 = X_4 + λ* S^(1) = (−1.0, 1.5)
Chen CL
44
Simplex Method
Chen CL
45
Simplex Method
Chen CL
46
Simplex Method
Chen CL
47
Simplex Method
Chen CL
48
Properties of Gradient Vector
c ≡ ∇f(x^(k)) = [ ∂f/∂x1, ···, ∂f/∂xn ]^T evaluated at x^(k),   i.e.  c_i^(k) = ∂f(x^(k))/∂x_i
Chen CL
49
Property 1: The gradient vector c of a function f(x1, ···, xn) at the point x* = (x1*, ···, xn*) is orthogonal (normal) to the tangent plane of the surface f(x1, ···, xn) = constant.

C is any curve on the surface through x*
T is a vector tangent to curve C at x*
⇒ c · T = 0
Chen CL
50
Proof: let s be any parameter along C; then

T = [ ∂x1/∂s, ···, ∂xn/∂s ]^T at x = x*   (a unit tangent vector along C at x*)

f(x) = constant along C ⇒ df/ds = 0:

0 = df/ds = (∂f/∂x1)(∂x1/∂s) + ··· + (∂f/∂xn)(∂xn/∂s) = c^T T = c · T
Chen CL
51
Property 2: The gradient represents a direction of maximum rate of increase for f(x) at x*

Proof: let u be a unit vector in any direction not tangent to C, and let t be a parameter along u.

df/dt = lim_{ε→0} [ f(x + εu) − f(x) ] / ε
f(x + εu) = f(x) + ε ( u1 ∂f/∂x1 + ··· + un ∂f/∂xn ) + O(ε²)
f(x + εu) − f(x) = ε Σ_{i=1}^{n} u_i ∂f/∂x_i + O(ε²)

df/dt = lim_{ε→0} [ f(x + εu) − f(x) ] / ε = Σ_{i=1}^{n} u_i ∂f/∂x_i = c · u = c^T u = ||c|| ||u|| cos θ

(maximum rate of increase when θ = 0)
Chen CL
52
Property 3: The maximum rate of change of f(x) at any point x* is the magnitude of the gradient vector

max df/dt = ||c||,  attained when u is in the direction of the gradient vector (θ = 0)
Chen CL
53
Verify Properties of Gradient Vector

f(x) = 25x1² + x2²,   x^(0) = (0.6, 4),   f(x^(0)) = 25

c = ∇f = ( 50x1, 2x2 ) ⇒ ∇f(0.6, 4) = (30, 8)

C = c/||c|| = (30, 8)/√(30² + 8²) = (0.966235, 0.257663)

A vector tangent to the curve 25x1² + x2² = 25 at x^(0):  t = (−4, 15)
T = t/||t|| = (−4, 15)/√((−4)² + 15²) = (−0.257663, 0.966235)
Chen CL
54
Property 1:  C · T = 0

Slope of tangent:  m1 = dx2/dx1 = −50x1/(2x2) = 15/(−4) = −3.75
Slope of gradient: m2 = c2/c1 = 8/30 = 1/3.75
⇒ m1 m2 = −1, so the gradient is normal to the tangent

Property 2: choose an arbitrary direction D = (0.501034, 0.865430) and α = 0.1

along C:  x^(1) = x^(0) + αC = (0.6 + 0.1 × 0.966235, 4.0 + 0.1 × 0.257663) = (0.6966235, 4.0257663),  f(x^(1)) = 28.3389
along D:  x^(1) = x^(0) + αD = (0.6501034, 4.0854300),  f(x^(1)) = 27.2566
⇒ the increase of f along the gradient direction C exceeds that along D

Property 3:  C · C = 1.00 > C · D = 0.7071059
⇒ the rate of change along the gradient direction is the largest
Chen CL
55
Steepest Descent Algorithm
Steepest Descent Direction: let f(x) be a differentiable function w.r.t. x. The direction of steepest descent for f(x) at any point is d = −c

Steepest Descent Algorithm:
Step 1: a starting design x^(0), k = 0, ε
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: d^(k) = −c^(k)
Step 4: calculate α_k to minimize f(x^(k) + α d^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k), k = k + 1 ⇒ Step 2
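A direct transcription of the five steps (a sketch; `line_search` stands for any of the one-dimensional methods above):

```python
import numpy as np

def steepest_descent(f, grad, x0, line_search, eps=1e-3, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        c = grad(x)                          # Step 2
        if np.linalg.norm(c) < eps:
            break
        d = -c                               # Step 3: steepest descent direction
        alpha = line_search(f, x, d)         # Step 4
        x = x + alpha * d                    # Step 5
    return x
```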
Chen CL
56
Notes:

d = −c ⇒ c · d = −||c||² < 0

The successive directions of steepest descent are normal to each other:
d^(k) · d^(k+1) = c^(k) · c^(k+1) = 0

proof:
0 = df(x^(k+1))/dα = [∂f(x^(k+1))/∂x]^T ∂x^(k+1)/∂α = c^(k+1)T ∂(x^(k) + αd^(k))/∂α
  = c^(k+1) · d^(k) = −c^(k+1) · c^(k) = −d^(k+1) · d^(k)
Chen CL
57
Steepest Descent: Example

f(x1, x2) = x1² + x2² − 2x1x2,   x^(0) = (1, 0)

Step 1: x^(0) = (1, 0), k = 0, ε
Step 2: c^(0) = ∇f(x^(0)) = (2x1 − 2x2, 2x2 − 2x1) = (2, −2);  ||c^(0)|| = 2√2 ≠ 0
Step 3: d^(0) = −c^(0) = (−2, 2)
Step 4: minimize f(x^(0) + αd^(0)) = f(1 − 2α, 2α):
  f(α) = (1 − 2α)² + (2α)² − 2(1 − 2α)(2α) = 16α² − 8α + 1
  df/dα = 32α − 8 = 0 ⇒ α_0 = 0.25,   d²f/dα² = 32 > 0
Step 5: x^(1) = x^(0) + α_0 d^(0) = (1 − 0.25(2), 0 + 0.25(2)) = (0.5, 0.5)
  c^(1) = (0, 0) ⇒ stop
Chen CL
58
Steepest Descent: Example

f(x1, x2, x3) = x1² + 2x2² + 2x3² + 2x1x2 + 2x2x3,   x^(0) = (2, 4, 10),   x* = (0, 0, 0)

Step 1: k = 0, ε = 0.005 (δ = 0.05, ε = 0.0001 for the golden section line search)
Step 2: c^(0) = ∇f(x^(0)) = (2x1 + 2x2, 4x2 + 2x1 + 2x3, 4x3 + 2x2) = (12, 40, 48);  ||c^(0)|| = √4048 = 63.6 > ε
Step 3: d^(0) = −c^(0) = (−12, −40, −48)
Step 4: minimize f(x^(0) + αd^(0)) by golden section ⇒ α_0 = 0.158718
Step 5: x^(1) = x^(0) + α_0 d^(0) = (0.0954, −2.348, 2.381)
  c^(1) = (−4.5, −4.438, 4.828);  ||c^(1)|| = 7.952 > ε
  Note: c^(1) · d^(0) = 0 (perfect line search)
Chen CL
59
Chen CL
60
Steepest Descent: Disadvantages
Slow to converge, especially when approaching the optimum ⇒ a large number of iterations

Information calculated at previous iterations is not used; each iteration is started independently of the others
Chen CL
61
Scaling of Design Variables
The steepest descent method converges in only one iteration for a positive definite quadratic function with a unit condition number of the Hessian matrix

To accelerate the rate of convergence ⇒ scale the design variables such that the condition number of the new Hessian matrix is unity
Chen CL
62
Example:  Min: f(x1, x2) = 25x1² + x2²,   H = [ 50 0; 0 2 ],   x^(0) = (1, 1)

let x = Dy,   D = [ 1/√50  0;  0  1/√2 ]
⇓
Min: f(y1, y2) = ½ (y1² + y2²),   y^(0) = (√50, √2)
Chen CL
63
Chen CL
64
Example:  Min: f(x1, x2) = 6x1² − 6x1x2 + 2x2² − 5x1 + 4x2 + 2

H = [ 12 −6; −6 4 ]
λ1,2 = 0.7889, 15.211 (eigenvalues)
v1 = (0.4718, 0.8817),  v2 = (−0.8817, 0.4718)  (eigenvectors)

let x = Qy,   Q = [ v1 v2 ] = [ 0.4718 −0.8817; 0.8817 0.4718 ]
Min: f(y1, y2) = 0.5(0.7889y1² + 15.211y2²) + 1.1678y1 + 6.2957y2 + 2

let y = Dz,   D = [ 1/√0.7889  0;  0  1/√15.211 ]
Min: f(z1, z2) = 0.5(z1² + z2²) + 1.3148z1 + 1.6142z2 + 2

x^(0) = (−1, −2) ⇒ z* = (−1.3148, −1.6142),   x* = QDz* = (−1/3, −2/3)
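A quick numerical check of this two-stage scaling (numpy's `eigh` supplies the eigenvalues and eigenvectors used to build Q and D; the transformed Hessian becomes the identity, i.e. condition number 1):

```python
import numpy as np

H = np.array([[12.0, -6.0], [-6.0, 4.0]])
lam, Q = np.linalg.eigh(H)              # eigenvalues ~ (0.7889, 15.211)
D = np.diag(1.0 / np.sqrt(lam))
T = Q @ D                               # combined transformation x = T z
print(np.round(T.T @ H @ T, 6))         # (QD)^T H (QD) = identity matrix
```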
Chen CL
65
Conjugate Gradient Method
Fletcher and Reeves (1964)

Steepest descent: directions are orthogonal at consecutive steps ⇒ converges, but slowly

Conjugate gradient method: modify the current steepest descent direction by adding a scaled previous direction ⇒ cuts diagonally through the orthogonal steepest descent directions

Conjugate gradient directions: d^(i), d^(j) are orthogonal w.r.t. a symmetric and positive definite matrix A:
d^(i)T A d^(j) = 0
Chen CL
66
Conjugate Gradient Method: algorithm
Step 1: k = 0; x^(0); d^(0) = −c^(0) = −∇f(x^(0)); stop if ||c^(0)|| < ε, otherwise go to Step 4
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: d^(k) = −c^(k) + β_k d^(k−1),   β_k = ( ||c^(k)|| / ||c^(k−1)|| )²
Step 4: compute α_k = α to minimize f(x^(k) + αd^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k), k = k + 1, go to Step 2

Note:
Finds the minimum in n iterations for positive definite quadratic forms having n design variables
With inexact line search or non-quadratic forms ⇒ restart every n + 1 iterations for computational stability (x^(0) = x^(n+1))
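A sketch of the Fletcher-Reeves iteration above (the periodic restart mentioned in the note is omitted for brevity; `line_search` is a generic one-dimensional minimizer):

```python
import numpy as np

def conjugate_gradient(f, grad, x0, line_search, eps=1e-3, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    c = grad(x)
    d = -c                                        # Step 1
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        alpha = line_search(f, x, d)              # Step 4
        x = x + alpha * d                         # Step 5
        c_new = grad(x)                           # Step 2
        beta = (np.linalg.norm(c_new) / np.linalg.norm(c)) ** 2
        d = -c_new + beta * d                     # Step 3
        c = c_new
    return x
```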
Chen CL
67
Example:

Min: f(x) = x1² + 2x2² + 2x3² + 2x1x2 + 2x2x3,   x^(0) = (2, 4, 10),   f(x^(0)) = 332.0

c^(0) = (12, 40, 48);  ||c^(0)|| = 63.6
x^(1) = (0.0956, −2.348, 2.381)
c^(1) = (−4.5, −4.438, 4.828);  ||c^(1)|| = 7.952;  f(x^(1)) = 10.75

β_1 = ( ||c^(1)|| / ||c^(0)|| )² = (7.952/63.6)² = 0.015633
d^(1) = −c^(1) + β_1 d^(0) = (4.500, 4.438, −4.828) + 0.015633 (−12, −40, −48) = (4.31241, 3.81268, −5.57838)
Chen CL
68
x^(2) = x^(1) + αd^(1) = (0.0956, −2.348, 2.381) + α (4.31241, 3.81268, −5.57838)

Minimizing f(x^(1) + αd^(1)) ⇒ α = 0.3156

x^(2) = (1.4566, −1.1447, 0.6205)
c^(2) = (0.6238, −0.4246, 0.1926),  ||c^(2)|| = 0.7788

Note: c^(2) · d^(1) = 0
Chen CL
69
Newton Method: A Second-Order Method

x : current estimate of x*  ⇒  x* ≈ x + Δx (desired)

f(x + Δx) = f(x) + c^T Δx + ½ Δx^T H Δx

NC: ∂f/∂Δx = c + H Δx = 0
⇒ Δx = −H^(−1) c
⇒ Δx = −α H^(−1) c   (modified)
Chen CL
70
Steps (modified Newton method):

Step 1: k = 0; estimate x^(0); select ε
Step 2: c_i^(k) = ∂f(x^(k))/∂x_i, i = 1 ~ n; stop if ||c^(k)|| < ε
Step 3: H(x^(k)) = [ ∂²f/∂x_i∂x_j ]
Step 4: d^(k) = −H^(−1) c^(k), or solve H d^(k) = −c^(k)
  Note: for computational efficiency, a system of linear simultaneous equations is solved instead of evaluating the inverse of the Hessian
Step 5: compute α_k = α to minimize f(x^(k) + αd^(k))
Step 6: x^(k+1) = x^(k) + α_k d^(k), k = k + 1, go to Step 2

Note: unless H is positive definite, d^(k) will not be a descent direction for f:
c^(k)T d^(k) = −α_k c^(k)T H^(−1) c^(k) < 0  ⇔  c^(k)T H^(−1) c^(k) > 0, i.e. H positive definite
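A sketch of the modified Newton iteration, solving H d = −c as the note recommends instead of inverting H:

```python
import numpy as np

def modified_newton(f, grad, hess, x0, line_search, eps=1e-4, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        c = grad(x)                               # Step 2
        if np.linalg.norm(c) < eps:
            break
        H = hess(x)                               # Step 3
        d = np.linalg.solve(H, -c)                # Step 4: solve H d = -c
        alpha = line_search(f, x, d)              # Step 5
        x = x + alpha * d                         # Step 6
    return x
```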
Chen CL
71
Example: f(x) = 3x1² + 2x1x2 + 2x2² + 7,   x^(0) = (5, 10),   ε = 0.0001

c^(0) = (6x1 + 2x2, 2x1 + 4x2) = (50, 50);   ||c^(0)|| = 50√2

H^(0) = [ 6 2; 2 4 ],   H^(0)−1 = (1/20) [ 4 −2; −2 6 ]

d^(0) = −H^(0)−1 c^(0) = −(1/20) [ 4 −2; −2 6 ] (50, 50) = (−5, −10)

x^(1) = x^(0) + αd^(0) = (5 − 5α, 10 − 10α)

df/dα = 0, i.e. ∇f(x^(1)) · d^(0) = 0:
∇f(x^(1)) = ( 6(5 − 5α) + 2(10 − 10α), 2(5 − 5α) + 4(10 − 10α) ) = (50 − 50α, 50 − 50α)
∇f(x^(1)) · d^(0) = −5(50 − 50α) − 10(50 − 50α) = 0 ⇒ α = 1
Chen CL
72
x^(1) = (5 − 5α, 10 − 10α) = (0, 0)
c^(1) = (50 − 50α, 50 − 50α) = (0, 0)  ⇒ the minimum of this quadratic function is reached in one step
Chen CL
73
Example: f(x) = 10x1⁴ − 20x1²x2 + 10x2² + x1² − 2x1 + 5,   x^(0) = (−1, 3)

c = ∇f(x) = ( 40x1³ − 40x1x2 + 2x1 − 2, −20x1² + 20x2 )

H = ∇²f(x) = [ 120x1² − 40x2 + 2   −40x1 ;  −40x1   20 ]
Chen CL
74
Chen CL
75
Comparison of Steepest Descent, Conjugate Gradient, and Newton Methods

f(x) = 50(x2 − x1²)² + (2 − x1)²,   x^(0) = (5, −5),   x* = (2, 4)
Chen CL
76
Chen CL
77
Chen CL
78
Newton Method
Advantage: quadratic convergence rate

Disadvantages:
Calculation of second-order derivatives at each iteration
A system of simultaneous linear equations needs to be solved
The Hessian of the function may be singular at some iterations
Memoryless method: each iteration is started afresh
Not convergent unless the Hessian remains positive definite and a step size determination scheme is used
Chen CL
79
Marquardt Modification (1963)

d^(k) = −(H + λI)^(−1) c^(k)
Far away from the solution point ⇒ behaves like Steepest Descent
Near the solution point ⇒ behaves like the Newton Method

Step 1: k = 0; estimate x^(0); select ε; λ_0 (= 10000, large)
Step 2: c_i^(k) = ∂f(x^(k))/∂x_i, i = 1 ~ n; stop if ||c^(k)|| < ε
Step 3: H(x^(k)) = [ ∂²f/∂x_i∂x_j ]
Step 4: d^(k) = −(H + λ_k I)^(−1) c^(k)
Step 5: if f(x^(k) + d^(k)) < f(x^(k)), go to Step 6; otherwise let λ_k = 2λ_k and go to Step 4
Step 6: set x^(k+1) = x^(k) + d^(k), λ_{k+1} = 0.5λ_k, k = k + 1 and go to Step 2
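A sketch of the Marquardt steps (the full step x + d is taken, with λ doubled until the step actually decreases f):

```python
import numpy as np

def marquardt(f, grad, hess, x0, eps=1e-4, lam=10000.0, max_iter=200):
    x = np.asarray(x0, dtype=float)
    n = x.size
    for k in range(max_iter):
        c = grad(x)                                        # Step 2
        if np.linalg.norm(c) < eps:
            break
        H = hess(x)                                        # Step 3
        while True:
            d = np.linalg.solve(H + lam * np.eye(n), -c)   # Step 4
            if f(x + d) < f(x):                            # Step 5
                break
            lam = 2.0 * lam
        x = x + d
        lam = 0.5 * lam                                    # Step 6
    return x
```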
Chen CL
80
Quasi-Newton Methods
Steepest Descent:
Uses only first-order information ⇒ poor rate of convergence
Each iteration is started with new design variables without using any information from previous iterations

Newton Method:
Uses second-order derivatives ⇒ quadratic convergence rate
Requires calculation of n(n+1)/2 second-order derivatives!
Difficulties if the Hessian is singular
Not a learning process
Chen CL
81
Quasi-Newton Methods
Quasi-Newton Methods (Update Methods):

Use first-order derivatives to generate approximations for the Hessian ⇒ combine desirable features of both steepest descent and Newton's method
Use information from previous iterations to speed up convergence (learning processes)
Several ways to approximate the (updated) Hessian or its inverse
Preserve properties of symmetry and positive definiteness
82
Davidon-Fletcher-Powell (DFP) Method
Davidon (1959), Fletcher and Powell (1963)
Approximate the Hessian inverse using only first derivatives:

Δx = −αH^(−1)c ≈ −αAc,   A ≈ H^(−1)

find A by using only first-order information
Chen CL
83
DFP Procedure (A ≈ H^(−1)):

Step 1: k = 0; estimate x^(0), select ε; A^(0) (= I ≈ H^(−1))
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: d^(k) = −A^(k) c^(k)
Step 4: compute α_k = α to minimize f(x^(k) + αd^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k)
Step 6: update A^(k):

A^(k+1) = A^(k) + B^(k) + C^(k)
B^(k) = s^(k) s^(k)T / (s^(k) · y^(k))
C^(k) = −z^(k) z^(k)T / (y^(k) · z^(k))
s^(k) = α_k d^(k)   (change in design)
y^(k) = c^(k+1) − c^(k)   (change in gradient)
c^(k+1) = ∇f(x^(k+1))
z^(k) = A^(k) y^(k)

Step 7: set k = k + 1 and go to Step 2
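A sketch of the DFP iteration with the A-update written exactly as in Step 6 (`line_search` is a generic one-dimensional minimizer):

```python
import numpy as np

def dfp(f, grad, x0, line_search, eps=1e-3, max_iter=200):
    x = np.asarray(x0, dtype=float)
    A = np.eye(x.size)                        # Step 1: A(0) = I ~ inverse Hessian
    c = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:           # Step 2
            break
        d = -A @ c                            # Step 3
        alpha = line_search(f, x, d)          # Step 4
        x_new = x + alpha * d                 # Step 5
        c_new = grad(x_new)
        s = alpha * d                         # change in design
        y = c_new - c                         # change in gradient
        z = A @ y
        A = A + np.outer(s, s) / (s @ y) - np.outer(z, z) / (y @ z)   # Step 6
        x, c = x_new, c_new                   # Step 7
    return x
```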
Chen CL
84
DFP Properties:

Matrix A^(k) is always positive definite ⇒ the method always converges to a local minimum, provided α > 0, since

d/dα f(x^(k) + αd^(k)) |_{α=0} = −c^(k)T A^(k) c^(k) < 0

When applied to a positive definite quadratic form, A^(k) converges to the inverse of the Hessian of the quadratic form
Chen CL
85
DFP Example: f(x) = 5x1² + 2x1x2 + x2² + 7,   x^(0) = (1, 2)

1-1. x^(0) = (1, 2);  A^(0) = I;  k = 0, ε = 0.001
1-2. c^(0) = (10x1 + 2x2, 2x1 + 2x2) = (14, 6),  ||c^(0)|| = √(14² + 6²) = 15.232 > ε
1-3. d^(0) = −c^(0) = (−14, −6)
1-4. x^(1) = x^(0) + αd^(0) = (1 − 14α, 2 − 6α)
     f(x^(1)) = f(α) = 5(1 − 14α)² + 2(1 − 14α)(2 − 6α) + (2 − 6α)² + 7
     df/dα = 5(2)(−14)(1 − 14α) + 2(−14)(2 − 6α) + 2(−6)(1 − 14α) + 2(−6)(2 − 6α) = 0
     ⇒ α_0 = 0.0988,   d²f/dα² = 2348 > 0
1-5. x^(1) = x^(0) + α_0 d^(0) = (1 − 14α_0, 2 − 6α_0) = (−0.386, 1.407)
Chen CL
86
1-6. s^(0) = α_0 d^(0) = (−1.386, −0.593);  c^(1) = (−1.046, 2.042)
     y^(0) = c^(1) − c^(0) = (−15.046, −3.958);  z^(0) = A^(0) y^(0) = y^(0)
     s^(0) · y^(0) = 23.20;  y^(0) · z^(0) = 242.05

     s^(0) s^(0)T = [ 1.921 0.822; 0.822 0.352 ]    ⇒ B^(0) = [ 0.0828 0.0354; 0.0354 0.0152 ]
     z^(0) z^(0)T = [ 226.40 59.55; 59.55 15.67 ]   ⇒ C^(0) = −[ 0.935 0.246; 0.246 0.065 ]

     A^(1) = A^(0) + B^(0) + C^(0) = [ 0.148 −0.211; −0.211 0.950 ]
Chen CL
87

2-2. ||c^(1)|| = √(1.046² + 2.042²) = 2.29 > ε
2-3. d^(1) = −A^(1) c^(1) = (0.586, −1.719)
2-4. x^(2) = x^(1) + αd^(1);  α_1 = 0.776 (minimizing f(x^(1) + αd^(1)))
2-5. x^(2) = x^(1) + α_1 d^(1) = (−0.386, 1.407) + (0.455, −1.334) = (0.069, 0.073)
Chen CL
88
2-6. s^(1) = α_1 d^(1) = (0.455, −1.334);  c^(2) = (0.836, 0.284)
     y^(1) = c^(2) − c^(1) = (1.882, −1.758);  z^(1) = A^(1) y^(1) = (0.649, −2.067)
     s^(1) · y^(1) = 3.201;  y^(1) · z^(1) = 4.855

     s^(1) s^(1)T = [ 0.207 −0.607; −0.607 1.780 ]    ⇒ B^(1) = [ 0.0647 −0.19; −0.19 0.556 ]
     z^(1) z^(1)T = [ 0.421 −1.341; −1.341 4.272 ]    ⇒ C^(1) = −[ 0.0867 −0.276; −0.276 0.880 ]

     A^(2) = A^(1) + B^(1) + C^(1) = [ 0.126 −0.125; −0.125 0.626 ]
Chen CL
89
Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method
Directly update the Hessian using only first derivatives:

Δx = −αH^(−1)c  ⇒  H Δx = −αc  ≈  A Δx = −αc,   A ≈ H

find A by using only first-order information
Chen CL
90
BFGS Procedure (H^(0) ≈ H):

Step 1: k = 0; estimate x^(0), select ε; H^(0) (= I ≈ H)
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: solve H^(k) d^(k) = −c^(k) to obtain d^(k)
Step 4: compute α_k = α to minimize f(x^(k) + αd^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k)
Step 6: update H^(k):

H^(k+1) = H^(k) + D^(k) + E^(k)
D^(k) = y^(k) y^(k)T / (y^(k) · s^(k))
E^(k) = c^(k) c^(k)T / (c^(k) · d^(k))
s^(k) = α_k d^(k)   (change in design)
y^(k) = c^(k+1) − c^(k)   (change in gradient)
c^(k+1) = ∇f(x^(k+1))

Step 7: set k = k + 1 and go to Step 2
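A sketch of the BFGS iteration as stated above, keeping the direct Hessian update and solving H d = −c at each step:

```python
import numpy as np

def bfgs(f, grad, x0, line_search, eps=1e-3, max_iter=200):
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                        # Step 1: H(0) = I ~ Hessian
    c = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:           # Step 2
            break
        d = np.linalg.solve(H, -c)            # Step 3
        alpha = line_search(f, x, d)          # Step 4
        x_new = x + alpha * d                 # Step 5
        c_new = grad(x_new)
        s = alpha * d                         # change in design
        y = c_new - c                         # change in gradient
        H = H + np.outer(y, y) / (y @ s) + np.outer(c, c) / (c @ d)   # Step 6
        x, c = x_new, c_new                   # Step 7
    return x
```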
Chen CL
91
BFGS Example: f(x) = 5x1² + 2x1x2 + x2² + 7,   x^(0) = (1, 2)

1-1. x^(0) = (1, 2);  H^(0) = I;  k = 0, ε = 0.001
1-2. c^(0) = (10x1 + 2x2, 2x1 + 2x2) = (14, 6),  ||c^(0)|| = √(14² + 6²) = 15.232 > ε
1-3. d^(0) = −c^(0) = (−14, −6)
1-4. x^(1) = x^(0) + αd^(0) = (1 − 14α, 2 − 6α)
     f(x^(1)) = f(α) = 5(1 − 14α)² + 2(1 − 14α)(2 − 6α) + (2 − 6α)² + 7
     df/dα = 5(2)(−14)(1 − 14α) + 2(−14)(2 − 6α) + 2(−6)(1 − 14α) + 2(−6)(2 − 6α) = 0
     ⇒ α_0 = 0.0988,   d²f/dα² = 2348 > 0
1-5. x^(1) = x^(0) + α_0 d^(0) = (1 − 14α_0, 2 − 6α_0) = (−0.386, 1.407)
Chen CL
92
1-6. s^(0) = α_0 d^(0) = (−1.386, −0.593);  c^(1) = (−1.046, 2.042)
     y^(0) = c^(1) − c^(0) = (−15.046, −3.958)
     y^(0) · s^(0) = 23.20;  c^(0) · d^(0) = −232.0

     y^(0) y^(0)T = [ 226.40 59.55; 59.55 15.67 ]   ⇒ D^(0) = [ 9.760 2.567; 2.567 0.675 ]
     c^(0) c^(0)T = [ 196 84; 84 36 ]               ⇒ E^(0) = −[ 0.845 0.362; 0.362 0.155 ]

     H^(1) = H^(0) + D^(0) + E^(0) = [ 9.915 2.205; 2.205 0.520 ]
Chen CL
93

2-2. ||c^(1)|| = √(1.046² + 2.042²) = 2.29 > ε
2-3. solve H^(1) d^(1) = −c^(1) ⇒ d^(1) = (17.20, −76.77)
2-4. x^(2) = x^(1) + αd^(1);  α_1 = 0.018455 (minimizing f(x^(1) + αd^(1)))
2-5. x^(2) = x^(1) + α_1 d^(1) = (−0.0686, −0.0098)