Numerical Methods for Unconstrained Optimization

Cheng-Liang Chen
PSE Laboratory
Department of Chemical Engineering
National Taiwan University

Analytical vs. Numerical?
➢ Analytical methods: write the necessary conditions and solve them (analytically or numerically?) for candidate local minimum designs. Some difficulties:
☞ The number of design variables and constraints can be large
☞ The functions of the design problem can be highly nonlinear
☞ In many applications, cost and/or constraint functions can be implicit in terms of the design variables
➢ Numerical methods: estimate an initial design and improve it until the optimality conditions are satisfied
General Concepts Related to Numerical Algorithms

A General Algorithm
➢ Current estimate: x(k), k = 0, 1, ···
➢ Subproblem 1: d(k), a feasible search direction
➢ Subproblem 2: αk, a (positive scalar) step size
⇒ New estimate: x(k+1) = x(k) + αk d(k) = x(k) + ∆x(k)
General Concepts Related to Numerical Algorithms (figures)
Descent Step Idea
➢ Current estimate x(k), new estimate x(k+1) = x(k) + αk d(k)
➢ Taylor expansion:
   f(x(k+1)) = f(x(k) + αk d(k)) ≈ f(x(k)) + αk ∇f(x(k))·d(k) = f(x(k)) + αk c(k)·d(k)
➢ Requiring f(x(k)) > f(x(k+1)) gives
   ∇f(x(k))·d(k) = c(k)·d(k) < 0 : descent condition
➢ The angle between c(k) and d(k) must be between 90° and 270°

Unconstrained Optimization
Example: check the descent condition
   f(x) = x1² − x1x2 + 2x2² − 2x1 + e^(x1+x2)
Verify whether d1 = (1, 2) and d2 = (1, 0) are descent directions at (0, 0).
   c = ∇f(x) = ( 2x1 − x2 − 2 + e^(x1+x2),  −x1 + 4x2 + e^(x1+x2) )
   c at (0, 0) = (−1, 1)
   c·d1 = (−1)(1) + (1)(2) = −1 + 2 = 1 > 0   (not a descent direction)
   c·d2 = (−1)(1) + (1)(0) = −1 + 0 = −1 < 0   (a descent direction)
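This example can also be checked numerically by approximating the gradient with finite differences and testing the sign of c·d. A minimal sketch, assuming NumPy; the function and candidate directions are those of the example above:

```python
import numpy as np

def f(x):
    # f(x) = x1^2 - x1*x2 + 2*x2^2 - 2*x1 + exp(x1 + x2)
    x1, x2 = x
    return x1**2 - x1*x2 + 2*x2**2 - 2*x1 + np.exp(x1 + x2)

def grad_fd(f, x, h=1e-6):
    # central-difference approximation of the gradient c = grad f(x)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)
    return g

x0 = np.array([0.0, 0.0])
c = grad_fd(f, x0)                       # approximately (-1, 1)
for d in (np.array([1.0, 2.0]), np.array([1.0, 0.0])):
    slope = c @ d                        # descent condition: c . d < 0
    print(d, slope, "descent" if slope < 0 else "not descent")
```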
One-Dimensional Minimization: Reduction to a Function of a Single Variable
➢ Assume a descent direction d(k) has been found. Along d(k), f becomes a function of the single variable α:
   f̄(α) = f(x(k) + α d(k))
➢ Taylor expansion:
   f(x(k) + αd(k)) ≈ f(x(k)) + α ∇fᵀ(x(k)) d(k) = f(x(k)) + α c(k)·d(k)
➢ Since c(k)·d(k) < 0, for a small positive α
   f̄(α) < f̄(0) = f(x(k))
⇒ a small move along a descent direction d reduces f
Analytical Method to Compute Step Size
➢ If d(k) is a descent direction, then α > 0 and
   df(αk)/dα = 0,   d²f(αk)/dα² > 0
➢ 0 = df(x(k+1))/dα = (df(x(k+1))/dx)ᵀ · (dx(k+1)/dα) = ∇fᵀ(x(k+1)) d(k) = c(k+1)·d(k)
⇒ The gradient of the cost function at the NEW point, c(k+1), is orthogonal to the current search direction, d(k)

Example: analytical step size determination
   f(x) = 3x1² + 2x1x2 + 2x2² + 7   at x(k) = (1, 2),  d(k) = (−1, −1)
   c(k) = ∇f(x(k)) = (6x1 + 2x2, 2x1 + 4x2) = (10, 10)
   c(k)·d(k) = (10)(−1) + (10)(−1) = −20 < 0  ⇒ d(k) is a descent direction
   x(k+1) = x(k) + α d(k) = (1 − α, 2 − α)
   f(x(k+1)) = 3(1 − α)² + 2(1 − α)(2 − α) + 2(2 − α)² + 7 = 7α² − 20α + 22 ≡ f(α)
   NC:  df/dα = 14αk − 20 = 0  ⇒  αk = 10/7;   d²f/dα² = 14 > 0
   x(k+1) = (1, 2) + (10/7)(−1, −1) = (−3/7, 4/7)
   f(x(k+1)) = 54/7 < 22 = f(x(k))
   ∇f(x(k+1)) = (−10/7, 10/7)
   ∇fᵀ(x(k+1)) d(k) = (−10/7)(−1) + (10/7)(−1) = 0   (check)
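For a quadratic cost function the exact step size along a direction can also be obtained directly as αk = −c(k)·d(k) / (d(k)ᵀ H d(k)); this formula is not stated on the slide but follows from setting df/dα = 0 for a quadratic. A minimal NumPy sketch using this example's data:

```python
import numpy as np

# f(x) = 3*x1^2 + 2*x1*x2 + 2*x2^2 + 7
H = np.array([[6.0, 2.0], [2.0, 4.0]])           # Hessian of f
grad = lambda x: H @ x                            # gradient c(x); the linear term is zero here

x = np.array([1.0, 2.0])
d = np.array([-1.0, -1.0])
c = grad(x)                                       # (10, 10)
alpha = -(c @ d) / (d @ H @ d)                    # exact step size for a quadratic
x_new = x + alpha * d
print(alpha, x_new, grad(x_new) @ d)              # 10/7, (-3/7, 4/7), ~0 (orthogonality check)
```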
Numerical Methods to Compute Step Size
➢ Most one-dimensional search methods work only for unimodal functions
   (they work on the interval 0 = αl ≤ α ≤ ᾱu);  (αu − αl) ≡ interval of uncertainty

Unimodal Function
➢ A function f(x) is unimodal (on an interval containing x∗) if
   ☞ x1 < x2 < x∗ implies f(x1) > f(x2), and
   ☞ x∗ < x3 < x4 implies f(x3) < f(x4)
➢ Outcome of two experiments:  x∗ ∈ [0, 1],  0 < x1 < x2 < 1
   ☞ f1 < f2  ⇒  x∗ ∈ [0, x2]
   ☞ f1 > f2  ⇒  x∗ ∈ [x1, 1]
   ☞ f1 = f2  ⇒  x∗ ∈ [x1, x2]

Equal Interval Search
➢ Successively reduce the interval of uncertainty, I, to a small acceptable value ε
➢ I = αu − αl
➢ Evaluate the function at α = 0, δ, 2δ, 3δ, ···, αu
   ☞ If f((q + 1)δ) < f(qδ), continue (the new point becomes the current point)
   ☞ If f((q + 1)δ) > f(qδ), then αl = (q − 1)δ, αu = (q + 1)δ, and α∗ ∈ [αl, αu]
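A minimal sketch of the equal-interval bracketing just described, assuming the test function f(α) = 2 − 4α + e^α used in the following example; restarting with a smaller δ inside the returned interval is a straightforward extension:

```python
import math

def equal_interval_bracket(f, delta, alpha_max=1e6):
    # March forward in steps of delta until the function value increases,
    # then return the bracketing interval [alpha_l, alpha_u].
    f_prev = f(0.0)
    alpha = delta
    while alpha <= alpha_max:
        f_curr = f(alpha)
        if f_curr > f_prev:                   # first increase: the minimum has been passed
            return alpha - 2*delta, alpha     # alpha_l = (q-1)*delta, alpha_u = (q+1)*delta
        f_prev = f_curr
        alpha += delta
    raise ValueError("no bracket found")

f = lambda a: 2 - 4*a + math.exp(a)
print(equal_interval_bracket(f, 0.5))         # ~(1.0, 2.0); the exact minimum is ln 4 = 1.386...
```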
Equal Interval Search: Example
   f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001

Equal Interval Search: Example
   Note: f(x) = x(x − 1.5),  x∗ ∈ [0, 1]  ⇒  x∗ ∈ [x7, x8] = [0.7, 0.8]

    i    xi     f(xi)
    1    0.1    −0.14
    2    0.2    −0.26
    3    0.3    −0.36
    4    0.4    −0.44
    5    0.5    −0.50
    6    0.6    −0.54
    7    0.7    −0.56
    8    0.8    −0.56
    9    0.9    −0.54

   Use 99 points ⇒ eliminate 98% of the interval, but only ∼1% per function evaluation
Equal Interval Search: Example
Equal interval search for f(α) = 2 − 4α + e^α (δ refined after each bracketing)

   No.   Trial step   Function value
   --- δ = 0.5 ---
    1    0.000000     3.000000
    2    0.500000     1.648721
    3    1.000000     0.718282   ← αl
    4    1.500000     0.481689
    5    2.000000     1.389056   ← αu
   --- δ = 0.05, start from α = 1.0 ---
    6    1.050000     0.657651
    7    1.100000     0.604166
    8    1.150000     0.558193
    9    1.200000     0.520117
   10    1.250000     0.490343
   11    1.300000     0.469297
   12    1.350000     0.457426   ← αl
   13    1.400000     0.455200
   14    1.450000     0.463115   ← αu
   --- δ = 0.005, start from α = 1.35 ---
   15    1.355000     0.456761
   16    1.360000     0.456193
   17    1.365000     0.455723
   18    1.370000     0.455351
   19    1.375000     0.455077
   20    1.380000     0.454902   ← αl
   21    1.385000     0.454826
   22    1.390000     0.454850   ← αu
   --- δ = 0.0005, start from α = 1.38 ---
   23    1.380500     0.454890
   24    1.381000     0.454879
   25    1.381500     0.454868
   26    1.382000     0.454859
   27    1.382500     0.454851
   28    1.383000     0.454844
   29    1.383500     0.454838
   30    1.384000     0.454833
   31    1.384500     0.454829
   32    1.385000     0.454826
   33    1.385500     0.454824
   34    1.386000     0.454823   ← αl
   35    1.386500     0.454823
   36    1.387000     0.454824   ← αu
   37    1.386500     0.454823

Equal Interval Search: 3 Interior Points
➢ With three equally spaced interior points the interval is halved each iteration; the midpoint carries over, so only two new evaluations are needed per iteration ⇒ eliminate 25% per function evaluation
Equal Interval Search: 2 Interior Points
➢ x∗ ∈ [αl, αu]; evaluate two interior points
   αa = αl + (1/3)(αu − αl) = (1/3)(2αl + αu)
   αb = αl + (2/3)(αu − αl) = (1/3)(αl + 2αu)
➢ Case 1: f(αa) < f(αb)  ⇒  αl < α∗ < αb
➢ Case 2: f(αa) > f(αb)  ⇒  αa < α∗ < αu
➢ Reduced interval of uncertainty: I′ = (2/3)I, using 2 new points per iteration
   ⇒ eliminate only ∼16.7% per function evaluation ?!
➢ Why is this worse? The "old" interior point is NOT reused
Golden Section Search
➢ Problem with equal interval search (n = 2 interior points): the known interior point is NOT used in the next iteration
➢ Solution: Golden Section Search
➢ Fibonacci sequence: F0 = 1;  F1 = 1;  Fn = Fn−1 + Fn−2,  n = 2, 3, ···
   1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ···
   Fn/Fn−1 → 1.618,   Fn−1/Fn → 0.618   as n → ∞

Golden Section Search: Reduction of Interval of Uncertainty
➢ Given αl, αu with I = αu − αl, select interior points αa, αb such that
   αb − αl = τI,   αu − αa = τI   (so αa − αl = αu − αb = (1 − τ)I)
➢ Suppose f(αb) > f(αa): delete [αb, αu], i.e. αl′ = αl, αu′ = αb, I′ = τI
   The old point αa is reused as the new αb′, which requires
   (1 − τ)I = τ I′ = τ(τI)  ⇒  τ² + τ − 1 = 0
   ⇒ τ = (−1 + √5)/2 = 0.618,   1 − τ = 0.382,   1/τ = 1.618
➢ Q: how to find the initial three points?
   αa − αl = 0.382 I and αu − αa = 0.618 I = (1.618)(0.382 I), so successive trial steps should grow by the ratio
   (αq − αq−1)/(αq−1 − αq−2) = 0.618 I / 0.382 I = 1.618
Golden Section Search: Initial Bracketing of Minimum
➢ Starting at α = 0, evaluate the function at trial points
   αq = δ Σ_{j=0}^{q} (1.618)^j = αq−1 + (1.618)^q δ,   q = 0, 1, 2, ···
   q = 0:  α0 = δ
   q = 1:  α1 = δ + 1.618δ = 2.618δ           (α1 − α0 = 1.618(α0 − 0))
   q = 2:  α2 = 2.618δ + 1.618²δ = 5.236δ     (α2 − α1 = 1.618(α1 − α0))
   q = 3:  α3 = 5.236δ + 1.618³δ = 9.472δ     (α3 − α2 = 1.618(α2 − α1))
➢ If f(αq−2) > f(αq−1) and f(αq−1) < f(αq), then αq−2 < α∗ < αq:
   αl = αq−2 = δ Σ_{j=0}^{q−2} (1.618)^j,   αu = αq = δ Σ_{j=0}^{q} (1.618)^j
   I = αu − αl = (1.618)^q δ + (1.618)^{q−1} δ = 2.618 (1.618)^{q−1} δ

Golden Section Search: Algorithm
➢ Step 1: choose δ; use the initial bracketing above to obtain αl, αu, I = αu − αl
➢ Step 2: αa = αl + 0.382I, αb = αl + 0.618I; compute f(αa), f(αb)
➢ Step 3: compare f(αa) and f(αb) and go to Step 4, 5, or 6
➢ Step 4: if f(αa) < f(αb), then αl < α∗ < αb:
   αl = αl, αu = αb, αb = αa, αa = αl + 0.382(αu − αl); go to Step 7
➢ Step 5: if f(αa) > f(αb), then αa < α∗ < αu:
   αl = αa, αu = αu, αa = αb, αb = αl + 0.618(αu − αl); go to Step 7
➢ Step 6: if f(αa) = f(αb), then αa < α∗ < αb:
   αl = αa, αu = αb, return to Step 2
➢ Step 7: if I = αu − αl < ε, then α∗ = (αu + αl)/2 and Stop; otherwise return to Step 3

Golden Section Search: Example
   f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001

Table 5.2  Golden section search for f(α) = 2 − 4α + e^α
Initial bracketing of minimum:
   Trial step   α            Function value
   1            0.000000     3.000000
   2            0.500000     1.648721   ← αl
   3            1.309017     0.466464   ← interior point (becomes αa)
   4            2.618034     5.236610   ← αu
Reducing the interval of uncertainty (ε = 0.001):

 No.  αl [f(αl)]            αa [f(αa)]            αb [f(αb)]            αu [f(αu)]            I
  1   0.500000 [1.648721]   1.309017 [0.466464]   1.809017 [0.868376]   2.618034 [5.236610]   2.118034
  2   0.500000 [1.648721]   1.000000 [0.718282]   1.309017 [0.466464]   1.809017 [0.868376]   1.309017
  3   1.000000 [0.718282]   1.309017 [0.466464]   1.500000 [0.481689]   1.809017 [0.868376]   0.809017
  4   1.000000 [0.718282]   1.190983 [0.526382]   1.309017 [0.466464]   1.500000 [0.481689]   0.500000
  5   1.190983 [0.526382]   1.309017 [0.466464]   1.381966 [0.454860]   1.500000 [0.481689]   0.309017
  6   1.309017 [0.466464]   1.381966 [0.454860]   1.427051 [0.458190]   1.500000 [0.481689]   0.190983
  7   1.309017 [0.466464]   1.354102 [0.456873]   1.381966 [0.454860]   1.427051 [0.458190]   0.118034
  8   1.354102 [0.456873]   1.381966 [0.454860]   1.399187 [0.455156]   1.427051 [0.458190]   0.072949
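A compact sketch of the golden section phase (Steps 2 to 7 above), assuming the bracket [αl, αu] has already been found; the equal-value case of Step 6 is folded into the else branch. It reproduces the table's behavior for f(α) = 2 − 4α + e^α:

```python
import math

def golden_section(f, al, au, eps=0.001):
    # tau = 0.618...; interior points sit at 0.382 and 0.618 of the interval
    tau = (math.sqrt(5.0) - 1.0) / 2.0
    aa = al + (1 - tau) * (au - al)
    ab = al + tau * (au - al)
    fa, fb = f(aa), f(ab)
    while (au - al) > eps:
        if fa < fb:                       # minimum lies in [al, ab]
            au, ab, fb = ab, aa, fa       # reuse the old interior point
            aa = al + (1 - tau) * (au - al)
            fa = f(aa)
        else:                             # minimum lies in [aa, au]
            al, aa, fa = aa, ab, fb
            ab = al + tau * (au - al)
            fb = f(ab)
    return 0.5 * (al + au)

f = lambda a: 2 - 4*a + math.exp(a)
print(golden_section(f, 0.5, 2.618034))   # ~1.3863 (= ln 4)
```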
Polynomial Interpolation: Quadratic Curve Fitting
➢ Approximate f(α) by a quadratic q(α) = a0 + a1α + a2α² using three points αl < αi < αu:
   f(αl) = q(αl) = a0 + a1αl + a2αl²
   f(αi) = q(αi) = a0 + a1αi + a2αi²
   f(αu) = q(αu) = a0 + a1αu + a2αu²
⇒  a2 = [ (f(αu) − f(αl))/(αu − αl) − (f(αi) − f(αl))/(αi − αl) ] / (αu − αi)
   a1 = (f(αi) − f(αl))/(αi − αl) − a2(αi + αl)
   a0 = f(αl) − a1αl − a2αl²
➢ dq(α)/dα = a1 + 2a2ᾱ = 0  ⇒  ᾱ = −a1/(2a2),   a minimum if d²q/dα² = 2a2 > 0

Computational Algorithm:
☞ Step 1: locate the initial interval of uncertainty (αl, αu)
☞ Step 2: select αl < αi < αu and compute f(αi)
☞ Step 3: compute a0, a1, a2, ᾱ, f(ᾱ)
☞ Step 4: compare αi and ᾱ and reduce the interval of uncertainty:
   If f(αi) < f(ᾱ):
      ☞ αi < ᾱ:  α∗ ∈ [αl, ᾱ]   ⇒ new points  αl, αi, ᾱ
      ☞ ᾱ < αi:  α∗ ∈ [ᾱ, αu]   ⇒ new points  ᾱ, αi, αu
   If f(αi) > f(ᾱ):
      ☞ αi < ᾱ:  α∗ ∈ [αi, αu]  ⇒ new points  αi, ᾱ, αu
      ☞ ᾱ < αi:  α∗ ∈ [αl, αi]  ⇒ new points  αl, ᾱ, αi
☞ Step 5: Stop if two successive estimates of the minimum point of f(α) are sufficiently close. Otherwise delete the primes on αl′, αi′, αu′ and return to Step 2
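A minimal sketch of one quadratic-interpolation step (Steps 2 and 3), using the a0, a1, a2 formulas above; tried on f(α) = 2 − 4α + e^α with the bracket used in the example that follows:

```python
import math

def quadratic_fit_min(f, al, ai, au):
    # Fit q(a) = a0 + a1*a + a2*a^2 through (al, ai, au) and return its minimizer.
    fl, fi, fu = f(al), f(ai), f(au)
    a2 = ((fu - fl)/(au - al) - (fi - fl)/(ai - al)) / (au - ai)
    a1 = (fi - fl)/(ai - al) - a2*(ai + al)
    if a2 <= 0:
        raise ValueError("fitted quadratic has no interior minimum")
    return -a1 / (2.0 * a2)

f = lambda a: 2 - 4*a + math.exp(a)
abar = quadratic_fit_min(f, 0.5, 1.309017, 2.618034)
print(abar, f(abar))      # ~1.2077 and ~0.515: the first iteration of the example
```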
Polynomial Interpolation: Example
   f(α) = 2 − 4α + e^α,   δ = 0.5
   From the initial bracketing:  αl = 0.5,  αi = 1.309017,  αu = 2.618034
   f(αl) = 1.648721,  f(αi) = 0.466464,  f(αu) = 5.236610
   a2 = (1/1.30902) [ 3.5879/2.1180 − (−1.1823)/0.80902 ] = 2.410
   a1 = −1.1823/0.80902 − (2.41)(1.80902) = −5.821
   a0 = 1.648721 − (−5.821)(0.50) − 2.41(0.25) = 3.957
   ᾱ = −a1/(2a2) = 1.2077 < αi;   f(ᾱ) = 0.5149 > f(αi)
⇒ f(αi) < f(ᾱ) and ᾱ < αi  ⇒  α∗ ∈ [ᾱ, αu];
   new points:  αl = ᾱ = 1.2077,  αi = 1.309017,  αu = 2.618034
   Second iteration:  f(αl) = 0.5149,  f(αi) = 0.466464,  f(αu) = 5.236610
   a2 = 2.713,  a1 = −7.30547,  a0 = 5.3807
   ᾱ = 1.3464,   f(ᾱ) = 0.4579

Multi-Dimensional Minimization: Powell's Conjugate Directions Method
➢ Conjugate Directions: Let A be an n × n symmetric matrix. A set of n vectors (directions) {S i} is said to be A-conjugate if
   S iᵀ A S j = 0   for i, j = 1, ···, n;  i ≠ j
➢ Note: orthogonal directions are a special case of conjugate directions (A = I)
Multi-Dimensional Minimization: Powell's Conjugate Directions Method
➢ Quadratically Convergent Method: If a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method
➢ Theorem: Given a quadratic function of n variables and two parallel hyperplanes 1 and 2 of dimension k < n. Let the constrained stationary points of the quadratic function in the hyperplanes be X1 and X2, respectively. Then the line joining X1 and X2 is conjugate to any line parallel to the hyperplanes.
➢ Proof:  Q(X) = (1/2) XᵀAX + BᵀX + C,   ∇Q(X) = AX + B   (n × 1)
   Meaning: if X1 and X2 are the minima of Q obtained by searching along the direction S from two different starting points Xa and Xb, respectively (search from Xa along S; search from Xb along S), the line (X1 − X2) will be conjugate to S:
   X1 and X2 are stationary points along S  ⇒  S is orthogonal to ∇Q(X1) and ∇Q(X2):
   ∇Q(X1)ᵀ S = Sᵀ A X1 + Sᵀ B = 0
   ∇Q(X2)ᵀ S = Sᵀ A X2 + Sᵀ B = 0
   [∇Q(X1) − ∇Q(X2)]ᵀ S = Sᵀ A (X1 − X2) = 0
Multi-Dimensional Minimization: Powell's Conjugate Directions Method
➢ Theorem: If a quadratic function Q(X) = (1/2) XᵀAX + BᵀX + C is minimized sequentially, once along each direction of a set of n mutually conjugate directions, the minimum of the function Q will be found at or before the nth step irrespective of the starting point.
➢ Proof: Let X∗ = X1 + Σ_{j=1}^{n} βj S j, where the S j are the n mutually conjugate directions (with respect to A).
   ∇Q(X∗) = B + AX∗ = 0
   0 = B + AX1 + A Σ_{j=1}^{n} βj S j
   Premultiplying by S iᵀ:
   0 = S iᵀ (B + AX1) + S iᵀ A Σ_{j=1}^{n} βj S j = (B + AX1)ᵀ S i + βi S iᵀ A S i
   ⇒ βi = −(B + AX1)ᵀ S i / (S iᵀ A S i)
Multi-Dimensional Minimization: Powell's Conjugate Directions Method
➢ Note: X i+1 = X i + λi∗ S i,  i = 1, ···, n, where λi∗ is found by minimizing Q(X i + λi S i) so that S iᵀ ∇Q(X i+1) = 0:
   ∇Q(X i+1) = B + A X i+1 = B + A(X i + λi∗ S i)
   0 = S iᵀ [ B + A(X i + λi∗ S i) ] = (B + AX i)ᵀ S i + λi∗ S iᵀ A S i
   ⇒ λi∗ = −(B + AX i)ᵀ S i / (S iᵀ A S i)
   Since X i = X1 + Σ_{j=1}^{i−1} λj∗ S j,
   X iᵀ A S i = X1ᵀ A S i + Σ_{j=1}^{i−1} λj∗ S jᵀ A S i = X1ᵀ A S i
   ⇒ λi∗ = −(B + AX1)ᵀ S i / (S iᵀ A S i) = βi

Powell's Conjugate Directions: Example
   f(x1, x2) = 6x1² + 2x2² − 6x1x2 − x1 − 2x2
   ∇f = ( 12x1 − 6x2 − 1,  −6x1 + 4x2 − 2 );   A = [ 12  −6 ;  −6  4 ],   B = (−1, −2);   X1 = (0, 0)
   Take S1 = (1, 2). Find S2 = (s1, s2) conjugate to S1:
   S1ᵀ A S2 = [1  2] [ 12  −6 ; −6  4 ] (s1, s2) = (0, 2)·(s1, s2) = 2 s2 = 0  ⇒  S2 = (1, 0)
   λ1∗ = −(B + AX1)ᵀ S1 / (S1ᵀ A S1) = −(−1, −2)·(1, 2) / 4 = 5/4
   X2 = X1 + λ1∗ S1 = (0, 0) + (5/4)(1, 2) = (5/4, 5/2)
   λ2∗ = −(B + AX2)ᵀ S2 / (S2ᵀ A S2) = −(−1, 1/2)·(1, 0) / 12 = 1/12
   X3 = X2 + λ2∗ S2 = (5/4, 5/2) + (1/12)(1, 0) = (4/3, 5/2) = X∗ (?)
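A small NumPy check of this example: it verifies S1ᵀAS2 = 0 and reproduces the two sequential line minimizations using the λ∗ formula above; the matrices are those of the example.

```python
import numpy as np

A = np.array([[12.0, -6.0], [-6.0, 4.0]])    # Hessian of f
B = np.array([-1.0, -2.0])                   # linear term: f = 0.5 x'Ax + B'x
X = np.array([0.0, 0.0])

S1 = np.array([1.0, 2.0])
S2 = np.array([1.0, 0.0])
print("S1' A S2 =", S1 @ A @ S2)             # 0 -> the directions are A-conjugate

for S in (S1, S2):
    lam = -(B + A @ X) @ S / (S @ A @ S)     # exact minimizing step along S
    X = X + lam * S
    print("lambda* =", lam, " X =", X)
print("gradient at X:", A @ X + B)           # ~(0, 0): minimum reached in n = 2 steps
```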
Powell's Algorithm (figure)

Progress of Powell's Method
➢ Direction sets used in successive cycles:
   u1, u2, ···, un−1, un;
   u2, u3, ···, un, S(1);
   u3, ···, un, S(1), S(2);
   ··· ;
   un, S(1), S(2), ···, S(n−1)
⇒ (un, S(1)), (S(1), S(2)), ··· are A-conjugate

Powell's Conjugate Directions: Example
   Min: f(x1, x2) = x1 − x2 + 2x1² + 2x1x2 + x2²,   X1 = [0  0]ᵀ

➢ Cycle 1: Univariate Search
   Along u2:  f(X1 + λu2) = f(0, λ) = λ² − λ
      df/dλ = 0  ⇒  λ∗ = 1/2,   X2 = X1 + λ∗u2 = (0, 0.5)
   Along −u1:  f(X2 − λu1) = f(−λ, 0.5) = 2λ² − 2λ − 0.25
      df/dλ = 0  ⇒  λ∗ = 1/2,   X3 = X2 − λ∗u1 = (−0.5, 0.5)
   Along u2:  f(X3 + λu2) = f(−0.5, 0.5 + λ) = λ² − λ − 0.75
      df/dλ = 0  ⇒  λ∗ = 1/2,   X4 = X3 + λ∗u2 = (−0.5, 1.0)
   Pattern direction:  S(1) = X4 − X2 = (−0.5, 0.5)

➢ Cycle 2: Pattern Search
   f(X4 + λS(1)) = f(−0.5 − 0.5λ, 1 + 0.5λ) = 0.25λ² − 0.5λ − 1
      df/dλ = 0  ⇒  λ∗ = 1.0,   X5 = X4 + λ∗S(1) = (−1.0, 1.5)
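A rough sketch of the univariate cycle plus pattern move shown above, assuming exact line minimization via the quadratic step formula (valid here because f is quadratic); it reproduces X2, X4, S(1), and X5 for this example:

```python
import numpy as np

A = np.array([[4.0, 2.0], [2.0, 2.0]])      # Hessian of f = x1 - x2 + 2x1^2 + 2x1x2 + x2^2
B = np.array([1.0, -1.0])                    # linear term

def line_min(X, S):
    # exact minimizer of the quadratic f along direction S starting from X
    lam = -(B + A @ X) @ S / (S @ A @ S)
    return X + lam * S

u1, u2 = np.eye(2)
X1 = np.zeros(2)
X2 = line_min(X1, u2)                        # (0, 0.5)
X3 = line_min(X2, -u1)                       # (-0.5, 0.5)
X4 = line_min(X3, u2)                        # (-0.5, 1.0)
S1 = X4 - X2                                 # pattern direction (-0.5, 0.5)
X5 = line_min(X4, S1)                        # (-1.0, 1.5) = the minimum
print(X2, X4, S1, X5)
```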
Simplex Method (figures)
Properties of Gradient Vector
➢ f(x) = f(x1, x2, . . . , xn)
   c = ∇f(x) = ( ∂f(x)/∂x1, ···, ∂f(x)/∂xn )ᵀ;   c(k) = ∇f(x(k)),   ci(k) = ∂f(x(k))/∂xi
➢ Property 1: The gradient vector c of a function f(x1, ···, xn) at the point x∗ = (x1∗, ···, xn∗) is orthogonal (normal) to the tangent plane of the surface f(x1, ···, xn) = constant.
   Proof:
   ☞ C is any curve on the surface through x∗; T is a vector tangent to curve C at x∗
   ☞ s: any parameter along C
      T = ( ∂x1/∂s, ···, ∂xn/∂s ) |x=x∗   (a unit tangent vector along C at x∗)
   ☞ f(x) = constant on the surface  ⇒  df/ds = 0:
      0 = df/ds = (∂f/∂x1)(∂x1/∂s) + ··· + (∂f/∂xn)(∂xn/∂s) = c·T = cᵀT
   ⇒ c·T = 0
➢ Property 2: The gradient represents a direction of maximum rate of increase of f(x) at x∗.
   Proof:
   ☞ u: a unit vector in any direction not tangent to C;  t: a parameter along u
      df/dt = lim(ε→0) [ f(x + εu) − f(x) ] / ε
      f(x + εu) = f(x) + ε ( u1 ∂f/∂x1 + ··· + un ∂f/∂xn ) + O(ε²) = f(x) + ε Σ_{i=1}^{n} ui ∂f/∂xi + O(ε²)
      df/dt = Σ_{i=1}^{n} ui ∂f/∂xi = c·u = cᵀu = ||c|| ||u|| cos θ
   ⇒ maximum rate of increase when θ = 0 (u along the gradient)
➢ Property 3: The maximum rate of change of f(x) at any point x∗ is the magnitude of the gradient vector:
   max df/dt = ||c||, attained when u is in the direction of the gradient vector (θ = 0)
Verify Properties of Gradient Vector
➢ f(x) = 25x1² + x2²,   x(0) = (0.6, 4),   f(x(0)) = 25
   c = ∇f(x) = (∂f/∂x1, ∂f/∂x2) = (50x1, 2x2) = (30, 8)
   Unit gradient direction:  C = c/||c|| = (30, 8)/√(30² + 8²) = (0.966235, 0.257663)
➢ Property 1:  CᵀT = 0
   Slope of the tangent to the curve 25x1² + x2² = 25 at x(0):
      m1 = dx2/dx1 = −c1/c2 = −50x1/(2x2) = −30/8 = −3.75
   Unit tangent vector:  t = (−4, 15)/√((−4)² + 15²) = (−0.257663, 0.966235)
   Slope of the gradient line:  m2 = c2/c1 = 8/30;   m1·m2 = −1  and  Cᵀt = 0   (check)
➢ Property 2: choose an arbitrary direction D = (0.501034, 0.865430), step α = 0.1
   x(1) = x(0) + αC = (0.6, 4.0) + 0.1(0.966235, 0.257663) = (0.6966235, 4.0257663),   f(x(1)) = 28.3389
   x(1) = x(0) + αD = (0.6, 4.0) + 0.1(0.501034, 0.865430) = (0.6501034, 4.0865430),   f(x(1)) = 27.2566
   ⇒ f(x(1)) along D < f(x(1)) along C: the increase along the gradient direction C is the largest

Steepest Descent Algorithm
➢ Steepest Descent Direction: Let f(x) be a differentiable function w.r.t. x. The direction of steepest descent for f(x) at any point is d = −c
➢ Steepest Descent Algorithm:
   ☞ Step 1: a starting design x(0); k = 0; ε
   ☞ Step 2: c(k) = ∇f(x(k)); stop if ||c(k)|| < ε
   ☞ Step 3: d(k) = −c(k)
   ☞ Step 4: calculate αk to minimize f(x(k) + αd(k))
   ☞ Step 5: x(k+1) = x(k) + αk d(k), k = k + 1, go to Step 2
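A minimal steepest descent sketch following the five steps above. The example in the later slides uses a golden-section line search in Step 4; here scipy's scalar minimizer is used for brevity, which is an assumption of this sketch rather than the slide's method:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, eps=0.005, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:          # Step 2: stopping criterion
            break
        d = -c                               # Step 3: steepest descent direction
        alpha = minimize_scalar(lambda a: f(x + a * d)).x   # Step 4: line search
        x = x + alpha * d                    # Step 5: update
    return x, k

# Example 5.10-style function: f = x1^2 + 2x2^2 + 2x3^2 + 2x1x2 + 2x2x3
f = lambda x: x[0]**2 + 2*x[1]**2 + 2*x[2]**2 + 2*x[0]*x[1] + 2*x[1]*x[2]
grad = lambda x: np.array([2*x[0] + 2*x[1],
                           4*x[1] + 2*x[0] + 2*x[2],
                           4*x[2] + 2*x[1]])
print(steepest_descent(f, grad, [2.0, 4.0, 10.0]))   # slowly approaches (0, 0, 0)
```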
Steepest Descent Algorithm
➢ Notes:
   ☞ d(k) = −c(k)  ⇒  c(k)·d(k) = −||c(k)||² < 0  ⇒  d is a descent direction
   ☞ The successive directions of steepest descent are normal to each other, d(k)·d(k+1) = 0:
      proof:  0 = df(x(k+1))/dα = (∂f(x(k+1))/∂x)ᵀ (∂x(k+1)/∂α) = c(k+1)ᵀ d(k) = −c(k+1)·c(k) = d(k+1)·d(k)

Steepest Descent: Example
   f(x1, x2) = x1² + x2² − 2x1x2,   x(0) = (1, 0)
➢ Step 1: x(0) = (1, 0), k = 0, ε
➢ Step 2: c(0) = ∇f(x(0)) = (2x1 − 2x2, 2x2 − 2x1) = (2, −2);   ||c(0)|| = 2√2 ≠ 0
➢ Step 3: d(0) = −c(0) = (−2, 2)
➢ Step 4: minimize f(x(0) + αd(0)) = f(1 − 2α, 2α):
      f(1 − 2α, 2α) = (1 − 2α)² + (2α)² − 2(1 − 2α)(2α) = 16α² − 8α + 1 = f(α)
      df(α)/dα = 32α − 8 = 0  ⇒  α0 = 0.25;   d²f/dα² = 32 > 0
➢ Step 5: x(1) = x(0) + α0 d(0) = (1 − 0.25(2), 0 + 0.25(2)) = (0.5, 0.5)
      c(1) = (0, 0)  ⇒  stop
Steepest Descent: Example
   f(x1, x2, x3) = x1² + 2x2² + 2x3² + 2x1x2 + 2x2x3,   x(0) = (2, 4, 10),   x∗ = (0, 0, 0)
➢ Step 1: k = 0, ε = 0.005  (δ = 0.05, ε = 0.0001 for the golden section line search)
➢ Step 2: c(0) = ∇f(x(0)) = (2x1 + 2x2, 4x2 + 2x1 + 2x3, 4x3 + 2x2) = (12, 40, 48);
      ||c(0)|| = √4048 = 63.6 > ε
➢ Step 3: d(0) = −c(0) = (−12, −40, −48)
➢ Step 4: minimize f(x(0) + αd(0)) by golden section  ⇒  α0 = 0.158718
➢ Step 5: x(1) = x(0) + α0 d(0) = (0.0954, −2.348, 2.381)
      c(1) = (−4.5, −4.438, 4.828);   ||c(1)|| = 7.952 > ε  ⇒  continue

Optimum solution for Example 5.10 with the steepest descent program
(The iteration history lists, for iterations 1 through 33, the values of x1, x2, x3, f(x), the step size α, and ||c||, all of which decrease slowly toward zero.)
   Optimum design variables:            8.04787E−03,  −6.81319E−03,  3.42174E−03
   Optimum cost function value:         2.47347E−05
   Norm of gradient at optimum:         4.97071E−03
   Total no. of function evaluations:   753

Steepest Descent: Disadvantages
➢ Slow to converge, especially when approaching the optimum ⇒ a large number of iterations
➢ Information calculated at previous iterations is NOT used; each iteration is started independently of the others
Scaling of Design Variables
➢ The steepest descent method converges in only one iteration for a positive definite quadratic function with a unit condition number of the Hessian matrix
➢ To accelerate the rate of convergence, scale the design variables such that the condition number of the Hessian of the scaled problem is (close to) one
➢ Example:  Min f(x1, x2) = 25x1² + x2²,   x0 = (1, 1)
      H = [ 50  0 ;  0  2 ]
   Let x = Dy with  D = [ 1/√50  0 ;  0  1/√2 ]
   ⇒  Min f(y1, y2) = (1/2)(y1² + y2²),   y0 = (√50, √2)

➢ Example:  Min f(x1, x2) = 6x1² − 6x1x2 + 2x2² − 5x1 + 4x2 + 2,   x0 = (−1, −2)
      H = [ 12  −6 ;  −6  4 ]
   λ1,2 = 0.7889, 15.211 (eigenvalues);   v1,2 = (0.4718, 0.8817), (−0.8817, 0.4718)
   Let x = Qy,  Q = [ v1  v2 ] = [ 0.4718  −0.8817 ;  0.8817  0.4718 ]
   ⇒  Min f(y1, y2) = 0.5(0.7889y1² + 15.211y2²) + 1.1678y1 + 6.2957y2 + 2
   Let y = Dz,  D = [ 1/√0.7889  0 ;  0  1/√15.211 ]
   ⇒  Min f(z1, z2) = 0.5(z1² + z2²) + 1.3148z1 + 1.6142z2 + 2
   ⇒  z∗ = (−1.3148, −1.6142),   x∗ = QDz∗ = (−1/3, −3/2)
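A quick NumPy check of the second scaling example: it recomputes the eigen-decomposition of H and the transformed linear term, and confirms that mapping z∗ back gives x∗ = (−1/3, −3/2). The eigenvector signs returned by numpy may differ from the slide, but the recovered x∗ is unaffected.

```python
import numpy as np

H = np.array([[12.0, -6.0], [-6.0, 4.0]])
b = np.array([-5.0, 4.0])                    # linear term of f

lam, Q = np.linalg.eigh(H)                   # eigenvalues ~ (0.7889, 15.211)
D = np.diag(1.0 / np.sqrt(lam))              # second scaling y = D z
bz = D @ Q.T @ b                             # linear term in z-coordinates (~ (1.3148, 1.6142) up to sign)
z_star = -bz                                 # minimizer of 0.5*z'z + bz'z
x_star = Q @ D @ z_star
print(lam, bz, x_star)                       # x_star ~ (-1/3, -3/2)
```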
Conjugate Gradient Method  (Fletcher and Reeves, 1964)
➢ Steepest descent: consecutive directions are orthogonal ⇒ converges, but slowly
➢ Conjugate gradient method: modify the current steepest descent direction by adding a scaled previous direction ⇒ cut diagonally through the orthogonal steepest descent directions
➢ Conjugate gradient directions: d(i), d(j) are orthogonal w.r.t. a symmetric and positive definite matrix A:
   d(i)ᵀ A d(j) = 0
➢ Algorithm:
   ☞ Step 1: k = 0, x(0);  d(0) = −c(0) = −∇f(x(0));  stop if ||c(0)|| < ε, otherwise go to Step 4
   ☞ Step 2: c(k) = ∇f(x(k));  stop if ||c(k)|| < ε
   ☞ Step 3: d(k) = −c(k) + βk d(k−1),   βk = [ ||c(k)|| / ||c(k−1)|| ]²
   ☞ Step 4: compute αk = α to minimize f(x(k) + αd(k))
   ☞ Step 5: x(k+1) = x(k) + αk d(k), k = k + 1, go to Step 2
➢ Note:
   ☞ Finds the minimum in n iterations for positive definite quadratic forms having n design variables
   ☞ With inexact line search or non-quadratic forms, the method is re-started (from the steepest descent direction) every n + 1 iterations for computational stability
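A minimal Fletcher-Reeves sketch of the algorithm above, assuming an exact line search via the quadratic step formula (so it terminates in n steps on the quadratic example of the next slide); the restart logic mentioned in the notes is omitted.

```python
import numpy as np

# Quadratic test problem from the example: f = x1^2 + 2x2^2 + 2x3^2 + 2x1x2 + 2x2x3
H = np.array([[2.0, 2.0, 0.0],
              [2.0, 4.0, 2.0],
              [0.0, 2.0, 4.0]])
grad = lambda x: H @ x

def fletcher_reeves(x0, eps=1e-6, max_iter=50):
    x = np.asarray(x0, dtype=float)
    c = grad(x)
    d = -c
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        alpha = -(c @ d) / (d @ H @ d)       # exact step for a quadratic
        x = x + alpha * d
        c_new = grad(x)
        beta = (np.linalg.norm(c_new) / np.linalg.norm(c))**2
        d = -c_new + beta * d                # Step 3: conjugate direction update
        c = c_new
    return x, k

print(fletcher_reeves([2.0, 4.0, 10.0]))     # converges to (0, 0, 0) in at most 3 iterations
```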
Conjugate Gradient Method: Example
   Min f(x) = x1² + 2x2² + 2x3² + 2x1x2 + 2x2x3,   x(0) = (2, 4, 10),   f(x(0)) = 332.0
➢ First iteration (identical to steepest descent):
   c(0) = (12, 40, 48);   ||c(0)|| = 63.6
   x(1) = (0.0956, −2.348, 2.381);   c(1) = (−4.5, −4.438, 4.828);   ||c(1)|| = 7.952;   f(x(1)) = 10.75
➢ Second iteration:
   β1 = [ ||c(1)|| / ||c(0)|| ]² = [7.952/63.6]² = 0.015633
   d(1) = −c(1) + β1 d(0) = (4.500, 4.438, −4.828) + (0.015633)(−12, −40, −48) = (4.31241, 3.81268, −5.57838)
   x(2) = x(1) + α d(1); minimizing f along d(1) gives α = 0.3156
   x(2) = (0.0956, −2.348, 2.381) + 0.3156 (4.31241, 3.81268, −5.57838) = (1.4566, −1.1447, 0.6205)
   c(2) = (0.6238, −0.4246, 0.1926);   ||c(2)|| = 0.7788;   c(2)·d(1) = 0

Newton Method
➢ A second-order method
   x: current estimate of x∗;   x∗ ≈ x + ∆x (desired)
   f(x + ∆x) = f(x) + cᵀ∆x + (1/2) ∆xᵀ H ∆x
   NC:  ∂f/∂∆x = c + H ∆x = 0
   ⇒  ∆x = −H⁻¹ c
   ⇒  ∆x = −α H⁻¹ c   (modified Newton, with step size α)
Newton Method: Steps (modified Newton)
☞ Step 1: k = 0; x(0); ε
☞ Step 2: ci(k) = ∂f(x(k))/∂xi, i = 1 ∼ n;  stop if ||c(k)|| < ε
☞ Step 3: H(x(k)) = [ ∂²f/∂xi∂xj ]
☞ Step 4: d(k) = −H⁻¹c(k),  or solve  H d(k) = −c(k)
   (Note: for computational efficiency, a system of linear simultaneous equations is solved instead of evaluating the inverse of the Hessian)
☞ Step 5: compute αk = α to minimize f(x(k) + αd(k))
☞ Step 6: x(k+1) = x(k) + αk d(k), k = k + 1, go to Step 2
➢ Note: unless H is positive definite, d(k) will not be a descent direction for f:
   H > 0  ⇔  c(k)ᵀ d(k) = −αk c(k)ᵀ H⁻¹ c(k) < 0 for positive definite H
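A minimal modified-Newton sketch of the steps above, solving H d = −c rather than inverting H. Two assumptions not on the slide: a crude backtracking rule replaces the exact line search, and a fall-back to steepest descent is added when d is not a descent direction (the case the note warns about). It is shown on the quartic example of the following slide.

```python
import numpy as np

# Example: f = 10*x1^4 - 20*x1^2*x2 + 10*x2^2 + x1^2 - 2*x1 + 5
f = lambda x: 10*x[0]**4 - 20*x[0]**2*x[1] + 10*x[1]**2 + x[0]**2 - 2*x[0] + 5
grad = lambda x: np.array([40*x[0]**3 - 40*x[0]*x[1] + 2*x[0] - 2,
                           -20*x[0]**2 + 20*x[1]])
hess = lambda x: np.array([[120*x[0]**2 - 40*x[1] + 2, -40*x[0]],
                           [-40*x[0], 20.0]])

def modified_newton(x0, eps=1e-6, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:
            break
        d = np.linalg.solve(hess(x), -c)     # Step 4: solve H d = -c
        if c @ d >= 0:                       # safeguard (assumption): fall back to -c
            d = -c                           # when H is not positive definite
        alpha = 1.0
        while f(x + alpha*d) >= f(x) and alpha > 1e-8:
            alpha *= 0.5                     # Step 5: simple backtracking step size
        x = x + alpha*d
    return x, k

print(modified_newton([-1.0, 3.0]))          # converges to (1, 1)
```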
Newton Method: Example
   f(x) = 3x1² + 2x1x2 + 2x2² + 7,   x(0) = (5, 10),   ε = 0.0001
   c = ∇f(x) = (6x1 + 2x2, 2x1 + 4x2);   c(0) = (50, 50);   ||c(0)|| = 50√2
   H(0) = [ 6  2 ;  2  4 ],    H(0)⁻¹ = (1/20) [ 4  −2 ;  −2  6 ]
   d(0) = −H⁻¹c(0) = −(1/20) [ 4  −2 ; −2  6 ] (50, 50) = (−5, −10)
   x(1) = x(0) + αd(0) = (5 − 5α, 10 − 10α)
   ∇f(x(1)) = ( 6(5 − 5α) + 2(10 − 10α),  2(5 − 5α) + 4(10 − 10α) ) = (50 − 50α, 50 − 50α)
   df/dα = ∇f(x(1))·d(0) = −5(50 − 50α) − 10(50 − 50α) = 0  ⇒  α = 1
   x(1) = (0, 0),  ∇f(x(1)) = (0, 0)  ⇒  the minimum of this quadratic is reached in one iteration

Newton Method: Example
   f(x) = 10x1⁴ − 20x1²x2 + 10x2² + x1² − 2x1 + 5,   x(0) = (−1, 3)
   ∇f(x) = ( 40x1³ − 40x1x2 + 2x1 − 2,  −20x1² + 20x2 )
   ∇²f(x) = [ 120x1² − 40x2 + 2    −40x1 ;   −40x1    20 ]
79
Chen CL ➢
80
Comparison of Steepest Descent, Conjugate Gradient Methods
f (x) = 50(x2
Chen CL
− x21)2 + (2 − x1)2
x(0) = (5,
−5)
Chen CL
81
Chen CL
83
Newton,
x∗ = (2, 4
82
Newton Method
➢ Advantage: quadratic convergence rate
➢ Disadvantages:
   ☞ Calculation of second-order derivatives at each iteration
   ☞ A system of simultaneous linear equations needs to be solved
   ☞ The Hessian of the function may be singular at some iterations
   ☞ Memoryless method: each iteration is started afresh
   ☞ Not convergent unless the Hessian remains positive definite and a step size determination scheme is used
Marquardt Modification (1963)
➢ d(k) = −(H + λI)⁻¹ c(k)
➢ Far away from the solution point ⇒ behaves like steepest descent
➢ Near the solution point ⇒ behaves like the Newton method
➢ Steps:
   ☞ Step 1: k = 0; x(0); ε; λ0 (= 10000, large)
   ☞ Step 2: ci(k) = ∂f(x(k))/∂xi, i = 1 ∼ n;  stop if ||c(k)|| < ε
   ☞ Step 3: H(x(k)) = [ ∂²f/∂xi∂xj ]
   ☞ Step 4: d(k) = −(H + λk I)⁻¹ c(k)
   ☞ Step 5: if f(x(k) + d(k)) < f(x(k)), go to Step 6; otherwise let λk = 2λk and go to Step 4
   ☞ Step 6: set λk+1 = 0.5λk, k = k + 1 and go to Step 2

Quasi-Newton Methods
➢ Steepest descent:
   ☞ Uses only first-order information ⇒ poor rate of convergence
   ☞ Each iteration is started with new design variables without using any information from previous iterations
➢ Newton method:
   ☞ Uses second-order derivatives ⇒ quadratic convergence rate
   ☞ Requires calculation of n(n + 1)/2 second-order derivatives!
   ☞ Difficulties if the Hessian is singular
   ☞ Not a learning process
➢ Quasi-Newton methods (update methods):
   ☞ Use first-order derivatives to generate approximations of the Hessian ⇒ combine desirable features of both steepest descent and Newton's method
   ☞ Use information from previous iterations to speed up convergence (learning process)
   ☞ Several ways to approximate the (updated) Hessian or its inverse
   ☞ Preserve the properties of symmetry and positive definiteness
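A minimal sketch of the Marquardt steps above, assuming the same quartic test function used for the Newton example; the large starting λ and the doubling/halving rules follow the slide.

```python
import numpy as np

f = lambda x: 10*x[0]**4 - 20*x[0]**2*x[1] + 10*x[1]**2 + x[0]**2 - 2*x[0] + 5
grad = lambda x: np.array([40*x[0]**3 - 40*x[0]*x[1] + 2*x[0] - 2,
                           -20*x[0]**2 + 20*x[1]])
hess = lambda x: np.array([[120*x[0]**2 - 40*x[1] + 2, -40*x[0]],
                           [-40*x[0], 20.0]])

def marquardt(x0, eps=1e-6, lam=1.0e4, max_iter=200):
    x = np.asarray(x0, dtype=float)
    I = np.eye(len(x))
    for k in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:
            break
        while True:
            d = np.linalg.solve(hess(x) + lam*I, -c)   # Step 4
            if f(x + d) < f(x):                        # Step 5: accept the step
                break
            lam *= 2.0                                 # otherwise increase lambda
        x = x + d
        lam *= 0.5                                     # Step 6: decrease lambda
    return x, k

print(marquardt([-1.0, 3.0]))                          # converges to (1, 1)
```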
Davidon-Fletcher-Powell (DFP) Method
➢ Davidon (1959), Fletcher and Powell (1963)
➢ Approximate the inverse of the Hessian using only first derivatives:
   ∆x = −αH⁻¹c ≈ −αAc,   A ≈ H⁻¹:  find A by using only first-order information
➢ DFP Procedure:
   ☞ Step 1: k = 0; x(0); ε;  A(0) (= I, ≈ H⁻¹)
   ☞ Step 2: c(k) = ∇f(x(k));  stop if ||c(k)|| < ε
   ☞ Step 3: d(k) = −A(k) c(k)
   ☞ Step 4: compute αk = α to minimize f(x(k) + αd(k))
   ☞ Step 5: x(k+1) = x(k) + αk d(k)
   ☞ Step 6: update A(k):
      A(k+1) = A(k) + B(k) + C(k)
      B(k) = s(k) s(k)ᵀ / (s(k)·y(k)),    C(k) = − z(k) z(k)ᵀ / (y(k)·z(k))
      s(k) = αk d(k)   (change in design)
      y(k) = c(k+1) − c(k)   (change in gradient)
      c(k+1) = ∇f(x(k+1)),    z(k) = A(k) y(k)
   ☞ Step 7: set k = k + 1 and go to Step 2
➢ DFP Properties:
   ☞ The matrix A(k) is always positive definite ⇒ the method always converges to a local minimum if α > 0, since
      d f(x(k) + αd(k)) / dα |α=0 = −c(k)ᵀ A(k) c(k) < 0
   ☞ When applied to a positive definite quadratic form, A(k) converges to the inverse of the Hessian of the quadratic form
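A minimal DFP sketch following the procedure above, assuming scipy's scalar minimizer as the line search in Step 4; it is run on the example function f(x) = 5x1² + 2x1x2 + x2² + 7 from the next slide.

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: 5*x[0]**2 + 2*x[0]*x[1] + x[1]**2 + 7
grad = lambda x: np.array([10*x[0] + 2*x[1], 2*x[0] + 2*x[1]])

def dfp(x0, eps=1e-3, max_iter=50):
    x = np.asarray(x0, dtype=float)
    A = np.eye(len(x))                       # A(0) = I, approximates H^-1
    c = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        d = -A @ c                           # Step 3
        alpha = minimize_scalar(lambda a: f(x + a*d)).x   # Step 4
        s = alpha * d                        # change in design
        x = x + s
        c_new = grad(x)
        y = c_new - c                        # change in gradient
        z = A @ y
        A = A + np.outer(s, s)/(s @ y) - np.outer(z, z)/(y @ z)   # Step 6
        c = c_new
    return x, k

print(dfp([1.0, 2.0]))                       # converges to (0, 0)
```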
DFP Example:  f(x) = 5x1² + 2x1x2 + x2² + 7,   x(0) = (1, 2),   A(0) = I,   k = 0,  ε = 0.001
➢ First iteration:
   1-1. x(0) = (1, 2)
   1-2. c(0) = (10x1 + 2x2, 2x1 + 2x2) = (14, 6),   ||c(0)|| = √(14² + 6²) = 15.232 > ε
   1-3. d(0) = −c(0) = (−14, −6)
   1-4. x(1) = x(0) + αd(0) = (1 − 14α, 2 − 6α)
        f(x(1)) = f(α) = 5(1 − 14α)² + 2(1 − 14α)(2 − 6α) + (2 − 6α)² + 7
        df/dα = 5(2)(−14)(1 − 14α) + 2(−14)(2 − 6α) + 2(−6)(1 − 14α) + 2(−6)(2 − 6α) = 0
        ⇒ α0 = 0.0988,   d²f/dα² = 2348 > 0
   1-5. x(1) = x(0) + α0 d(0) = (1 − 14α0, 2 − 6α0) = (−0.386, 1.407)
   1-6. s(0) = α0 d(0) = (−1.386, −0.593),   c(1) = (−1.046, 2.042)
        y(0) = c(1) − c(0) = (−15.046, −3.958),   z(0) = A(0) y(0) = y(0)
        s(0)·y(0) = 23.20,   y(0)·z(0) = 242.05
        s(0) s(0)ᵀ = [ 1.921  0.822 ;  0.822  0.352 ],    B(0) = [ 0.0828  0.0354 ;  0.0354  0.0152 ]
        z(0) z(0)ᵀ = [ 226.40  59.55 ;  59.55  15.67 ],    C(0) = [ −0.935  −0.246 ;  −0.246  −0.065 ]
        A(1) = A(0) + B(0) + C(0) = [ 0.148  −0.211 ;  −0.211  0.950 ]
➢ Second iteration:
   2-2. ||c(1)|| = √(1.046² + 2.042²) = 2.29 > ε
   2-3. d(1) = −A(1) c(1) = (0.586, −1.719)
   2-4. α1 = 0.776   (minimizes f(x(1) + αd(1)))
   2-5. x(2) = x(1) + α1 d(1) = (−0.386, 1.407) + (0.455, −1.334) = (0.069, 0.073)
   2-6. s(1) = α1 d(1) = (0.455, −1.334),   c(2) = (0.836, 0.284)
        y(1) = c(2) − c(1) = (1.882, −1.758),   z(1) = A(1) y(1) = (0.649, −2.067)
        s(1)·y(1) = 3.201,   y(1)·z(1) = 4.855
        s(1) s(1)ᵀ = [ 0.207  −0.607 ;  −0.607  1.780 ],    B(1) = [ 0.0647  −0.19 ;  −0.19  0.556 ]
        z(1) z(1)ᵀ = [ 0.421  −1.341 ;  −1.341  4.272 ],    C(1) = [ −0.0867  0.276 ;  0.276  −0.880 ]
        A(2) = A(1) + B(1) + C(1) = [ 0.126  −0.125 ;  −0.125  0.626 ]

Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method
➢ Directly update the Hessian using only first derivatives:
   ∆x = −αH⁻¹c  ⇒  H ∆x = −αc  ≈  A ∆x = −αc,   A ≈ H:  find A by using only first-order information
BFGS Procedure:
☞ Step 1: k = 0; x(0); ε;  H(0) (= I, ≈ H)
☞ Step 2: c(k) = ∇f(x(k));  stop if ||c(k)|| < ε
☞ Step 3: solve H(k) d(k) = −c(k) to obtain d(k)
☞ Step 4: compute αk = α to minimize f(x(k) + αd(k))
☞ Step 5: x(k+1) = x(k) + αk d(k)
☞ Step 6: update H(k):
   H(k+1) = H(k) + D(k) + E(k)
   D(k) = y(k) y(k)ᵀ / (y(k)·s(k)),    E(k) = c(k) c(k)ᵀ / (c(k)·d(k))
   s(k) = αk d(k)   (change in design)
   y(k) = c(k+1) − c(k)   (change in gradient)
   c(k+1) = ∇f(x(k+1))
☞ Step 7: set k = k + 1 and go to Step 2
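A minimal sketch of the BFGS procedure above (direct Hessian update, d obtained by solving H d = −c), again assuming scipy's scalar minimizer as the line search, and using the example function from the next slide.

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: 5*x[0]**2 + 2*x[0]*x[1] + x[1]**2 + 7
grad = lambda x: np.array([10*x[0] + 2*x[1], 2*x[0] + 2*x[1]])

def bfgs(x0, eps=1e-3, max_iter=50):
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                        # H(0) = I, approximates the Hessian
    c = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        d = np.linalg.solve(H, -c)            # Step 3
        alpha = minimize_scalar(lambda a: f(x + a*d)).x   # Step 4
        s = alpha * d
        x = x + s
        c_new = grad(x)
        y = c_new - c
        H = H + np.outer(y, y)/(y @ s) + np.outer(c, c)/(c @ d)   # Step 6
        c = c_new
    return x, k

print(bfgs([1.0, 2.0]))                       # converges to (0, 0)
```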
BFGS Example:  f(x) = 5x1² + 2x1x2 + x2² + 7,   x(0) = (1, 2),   H(0) = I,   k = 0,  ε = 0.001
➢ First iteration:
   1-1. x(0) = (1, 2)
   1-2. c(0) = (10x1 + 2x2, 2x1 + 2x2) = (14, 6),   ||c(0)|| = √(14² + 6²) = 15.232 > ε
   1-3. d(0) = −c(0) = (−14, −6)
   1-4. x(1) = x(0) + αd(0) = (1 − 14α, 2 − 6α)
        f(x(1)) = f(α) = 5(1 − 14α)² + 2(1 − 14α)(2 − 6α) + (2 − 6α)² + 7
        df/dα = 5(2)(−14)(1 − 14α) + 2(−14)(2 − 6α) + 2(−6)(1 − 14α) + 2(−6)(2 − 6α) = 0
        ⇒ α0 = 0.0988,   d²f/dα² = 2348 > 0
   1-5. x(1) = x(0) + α0 d(0) = (1 − 14α0, 2 − 6α0) = (−0.386, 1.407)
   1-6. s(0) = α0 d(0) = (−1.386, −0.593),   c(1) = (−1.046, 2.042)
        y(0) = c(1) − c(0) = (−15.046, −3.958)
        y(0)·s(0) = 23.20,   c(0)·d(0) = −232.0
        y(0) y(0)ᵀ = [ 226.40  59.55 ;  59.55  15.67 ],    D(0) = [ 9.760  2.567 ;  2.567  0.675 ]
        c(0) c(0)ᵀ = [ 196  84 ;  84  36 ],    E(0) = [ −0.845  −0.362 ;  −0.362  −0.155 ]
        H(1) = H(0) + D(0) + E(0) = [ 9.915  2.205 ;  2.205  0.520 ]
➢ Second iteration:
   2-2. ||c(1)|| = √(1.046² + 2.042²) = 2.29 > ε
   2-3. solve H(1) d(1) = −c(1)  ⇒  d(1) = (17.20, −76.77)
   2-4. α1 = 0.018455   (minimizes f(x(1) + αd(1)))
   2-5. x(2) = x(1) + α1 d(1) = (−0.386, 1.407) + (0.317, −1.417) = (−0.069, −0.010)
   2-6. s(1) = α1 d(1) = (0.317, −1.417),   c(2) = (−0.706, −0.157)
        y(1) = c(2) − c(1) = (0.340, −2.199)
        y(1)·s(1) = 3.224,   c(1)·d(1) = −174.76
        y(1) y(1)ᵀ = [ 0.1156  −0.748 ;  −0.748  4.836 ],    D(1) = [ 0.036  −0.232 ;  −0.232  1.500 ]
        c(1) c(1)ᵀ = [ 1.094  −2.136 ;  −2.136  4.170 ],    E(1) = [ −0.0063  0.0122 ;  0.0122  −0.0239 ]
        H(2) = H(1) + D(1) + E(1) = [ 9.945  1.985 ;  1.985  1.996 ]