Numerical Methods for Unconstrained Optimization

Cheng-Liang Chen
PSE Laboratory
Department of Chemical Engineering
National Taiwan University

Analytical vs. Numerical?
➢ Analytical methods: write the necessary conditions and solve them (analytically or numerically?) for candidate local minimum designs. Some difficulties:
☞ The number of design variables and constraints can be large
☞ The functions of the design problem can be highly nonlinear
☞ In many applications, cost and/or constraint functions can be implicit in terms of the design variables
➢ Numerical methods: estimate an initial design and improve it until the optimality conditions are satisfied
General Concepts Related to Numerical Algorithms

A General Algorithm
➢ Current estimate: x(k), k = 0, 1, ···
➢ Subproblem 1: d(k), a feasible search direction
➢ Subproblem 2: αk, a (positive scalar) step size
⇒ New estimate: x(k+1) = x(k) + αk d(k) = x(k) + ∆x(k)
General Concepts Related to Numerical Algorithms (figures)
Descent Step Idea
➢ Current estimate x(k), new estimate x(k+1) = x(k) + αk d(k)
➢ Taylor expansion:
   f(x(k+1)) = f(x(k) + αk d(k)) ≈ f(x(k)) + αk ∇f(x(k))·d(k) = f(x(k)) + αk c(k)·d(k)
➢ Requiring f(x(k)) > f(x(k+1)) gives
   ∇f(x(k))·d(k) = c(k)·d(k) < 0 : descent condition
➢ The angle between c(k) and d(k) must be between 90° and 270°

Unconstrained Optimization
Example: check the descent condition
   f(x) = x1² − x1x2 + 2x2² − 2x1 + e^(x1+x2)
Verify whether d1 = (1, 2) and d2 = (1, 0) are descent directions at (0, 0).
   c = ∇f(x) = ( 2x1 − x2 − 2 + e^(x1+x2),  −x1 + 4x2 + e^(x1+x2) )
   c at (0, 0) = (−1, 1)
   c·d1 = (−1)(1) + (1)(2) = −1 + 2 = 1 > 0   (not a descent direction)
   c·d2 = (−1)(1) + (1)(0) = −1 + 0 = −1 < 0   (a descent direction)
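This example can also be checked numerically by approximating the gradient with finite differences and testing the sign of c·d. A minimal sketch, assuming NumPy; the function and candidate directions are those of the example above:

```python
import numpy as np

def f(x):
    # f(x) = x1^2 - x1*x2 + 2*x2^2 - 2*x1 + exp(x1 + x2)
    x1, x2 = x
    return x1**2 - x1*x2 + 2*x2**2 - 2*x1 + np.exp(x1 + x2)

def grad_fd(f, x, h=1e-6):
    # central-difference approximation of the gradient c = grad f(x)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)
    return g

x0 = np.array([0.0, 0.0])
c = grad_fd(f, x0)                       # approximately (-1, 1)
for d in (np.array([1.0, 2.0]), np.array([1.0, 0.0])):
    slope = c @ d                        # descent condition: c . d < 0
    print(d, slope, "descent" if slope < 0 else "not descent")
```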
One-Dimensional Minimization: Reduction to a Function of a Single Variable
➢ Assume a descent direction d(k) has been found. Along d(k), f becomes a function of the single variable α:
   f̄(α) = f(x(k) + α d(k))
➢ Taylor expansion:
   f(x(k) + αd(k)) ≈ f(x(k)) + α ∇fᵀ(x(k)) d(k) = f(x(k)) + α c(k)·d(k)
➢ Since c(k)·d(k) < 0, for a small positive α
   f̄(α) < f̄(0) = f(x(k))
⇒ a small move along a descent direction d reduces f
Analytical Method to Compute Step Size
➢ If d(k) is a descent direction, then α > 0 and
   df(αk)/dα = 0,   d²f(αk)/dα² > 0
➢ 0 = df(x(k+1))/dα = (df(x(k+1))/dx)ᵀ · (dx(k+1)/dα) = ∇fᵀ(x(k+1)) d(k) = c(k+1)·d(k)
⇒ The gradient of the cost function at the NEW point, c(k+1), is orthogonal to the current search direction, d(k)

Example: analytical step size determination
   f(x) = 3x1² + 2x1x2 + 2x2² + 7   at x(k) = (1, 2),  d(k) = (−1, −1)
   c(k) = ∇f(x(k)) = (6x1 + 2x2, 2x1 + 4x2) = (10, 10)
   c(k)·d(k) = (10)(−1) + (10)(−1) = −20 < 0  ⇒ d(k) is a descent direction
   x(k+1) = x(k) + α d(k) = (1 − α, 2 − α)
   f(x(k+1)) = 3(1 − α)² + 2(1 − α)(2 − α) + 2(2 − α)² + 7 = 7α² − 20α + 22 ≡ f(α)
   NC:  df/dα = 14αk − 20 = 0  ⇒  αk = 10/7;   d²f/dα² = 14 > 0
   x(k+1) = (1, 2) + (10/7)(−1, −1) = (−3/7, 4/7)
   f(x(k+1)) = 54/7 < 22 = f(x(k))
   ∇f(x(k+1)) = (−10/7, 10/7)
   ∇fᵀ(x(k+1)) d(k) = (−10/7)(−1) + (10/7)(−1) = 0   (check)
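For a quadratic cost function the exact step size along a direction can also be obtained directly as αk = −c(k)·d(k) / (d(k)ᵀ H d(k)); this formula is not stated on the slide but follows from setting df/dα = 0 for a quadratic. A minimal NumPy sketch using this example's data:

```python
import numpy as np

# f(x) = 3*x1^2 + 2*x1*x2 + 2*x2^2 + 7
H = np.array([[6.0, 2.0], [2.0, 4.0]])           # Hessian of f
grad = lambda x: H @ x                            # gradient c(x); the linear term is zero here

x = np.array([1.0, 2.0])
d = np.array([-1.0, -1.0])
c = grad(x)                                       # (10, 10)
alpha = -(c @ d) / (d @ H @ d)                    # exact step size for a quadratic
x_new = x + alpha * d
print(alpha, x_new, grad(x_new) @ d)              # 10/7, (-3/7, 4/7), ~0 (orthogonality check)
```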
Numerical Methods to Compute Step Size
➢ Most one-dimensional search methods work only for unimodal functions
   (they work on the interval 0 = αl ≤ α ≤ ᾱu);  (αu − αl) ≡ interval of uncertainty

Unimodal Function
➢ A function f(x) is unimodal (on an interval containing x∗) if
   ☞ x1 < x2 < x∗ implies f(x1) > f(x2), and
   ☞ x∗ < x3 < x4 implies f(x3) < f(x4)
➢ Outcome of two experiments:  x∗ ∈ [0, 1],  0 < x1 < x2 < 1
   ☞ f1 < f2  ⇒  x∗ ∈ [0, x2]
   ☞ f1 > f2  ⇒  x∗ ∈ [x1, 1]
   ☞ f1 = f2  ⇒  x∗ ∈ [x1, x2]

Equal Interval Search
➢ Successively reduce the interval of uncertainty, I, to a small acceptable value ε
➢ I = αu − αl
➢ Evaluate the function at α = 0, δ, 2δ, 3δ, ···, αu
   ☞ If f((q + 1)δ) < f(qδ), continue (the new point becomes the current point)
   ☞ If f((q + 1)δ) > f(qδ), then αl = (q − 1)δ, αu = (q + 1)δ, and α∗ ∈ [αl, αu]
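A minimal sketch of the equal-interval bracketing just described, assuming the test function f(α) = 2 − 4α + e^α used in the following example; restarting with a smaller δ inside the returned interval is a straightforward extension:

```python
import math

def equal_interval_bracket(f, delta, alpha_max=1e6):
    # March forward in steps of delta until the function value increases,
    # then return the bracketing interval [alpha_l, alpha_u].
    f_prev = f(0.0)
    alpha = delta
    while alpha <= alpha_max:
        f_curr = f(alpha)
        if f_curr > f_prev:                   # first increase: the minimum has been passed
            return alpha - 2*delta, alpha     # alpha_l = (q-1)*delta, alpha_u = (q+1)*delta
        f_prev = f_curr
        alpha += delta
    raise ValueError("no bracket found")

f = lambda a: 2 - 4*a + math.exp(a)
print(equal_interval_bracket(f, 0.5))         # ~(1.0, 2.0); the exact minimum is ln 4 = 1.386...
```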
Equal Interval Search: Example
   f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001

Equal Interval Search: Example
   Note: f(x) = x(x − 1.5),  x∗ ∈ [0, 1]  ⇒  x∗ ∈ [x7, x8] = [0.7, 0.8]

    i    xi     f(xi)
    1    0.1    −0.14
    2    0.2    −0.26
    3    0.3    −0.36
    4    0.4    −0.44
    5    0.5    −0.50
    6    0.6    −0.54
    7    0.7    −0.56
    8    0.8    −0.56
    9    0.9    −0.54

   Use 99 points ⇒ eliminate 98% of the interval, but only ∼1% per function evaluation
Equal Interval Search: Example
Equal interval search for f(α) = 2 − 4α + e^α (δ refined after each bracketing)

   No.   Trial step   Function value
   --- δ = 0.5 ---
    1    0.000000     3.000000
    2    0.500000     1.648721
    3    1.000000     0.718282   ← αl
    4    1.500000     0.481689
    5    2.000000     1.389056   ← αu
   --- δ = 0.05, start from α = 1.0 ---
    6    1.050000     0.657651
    7    1.100000     0.604166
    8    1.150000     0.558193
    9    1.200000     0.520117
   10    1.250000     0.490343
   11    1.300000     0.469297
   12    1.350000     0.457426   ← αl
   13    1.400000     0.455200
   14    1.450000     0.463115   ← αu
   --- δ = 0.005, start from α = 1.35 ---
   15    1.355000     0.456761
   16    1.360000     0.456193
   17    1.365000     0.455723
   18    1.370000     0.455351
   19    1.375000     0.455077
   20    1.380000     0.454902   ← αl
   21    1.385000     0.454826
   22    1.390000     0.454850   ← αu
   --- δ = 0.0005, start from α = 1.38 ---
   23    1.380500     0.454890
   24    1.381000     0.454879
   25    1.381500     0.454868
   26    1.382000     0.454859
   27    1.382500     0.454851
   28    1.383000     0.454844
   29    1.383500     0.454838
   30    1.384000     0.454833
   31    1.384500     0.454829
   32    1.385000     0.454826
   33    1.385500     0.454824
   34    1.386000     0.454823   ← αl
   35    1.386500     0.454823
   36    1.387000     0.454824   ← αu
   37    1.386500     0.454823

Equal Interval Search: 3 Interior Points
➢ With three equally spaced interior points the interval is halved each iteration; the midpoint carries over, so only two new evaluations are needed per iteration ⇒ eliminate 25% per function evaluation
Equal Interval Search: 2 Interior Points
➢ x∗ ∈ [αl, αu]; evaluate two interior points
   αa = αl + (1/3)(αu − αl) = (1/3)(2αl + αu)
   αb = αl + (2/3)(αu − αl) = (1/3)(αl + 2αu)
➢ Case 1: f(αa) < f(αb)  ⇒  αl < α∗ < αb
➢ Case 2: f(αa) > f(αb)  ⇒  αa < α∗ < αu
➢ Reduced interval of uncertainty: I′ = (2/3)I, using 2 new points per iteration
   ⇒ eliminate only ∼16.7% per function evaluation ?!
➢ Why is this worse? The "old" interior point is NOT reused
Golden Section Search
➢ Problem with equal interval search (n = 2 interior points): the known interior point is NOT used in the next iteration
➢ Solution: Golden Section Search
➢ Fibonacci sequence: F0 = 1;  F1 = 1;  Fn = Fn−1 + Fn−2,  n = 2, 3, ···
   1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ···
   Fn/Fn−1 → 1.618,   Fn−1/Fn → 0.618   as n → ∞

Golden Section Search: Reduction of Interval of Uncertainty
➢ Given αl, αu with I = αu − αl, select interior points αa, αb such that
   αb − αl = τI,   αu − αa = τI   (so αa − αl = αu − αb = (1 − τ)I)
➢ Suppose f(αb) > f(αa): delete [αb, αu], i.e. αl′ = αl, αu′ = αb, I′ = τI
   The old point αa is reused as the new αb′, which requires
   (1 − τ)I = τ I′ = τ(τI)  ⇒  τ² + τ − 1 = 0
   ⇒ τ = (−1 + √5)/2 = 0.618,   1 − τ = 0.382,   1/τ = 1.618
➢ Q: how to find the initial three points?
   αa − αl = 0.382 I and αu − αa = 0.618 I = (1.618)(0.382 I), so successive trial steps should grow by the ratio
   (αq − αq−1)/(αq−1 − αq−2) = 0.618 I / 0.382 I = 1.618
Golden Section Search: Initial Bracketing of Minimum
➢ Starting at α = 0, evaluate the function at trial points
   αq = δ Σ_{j=0}^{q} (1.618)^j = αq−1 + (1.618)^q δ,   q = 0, 1, 2, ···
   q = 0:  α0 = δ
   q = 1:  α1 = δ + 1.618δ = 2.618δ           (α1 − α0 = 1.618(α0 − 0))
   q = 2:  α2 = 2.618δ + 1.618²δ = 5.236δ     (α2 − α1 = 1.618(α1 − α0))
   q = 3:  α3 = 5.236δ + 1.618³δ = 9.472δ     (α3 − α2 = 1.618(α2 − α1))
➢ If f(αq−2) > f(αq−1) and f(αq−1) < f(αq), then αq−2 < α∗ < αq:
   αl = αq−2 = δ Σ_{j=0}^{q−2} (1.618)^j,   αu = αq = δ Σ_{j=0}^{q} (1.618)^j
   I = αu − αl = (1.618)^q δ + (1.618)^{q−1} δ = 2.618 (1.618)^{q−1} δ

Golden Section Search: Algorithm
➢ Step 1: choose δ; use the initial bracketing above to obtain αl, αu, I = αu − αl
➢ Step 2: αa = αl + 0.382I, αb = αl + 0.618I; compute f(αa), f(αb)
➢ Step 3: compare f(αa) and f(αb) and go to Step 4, 5, or 6
➢ Step 4: if f(αa) < f(αb), then αl < α∗ < αb:
   αl = αl, αu = αb, αb = αa, αa = αl + 0.382(αu − αl); go to Step 7
➢ Step 5: if f(αa) > f(αb), then αa < α∗ < αu:
   αl = αa, αu = αu, αa = αb, αb = αl + 0.618(αu − αl); go to Step 7
➢ Step 6: if f(αa) = f(αb), then αa < α∗ < αb:
   αl = αa, αu = αb, return to Step 2
➢ Step 7: if I = αu − αl < ε, then α∗ = (αu + αl)/2 and Stop; otherwise return to Step 3

Golden Section Search: Example
   f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001

Table 5.2  Golden section search for f(α) = 2 − 4α + e^α
Initial bracketing of minimum:
   Trial step   α            Function value
   1            0.000000     3.000000
   2            0.500000     1.648721   ← αl
   3            1.309017     0.466464   ← interior point (becomes αa)
   4            2.618034     5.236610   ← αu
Reducing the interval of uncertainty (ε = 0.001):

 No.  αl [f(αl)]            αa [f(αa)]            αb [f(αb)]            αu [f(αu)]            I
  1   0.500000 [1.648721]   1.309017 [0.466464]   1.809017 [0.868376]   2.618034 [5.236610]   2.118034
  2   0.500000 [1.648721]   1.000000 [0.718282]   1.309017 [0.466464]   1.809017 [0.868376]   1.309017
  3   1.000000 [0.718282]   1.309017 [0.466464]   1.500000 [0.481689]   1.809017 [0.868376]   0.809017
  4   1.000000 [0.718282]   1.190983 [0.526382]   1.309017 [0.466464]   1.500000 [0.481689]   0.500000
  5   1.190983 [0.526382]   1.309017 [0.466464]   1.381966 [0.454860]   1.500000 [0.481689]   0.309017
  6   1.309017 [0.466464]   1.381966 [0.454860]   1.427051 [0.458190]   1.500000 [0.481689]   0.190983
  7   1.309017 [0.466464]   1.354102 [0.456873]   1.381966 [0.454860]   1.427051 [0.458190]   0.118034
  8   1.354102 [0.456873]   1.381966 [0.454860]   1.399187 [0.455156]   1.427051 [0.458190]   0.072949
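A compact sketch of the golden section phase (Steps 2 to 7 above), assuming the bracket [αl, αu] has already been found; the equal-value case of Step 6 is folded into the else branch. It reproduces the table's behavior for f(α) = 2 − 4α + e^α:

```python
import math

def golden_section(f, al, au, eps=0.001):
    # tau = 0.618...; interior points sit at 0.382 and 0.618 of the interval
    tau = (math.sqrt(5.0) - 1.0) / 2.0
    aa = al + (1 - tau) * (au - al)
    ab = al + tau * (au - al)
    fa, fb = f(aa), f(ab)
    while (au - al) > eps:
        if fa < fb:                       # minimum lies in [al, ab]
            au, ab, fb = ab, aa, fa       # reuse the old interior point
            aa = al + (1 - tau) * (au - al)
            fa = f(aa)
        else:                             # minimum lies in [aa, au]
            al, aa, fa = aa, ab, fb
            ab = al + tau * (au - al)
            fb = f(ab)
    return 0.5 * (al + au)

f = lambda a: 2 - 4*a + math.exp(a)
print(golden_section(f, 0.5, 2.618034))   # ~1.3863 (= ln 4)
```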
Polynomial Interpolation: Quadratic Curve Fitting
➢ Approximate f(α) by a quadratic q(α) = a0 + a1α + a2α² using three points αl < αi < αu:
   f(αl) = q(αl) = a0 + a1αl + a2αl²
   f(αi) = q(αi) = a0 + a1αi + a2αi²
   f(αu) = q(αu) = a0 + a1αu + a2αu²
⇒  a2 = [ (f(αu) − f(αl))/(αu − αl) − (f(αi) − f(αl))/(αi − αl) ] / (αu − αi)
   a1 = (f(αi) − f(αl))/(αi − αl) − a2(αi + αl)
   a0 = f(αl) − a1αl − a2αl²
➢ dq(α)/dα = a1 + 2a2ᾱ = 0  ⇒  ᾱ = −a1/(2a2),   a minimum if d²q/dα² = 2a2 > 0

Computational Algorithm:
☞ Step 1: locate the initial interval of uncertainty (αl, αu)
☞ Step 2: select αl < αi < αu and compute f(αi)
☞ Step 3: compute a0, a1, a2, ᾱ, f(ᾱ)
☞ Step 4: compare αi and ᾱ and reduce the interval of uncertainty:
   If f(αi) < f(ᾱ):
      ☞ αi < ᾱ:  α∗ ∈ [αl, ᾱ]   ⇒ new points  αl, αi, ᾱ
      ☞ ᾱ < αi:  α∗ ∈ [ᾱ, αu]   ⇒ new points  ᾱ, αi, αu
   If f(αi) > f(ᾱ):
      ☞ αi < ᾱ:  α∗ ∈ [αi, αu]  ⇒ new points  αi, ᾱ, αu
      ☞ ᾱ < αi:  α∗ ∈ [αl, αi]  ⇒ new points  αl, ᾱ, αi
☞ Step 5: Stop if two successive estimates of the minimum point of f(α) are sufficiently close. Otherwise delete the primes on αl′, αi′, αu′ and return to Step 2
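A minimal sketch of one quadratic-interpolation step (Steps 2 and 3), using the a0, a1, a2 formulas above; tried on f(α) = 2 − 4α + e^α with the bracket used in the example that follows:

```python
import math

def quadratic_fit_min(f, al, ai, au):
    # Fit q(a) = a0 + a1*a + a2*a^2 through (al, ai, au) and return its minimizer.
    fl, fi, fu = f(al), f(ai), f(au)
    a2 = ((fu - fl)/(au - al) - (fi - fl)/(ai - al)) / (au - ai)
    a1 = (fi - fl)/(ai - al) - a2*(ai + al)
    if a2 <= 0:
        raise ValueError("fitted quadratic has no interior minimum")
    return -a1 / (2.0 * a2)

f = lambda a: 2 - 4*a + math.exp(a)
abar = quadratic_fit_min(f, 0.5, 1.309017, 2.618034)
print(abar, f(abar))      # ~1.2077 and ~0.515: the first iteration of the example
```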
Polynomial Interpolation: Example
   f(α) = 2 − 4α + e^α,   δ = 0.5
   From the initial bracketing:  αl = 0.5,  αi = 1.309017,  αu = 2.618034
   f(αl) = 1.648721,  f(αi) = 0.466464,  f(αu) = 5.236610
   a2 = (1/1.30902) [ 3.5879/2.1180 − (−1.1823)/0.80902 ] = 2.410
   a1 = −1.1823/0.80902 − (2.41)(1.80902) = −5.821
   a0 = 1.648721 − (−5.821)(0.50) − 2.41(0.25) = 3.957
   ᾱ = −a1/(2a2) = 1.2077 < αi;   f(ᾱ) = 0.5149 > f(αi)
⇒ f(αi) < f(ᾱ) and ᾱ < αi  ⇒  α∗ ∈ [ᾱ, αu];
   new points:  αl = ᾱ = 1.2077,  αi = 1.309017,  αu = 2.618034
   Second iteration:  f(αl) = 0.5149,  f(αi) = 0.466464,  f(αu) = 5.236610
   a2 = 2.713,  a1 = −7.30547,  a0 = 5.3807
   ᾱ = 1.3464,   f(ᾱ) = 0.4579

Multi-Dimensional Minimization: Powell's Conjugate Directions Method
➢ Conjugate Directions: Let A be an n × n symmetric matrix. A set of n vectors (directions) {S i} is said to be A-conjugate if
   S iᵀ A S j = 0   for i, j = 1, ···, n;  i ≠ j
➢ Note: orthogonal directions are a special case of conjugate directions (A = I)
Multi-Dimensional Minimization: Powell's Conjugate Directions Method
➢ Quadratically Convergent Method: If a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method
➢ Theorem: Given a quadratic function of n variables and two parallel hyperplanes 1 and 2 of dimension k < n. Let the constrained stationary points of the quadratic function in the hyperplanes be X1 and X2, respectively. Then the line joining X1 and X2 is conjugate to any line parallel to the hyperplanes.
➢ Proof:  Q(X) = (1/2) XᵀAX + BᵀX + C,   ∇Q(X) = AX + B   (n × 1)
   Meaning: if X1 and X2 are the minima of Q obtained by searching along the direction S from two different starting points Xa and Xb, respectively (search from Xa along S; search from Xb along S), the line (X1 − X2) will be conjugate to S:
   X1 and X2 are stationary points along S  ⇒  S is orthogonal to ∇Q(X1) and ∇Q(X2):
   ∇Q(X1)ᵀ S = Sᵀ A X1 + Sᵀ B = 0
   ∇Q(X2)ᵀ S = Sᵀ A X2 + Sᵀ B = 0
   [∇Q(X1) − ∇Q(X2)]ᵀ S = Sᵀ A (X1 − X2) = 0
Multi-Dimensional Minimization: Powell's Conjugate Directions Method
➢ Theorem: If a quadratic function Q(X) = (1/2) XᵀAX + BᵀX + C is minimized sequentially, once along each direction of a set of n mutually conjugate directions, the minimum of the function Q will be found at or before the nth step irrespective of the starting point.
➢ Proof: Let X∗ = X1 + Σ_{j=1}^{n} βj S j, where the S j are the n mutually conjugate directions (with respect to A).
   ∇Q(X∗) = B + AX∗ = 0
   0 = B + AX1 + A Σ_{j=1}^{n} βj S j
   Premultiplying by S iᵀ:
   0 = S iᵀ (B + AX1) + S iᵀ A Σ_{j=1}^{n} βj S j = (B + AX1)ᵀ S i + βi S iᵀ A S i
   ⇒ βi = −(B + AX1)ᵀ S i / (S iᵀ A S i)
Multi-Dimensional Minimization: Powell's Conjugate Directions Method
➢ Note: X i+1 = X i + λi∗ S i,  i = 1, ···, n, where λi∗ is found by minimizing Q(X i + λi S i) so that S iᵀ ∇Q(X i+1) = 0:
   ∇Q(X i+1) = B + A X i+1 = B + A(X i + λi∗ S i)
   0 = S iᵀ [ B + A(X i + λi∗ S i) ] = (B + AX i)ᵀ S i + λi∗ S iᵀ A S i
   ⇒ λi∗ = −(B + AX i)ᵀ S i / (S iᵀ A S i)
   Since X i = X1 + Σ_{j=1}^{i−1} λj∗ S j,
   X iᵀ A S i = X1ᵀ A S i + Σ_{j=1}^{i−1} λj∗ S jᵀ A S i = X1ᵀ A S i
   ⇒ λi∗ = −(B + AX1)ᵀ S i / (S iᵀ A S i) = βi

Powell's Conjugate Directions: Example
   f(x1, x2) = 6x1² + 2x2² − 6x1x2 − x1 − 2x2
   ∇f = ( 12x1 − 6x2 − 1,  −6x1 + 4x2 − 2 );   A = [ 12  −6 ;  −6  4 ],   B = (−1, −2);   X1 = (0, 0)
   Take S1 = (1, 2). Find S2 = (s1, s2) conjugate to S1:
   S1ᵀ A S2 = [1  2] [ 12  −6 ; −6  4 ] (s1, s2) = (0, 2)·(s1, s2) = 2 s2 = 0  ⇒  S2 = (1, 0)
   λ1∗ = −(B + AX1)ᵀ S1 / (S1ᵀ A S1) = −(−1, −2)·(1, 2) / 4 = 5/4
   X2 = X1 + λ1∗ S1 = (0, 0) + (5/4)(1, 2) = (5/4, 5/2)
   λ2∗ = −(B + AX2)ᵀ S2 / (S2ᵀ A S2) = −(−1, 1/2)·(1, 0) / 12 = 1/12
   X3 = X2 + λ2∗ S2 = (5/4, 5/2) + (1/12)(1, 0) = (4/3, 5/2) = X∗ (?)
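A small NumPy check of this example: it verifies S1ᵀAS2 = 0 and reproduces the two sequential line minimizations using the λ∗ formula above; the matrices are those of the example.

```python
import numpy as np

A = np.array([[12.0, -6.0], [-6.0, 4.0]])    # Hessian of f
B = np.array([-1.0, -2.0])                   # linear term: f = 0.5 x'Ax + B'x
X = np.array([0.0, 0.0])

S1 = np.array([1.0, 2.0])
S2 = np.array([1.0, 0.0])
print("S1' A S2 =", S1 @ A @ S2)             # 0 -> the directions are A-conjugate

for S in (S1, S2):
    lam = -(B + A @ X) @ S / (S @ A @ S)     # exact minimizing step along S
    X = X + lam * S
    print("lambda* =", lam, " X =", X)
print("gradient at X:", A @ X + B)           # ~(0, 0): minimum reached in n = 2 steps
```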
Powell's Algorithm (figure)

Progress of Powell's Method
➢ Direction sets used in successive cycles:
   u1, u2, ···, un−1, un;
   u2, u3, ···, un, S(1);
   u3, ···, un, S(1), S(2);
   ··· ;
   un, S(1), S(2), ···, S(n−1)
⇒ (un, S(1)), (S(1), S(2)), ··· are A-conjugate

Powell's Conjugate Directions: Example
   Min: f(x1, x2) = x1 − x2 + 2x1² + 2x1x2 + x2²,   X1 = [0  0]ᵀ

➢ Cycle 1: Univariate Search
   Along u2:  f(X1 + λu2) = f(0, λ) = λ² − λ
      df/dλ = 0  ⇒  λ∗ = 1/2,   X2 = X1 + λ∗u2 = (0, 0.5)
   Along −u1:  f(X2 − λu1) = f(−λ, 0.5) = 2λ² − 2λ − 0.25
      df/dλ = 0  ⇒  λ∗ = 1/2,   X3 = X2 − λ∗u1 = (−0.5, 0.5)
   Along u2:  f(X3 + λu2) = f(−0.5, 0.5 + λ) = λ² − λ − 0.75
      df/dλ = 0  ⇒  λ∗ = 1/2,   X4 = X3 + λ∗u2 = (−0.5, 1.0)
   Pattern direction:  S(1) = X4 − X2 = (−0.5, 0.5)

➢ Cycle 2: Pattern Search
   f(X4 + λS(1)) = f(−0.5 − 0.5λ, 1 + 0.5λ) = 0.25λ² − 0.5λ − 1
      df/dλ = 0  ⇒  λ∗ = 1.0,   X5 = X4 + λ∗S(1) = (−1.0, 1.5)
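A rough sketch of the univariate cycle plus pattern move shown above, assuming exact line minimization via the quadratic step formula (valid here because f is quadratic); it reproduces X2, X4, S(1), and X5 for this example:

```python
import numpy as np

A = np.array([[4.0, 2.0], [2.0, 2.0]])      # Hessian of f = x1 - x2 + 2x1^2 + 2x1x2 + x2^2
B = np.array([1.0, -1.0])                    # linear term

def line_min(X, S):
    # exact minimizer of the quadratic f along direction S starting from X
    lam = -(B + A @ X) @ S / (S @ A @ S)
    return X + lam * S

u1, u2 = np.eye(2)
X1 = np.zeros(2)
X2 = line_min(X1, u2)                        # (0, 0.5)
X3 = line_min(X2, -u1)                       # (-0.5, 0.5)
X4 = line_min(X3, u2)                        # (-0.5, 1.0)
S1 = X4 - X2                                 # pattern direction (-0.5, 0.5)
X5 = line_min(X4, S1)                        # (-1.0, 1.5) = the minimum
print(X2, X4, S1, X5)
```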
Simplex Method (figures)
Properties of Gradient Vector
➢ f(x) = f(x1, x2, . . . , xn)
   c = ∇f(x) = ( ∂f(x)/∂x1, ···, ∂f(x)/∂xn )ᵀ;   c(k) = ∇f(x(k)),   ci(k) = ∂f(x(k))/∂xi
➢ Property 1: The gradient vector c of a function f(x1, ···, xn) at the point x∗ = (x1∗, ···, xn∗) is orthogonal (normal) to the tangent plane of the surface f(x1, ···, xn) = constant.
   Proof:
   ☞ C is any curve on the surface through x∗; T is a vector tangent to curve C at x∗
   ☞ s: any parameter along C
      T = ( ∂x1/∂s, ···, ∂xn/∂s ) |x=x∗   (a unit tangent vector along C at x∗)
   ☞ f(x) = constant on the surface  ⇒  df/ds = 0:
      0 = df/ds = (∂f/∂x1)(∂x1/∂s) + ··· + (∂f/∂xn)(∂xn/∂s) = c·T = cᵀT
   ⇒ c·T = 0
➢ Property 2: The gradient represents a direction of maximum rate of increase of f(x) at x∗.
   Proof:
   ☞ u: a unit vector in any direction not tangent to C;  t: a parameter along u
      df/dt = lim(ε→0) [ f(x + εu) − f(x) ] / ε
      f(x + εu) = f(x) + ε ( u1 ∂f/∂x1 + ··· + un ∂f/∂xn ) + O(ε²) = f(x) + ε Σ_{i=1}^{n} ui ∂f/∂xi + O(ε²)
      df/dt = Σ_{i=1}^{n} ui ∂f/∂xi = c·u = cᵀu = ||c|| ||u|| cos θ
   ⇒ maximum rate of increase when θ = 0 (u along the gradient)
➢ Property 3: The maximum rate of change of f(x) at any point x∗ is the magnitude of the gradient vector:
   max df/dt = ||c||, attained when u is in the direction of the gradient vector (θ = 0)
Verify Properties of Gradient Vector
➢ f(x) = 25x1² + x2²,   x(0) = (0.6, 4),   f(x(0)) = 25
   c = ∇f(x) = (∂f/∂x1, ∂f/∂x2) = (50x1, 2x2) = (30, 8)
   Unit gradient direction:  C = c/||c|| = (30, 8)/√(30² + 8²) = (0.966235, 0.257663)
➢ Property 1:  CᵀT = 0
   Slope of the tangent to the curve 25x1² + x2² = 25 at x(0):
      m1 = dx2/dx1 = −c1/c2 = −50x1/(2x2) = −30/8 = −3.75
   Unit tangent vector:  t = (−4, 15)/√((−4)² + 15²) = (−0.257663, 0.966235)
   Slope of the gradient line:  m2 = c2/c1 = 8/30;   m1·m2 = −1  and  Cᵀt = 0   (check)
➢ Property 2: choose an arbitrary direction D = (0.501034, 0.865430), step α = 0.1
   x(1) = x(0) + αC = (0.6, 4.0) + 0.1(0.966235, 0.257663) = (0.6966235, 4.0257663),   f(x(1)) = 28.3389
   x(1) = x(0) + αD = (0.6, 4.0) + 0.1(0.501034, 0.865430) = (0.6501034, 4.0865430),   f(x(1)) = 27.2566
   ⇒ f(x(1)) along D < f(x(1)) along C: the increase along the gradient direction C is the largest

Steepest Descent Algorithm
➢ Steepest Descent Direction: Let f(x) be a differentiable function w.r.t. x. The direction of steepest descent for f(x) at any point is d = −c
➢ Steepest Descent Algorithm:
   ☞ Step 1: a starting design x(0); k = 0; ε
   ☞ Step 2: c(k) = ∇f(x(k)); stop if ||c(k)|| < ε
   ☞ Step 3: d(k) = −c(k)
   ☞ Step 4: calculate αk to minimize f(x(k) + αd(k))
   ☞ Step 5: x(k+1) = x(k) + αk d(k), k = k + 1, go to Step 2
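A minimal steepest descent sketch following the five steps above. The example in the later slides uses a golden-section line search in Step 4; here scipy's scalar minimizer is used for brevity, which is an assumption of this sketch rather than the slide's method:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, eps=0.005, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:          # Step 2: stopping criterion
            break
        d = -c                               # Step 3: steepest descent direction
        alpha = minimize_scalar(lambda a: f(x + a * d)).x   # Step 4: line search
        x = x + alpha * d                    # Step 5: update
    return x, k

# Example 5.10-style function: f = x1^2 + 2x2^2 + 2x3^2 + 2x1x2 + 2x2x3
f = lambda x: x[0]**2 + 2*x[1]**2 + 2*x[2]**2 + 2*x[0]*x[1] + 2*x[1]*x[2]
grad = lambda x: np.array([2*x[0] + 2*x[1],
                           4*x[1] + 2*x[0] + 2*x[2],
                           4*x[2] + 2*x[1]])
print(steepest_descent(f, grad, [2.0, 4.0, 10.0]))   # slowly approaches (0, 0, 0)
```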
Steepest Descent Algorithm
➢ Notes:
   ☞ d(k) = −c(k)  ⇒  c(k)·d(k) = −||c(k)||² < 0  ⇒  d is a descent direction
   ☞ The successive directions of steepest descent are normal to each other, d(k)·d(k+1) = 0:
      proof:  0 = df(x(k+1))/dα = (∂f(x(k+1))/∂x)ᵀ (∂x(k+1)/∂α) = c(k+1)ᵀ d(k) = −c(k+1)·c(k) = d(k+1)·d(k)

Steepest Descent: Example
   f(x1, x2) = x1² + x2² − 2x1x2,   x(0) = (1, 0)
➢ Step 1: x(0) = (1, 0), k = 0, ε
➢ Step 2: c(0) = ∇f(x(0)) = (2x1 − 2x2, 2x2 − 2x1) = (2, −2);   ||c(0)|| = 2√2 ≠ 0
➢ Step 3: d(0) = −c(0) = (−2, 2)
➢ Step 4: minimize f(x(0) + αd(0)) = f(1 − 2α, 2α):
      f(1 − 2α, 2α) = (1 − 2α)² + (2α)² − 2(1 − 2α)(2α) = 16α² − 8α + 1 = f(α)
      df(α)/dα = 32α − 8 = 0  ⇒  α0 = 0.25;   d²f/dα² = 32 > 0
➢ Step 5: x(1) = x(0) + α0 d(0) = (1 − 0.25(2), 0 + 0.25(2)) = (0.5, 0.5)
      c(1) = (0, 0)  ⇒  stop
Steepest Descent: Example
   f(x1, x2, x3) = x1² + 2x2² + 2x3² + 2x1x2 + 2x2x3,   x(0) = (2, 4, 10),   x∗ = (0, 0, 0)
➢ Step 1: k = 0, ε = 0.005  (δ = 0.05, ε = 0.0001 for the golden section line search)
➢ Step 2: c(0) = ∇f(x(0)) = (2x1 + 2x2, 4x2 + 2x1 + 2x3, 4x3 + 2x2) = (12, 40, 48);
      ||c(0)|| = √4048 = 63.6 > ε
➢ Step 3: d(0) = −c(0) = (−12, −40, −48)
➢ Step 4: minimize f(x(0) + αd(0)) by golden section  ⇒  α0 = 0.158718
➢ Step 5: x(1) = x(0) + α0 d(0) = (0.0954, −2.348, 2.381)
      c(1) = (−4.5, −4.438, 4.828);   ||c(1)|| = 7.952 > ε  ⇒  continue

Optimum solution for Example 5.10 with the steepest descent program
(The iteration history lists, for iterations 1 through 33, the values of x1, x2, x3, f(x), the step size α, and ||c||, all of which decrease slowly toward zero.)
   Optimum design variables:            8.04787E−03,  −6.81319E−03,  3.42174E−03
   Optimum cost function value:         2.47347E−05
   Norm of gradient at optimum:         4.97071E−03
   Total no. of function evaluations:   753

Steepest Descent: Disadvantages
➢ Slow to converge, especially when approaching the optimum ⇒ a large number of iterations
➢ Information calculated at previous iterations is NOT used; each iteration is started independently of the others
Scaling of Design Variables
➢ The steepest descent method converges in only one iteration for a positive definite quadratic function with a unit condition number of the Hessian matrix
➢ To accelerate the rate of convergence, scale the design variables such that the condition number of the Hessian of the scaled problem is (close to) one
➢ Example:  Min f(x1, x2) = 25x1² + x2²,   x0 = (1, 1)
      H = [ 50  0 ;  0  2 ]
   Let x = Dy with  D = [ 1/√50  0 ;  0  1/√2 ]
   ⇒  Min f(y1, y2) = (1/2)(y1² + y2²),   y0 = (√50, √2)

➢ Example:  Min f(x1, x2) = 6x1² − 6x1x2 + 2x2² − 5x1 + 4x2 + 2,   x0 = (−1, −2)
      H = [ 12  −6 ;  −6  4 ]
   λ1,2 = 0.7889, 15.211 (eigenvalues);   v1,2 = (0.4718, 0.8817), (−0.8817, 0.4718)
   Let x = Qy,  Q = [ v1  v2 ] = [ 0.4718  −0.8817 ;  0.8817  0.4718 ]
   ⇒  Min f(y1, y2) = 0.5(0.7889y1² + 15.211y2²) + 1.1678y1 + 6.2957y2 + 2
   Let y = Dz,  D = [ 1/√0.7889  0 ;  0  1/√15.211 ]
   ⇒  Min f(z1, z2) = 0.5(z1² + z2²) + 1.3148z1 + 1.6142z2 + 2
   ⇒  z∗ = (−1.3148, −1.6142),   x∗ = QDz∗ = (−1/3, −3/2)
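A quick NumPy check of the second scaling example: it recomputes the eigen-decomposition of H and the transformed linear term, and confirms that mapping z∗ back gives x∗ = (−1/3, −3/2). The eigenvector signs returned by numpy may differ from the slide, but the recovered x∗ is unaffected.

```python
import numpy as np

H = np.array([[12.0, -6.0], [-6.0, 4.0]])
b = np.array([-5.0, 4.0])                    # linear term of f

lam, Q = np.linalg.eigh(H)                   # eigenvalues ~ (0.7889, 15.211)
D = np.diag(1.0 / np.sqrt(lam))              # second scaling y = D z
bz = D @ Q.T @ b                             # linear term in z-coordinates (~ (1.3148, 1.6142) up to sign)
z_star = -bz                                 # minimizer of 0.5*z'z + bz'z
x_star = Q @ D @ z_star
print(lam, bz, x_star)                       # x_star ~ (-1/3, -3/2)
```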
Conjugate Gradient Method  (Fletcher and Reeves, 1964)
➢ Steepest descent: consecutive directions are orthogonal ⇒ converges, but slowly
➢ Conjugate gradient method: modify the current steepest descent direction by adding a scaled previous direction ⇒ cut diagonally through the orthogonal steepest descent directions
➢ Conjugate gradient directions: d(i), d(j) are orthogonal w.r.t. a symmetric and positive definite matrix A:
   d(i)ᵀ A d(j) = 0
➢ Algorithm:
   ☞ Step 1: k = 0, x(0);  d(0) = −c(0) = −∇f(x(0));  stop if ||c(0)|| < ε, otherwise go to Step 4
   ☞ Step 2: c(k) = ∇f(x(k));  stop if ||c(k)|| < ε
   ☞ Step 3: d(k) = −c(k) + βk d(k−1),   βk = [ ||c(k)|| / ||c(k−1)|| ]²
   ☞ Step 4: compute αk = α to minimize f(x(k) + αd(k))
   ☞ Step 5: x(k+1) = x(k) + αk d(k), k = k + 1, go to Step 2
➢ Note:
   ☞ Finds the minimum in n iterations for positive definite quadratic forms having n design variables
   ☞ With inexact line search or non-quadratic forms, the method is re-started (from the steepest descent direction) every n + 1 iterations for computational stability
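A minimal Fletcher-Reeves sketch of the algorithm above, assuming an exact line search via the quadratic step formula (so it terminates in n steps on the quadratic example of the next slide); the restart logic mentioned in the notes is omitted.

```python
import numpy as np

# Quadratic test problem from the example: f = x1^2 + 2x2^2 + 2x3^2 + 2x1x2 + 2x2x3
H = np.array([[2.0, 2.0, 0.0],
              [2.0, 4.0, 2.0],
              [0.0, 2.0, 4.0]])
grad = lambda x: H @ x

def fletcher_reeves(x0, eps=1e-6, max_iter=50):
    x = np.asarray(x0, dtype=float)
    c = grad(x)
    d = -c
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        alpha = -(c @ d) / (d @ H @ d)       # exact step for a quadratic
        x = x + alpha * d
        c_new = grad(x)
        beta = (np.linalg.norm(c_new) / np.linalg.norm(c))**2
        d = -c_new + beta * d                # Step 3: conjugate direction update
        c = c_new
    return x, k

print(fletcher_reeves([2.0, 4.0, 10.0]))     # converges to (0, 0, 0) in at most 3 iterations
```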
Conjugate Gradient Method: Example
   Min f(x) = x1² + 2x2² + 2x3² + 2x1x2 + 2x2x3,   x(0) = (2, 4, 10),   f(x(0)) = 332.0
➢ First iteration (identical to steepest descent):
   c(0) = (12, 40, 48);   ||c(0)|| = 63.6
   x(1) = (0.0956, −2.348, 2.381);   c(1) = (−4.5, −4.438, 4.828);   ||c(1)|| = 7.952;   f(x(1)) = 10.75
➢ Second iteration:
   β1 = [ ||c(1)|| / ||c(0)|| ]² = [7.952/63.6]² = 0.015633
   d(1) = −c(1) + β1 d(0) = (4.500, 4.438, −4.828) + (0.015633)(−12, −40, −48) = (4.31241, 3.81268, −5.57838)
   x(2) = x(1) + α d(1); minimizing f along d(1) gives α = 0.3156
   x(2) = (0.0956, −2.348, 2.381) + 0.3156 (4.31241, 3.81268, −5.57838) = (1.4566, −1.1447, 0.6205)
   c(2) = (0.6238, −0.4246, 0.1926);   ||c(2)|| = 0.7788;   c(2)·d(1) = 0

Newton Method
➢ A second-order method
   x: current estimate of x∗;   x∗ ≈ x + ∆x (desired)
   f(x + ∆x) = f(x) + cᵀ∆x + (1/2) ∆xᵀ H ∆x
   NC:  ∂f/∂∆x = c + H ∆x = 0
   ⇒  ∆x = −H⁻¹ c
   ⇒  ∆x = −α H⁻¹ c   (modified Newton, with step size α)
Newton Method: Steps (modified Newton)
☞ Step 1: k = 0; x(0); ε
☞ Step 2: ci(k) = ∂f(x(k))/∂xi, i = 1 ∼ n;  stop if ||c(k)|| < ε
☞ Step 3: H(x(k)) = [ ∂²f/∂xi∂xj ]
☞ Step 4: d(k) = −H⁻¹c(k),  or solve  H d(k) = −c(k)
   (Note: for computational efficiency, a system of linear simultaneous equations is solved instead of evaluating the inverse of the Hessian)
☞ Step 5: compute αk = α to minimize f(x(k) + αd(k))
☞ Step 6: x(k+1) = x(k) + αk d(k), k = k + 1, go to Step 2
➢ Note: unless H is positive definite, d(k) will not be a descent direction for f:
   H > 0  ⇔  c(k)ᵀ d(k) = −αk c(k)ᵀ H⁻¹ c(k) < 0 for positive definite H
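A minimal modified-Newton sketch of the steps above, solving H d = −c rather than inverting H. Two assumptions not on the slide: a crude backtracking rule replaces the exact line search, and a fall-back to steepest descent is added when d is not a descent direction (the case the note warns about). It is shown on the quartic example of the following slide.

```python
import numpy as np

# Example: f = 10*x1^4 - 20*x1^2*x2 + 10*x2^2 + x1^2 - 2*x1 + 5
f = lambda x: 10*x[0]**4 - 20*x[0]**2*x[1] + 10*x[1]**2 + x[0]**2 - 2*x[0] + 5
grad = lambda x: np.array([40*x[0]**3 - 40*x[0]*x[1] + 2*x[0] - 2,
                           -20*x[0]**2 + 20*x[1]])
hess = lambda x: np.array([[120*x[0]**2 - 40*x[1] + 2, -40*x[0]],
                           [-40*x[0], 20.0]])

def modified_newton(x0, eps=1e-6, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:
            break
        d = np.linalg.solve(hess(x), -c)     # Step 4: solve H d = -c
        if c @ d >= 0:                       # safeguard (assumption): fall back to -c
            d = -c                           # when H is not positive definite
        alpha = 1.0
        while f(x + alpha*d) >= f(x) and alpha > 1e-8:
            alpha *= 0.5                     # Step 5: simple backtracking step size
        x = x + alpha*d
    return x, k

print(modified_newton([-1.0, 3.0]))          # converges to (1, 1)
```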
Newton Method: Example
   f(x) = 3x1² + 2x1x2 + 2x2² + 7,   x(0) = (5, 10),   ε = 0.0001
   c = ∇f(x) = (6x1 + 2x2, 2x1 + 4x2);   c(0) = (50, 50);   ||c(0)|| = 50√2
   H(0) = [ 6  2 ;  2  4 ],    H(0)⁻¹ = (1/20) [ 4  −2 ;  −2  6 ]
   d(0) = −H⁻¹c(0) = −(1/20) [ 4  −2 ; −2  6 ] (50, 50) = (−5, −10)
   x(1) = x(0) + αd(0) = (5 − 5α, 10 − 10α)
   ∇f(x(1)) = ( 6(5 − 5α) + 2(10 − 10α),  2(5 − 5α) + 4(10 − 10α) ) = (50 − 50α, 50 − 50α)
   df/dα = ∇f(x(1))·d(0) = −5(50 − 50α) − 10(50 − 50α) = 0  ⇒  α = 1
   x(1) = (0, 0),  ∇f(x(1)) = (0, 0)  ⇒  the minimum of this quadratic is reached in one iteration

Newton Method: Example
   f(x) = 10x1⁴ − 20x1²x2 + 10x2² + x1² − 2x1 + 5,   x(0) = (−1, 3)
   ∇f(x) = ( 40x1³ − 40x1x2 + 2x1 − 2,  −20x1² + 20x2 )
   ∇²f(x) = [ 120x1² − 40x2 + 2    −40x1 ;   −40x1    20 ]
79
Chen CL ➢
80
Comparison of Steepest Descent, Conjugate Gradient Methods
f (x) = 50(x2
Chen CL
− x21)2 + (2 − x1)2
x(0) = (5,
−5)
Chen CL
81
Chen CL
83
Newton,
x∗ = (2, 4
82
Newton Method
➢ Advantage: quadratic convergence rate
➢ Disadvantages:
   ☞ Calculation of second-order derivatives at each iteration
   ☞ A system of simultaneous linear equations needs to be solved
   ☞ The Hessian of the function may be singular at some iterations
   ☞ Memoryless method: each iteration is started afresh
   ☞ Not convergent unless the Hessian remains positive definite and a step size determination scheme is used
Marquardt Modification (1963)
➢ d(k) = −(H + λI)⁻¹ c(k)
➢ Far away from the solution point ⇒ behaves like steepest descent
➢ Near the solution point ⇒ behaves like the Newton method
➢ Steps:
   ☞ Step 1: k = 0; x(0); ε; λ0 (= 10000, large)
   ☞ Step 2: ci(k) = ∂f(x(k))/∂xi, i = 1 ∼ n;  stop if ||c(k)|| < ε
   ☞ Step 3: H(x(k)) = [ ∂²f/∂xi∂xj ]
   ☞ Step 4: d(k) = −(H + λk I)⁻¹ c(k)
   ☞ Step 5: if f(x(k) + d(k)) < f(x(k)), go to Step 6; otherwise let λk = 2λk and go to Step 4
   ☞ Step 6: set λk+1 = 0.5λk, k = k + 1 and go to Step 2

Quasi-Newton Methods
➢ Steepest descent:
   ☞ Uses only first-order information ⇒ poor rate of convergence
   ☞ Each iteration is started with new design variables without using any information from previous iterations
➢ Newton method:
   ☞ Uses second-order derivatives ⇒ quadratic convergence rate
   ☞ Requires calculation of n(n + 1)/2 second-order derivatives!
   ☞ Difficulties if the Hessian is singular
   ☞ Not a learning process
➢ Quasi-Newton methods (update methods):
   ☞ Use first-order derivatives to generate approximations of the Hessian ⇒ combine desirable features of both steepest descent and Newton's method
   ☞ Use information from previous iterations to speed up convergence (learning process)
   ☞ Several ways to approximate the (updated) Hessian or its inverse
   ☞ Preserve the properties of symmetry and positive definiteness
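A minimal sketch of the Marquardt steps above, assuming the same quartic test function used for the Newton example; the large starting λ and the doubling/halving rules follow the slide.

```python
import numpy as np

f = lambda x: 10*x[0]**4 - 20*x[0]**2*x[1] + 10*x[1]**2 + x[0]**2 - 2*x[0] + 5
grad = lambda x: np.array([40*x[0]**3 - 40*x[0]*x[1] + 2*x[0] - 2,
                           -20*x[0]**2 + 20*x[1]])
hess = lambda x: np.array([[120*x[0]**2 - 40*x[1] + 2, -40*x[0]],
                           [-40*x[0], 20.0]])

def marquardt(x0, eps=1e-6, lam=1.0e4, max_iter=200):
    x = np.asarray(x0, dtype=float)
    I = np.eye(len(x))
    for k in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:
            break
        while True:
            d = np.linalg.solve(hess(x) + lam*I, -c)   # Step 4
            if f(x + d) < f(x):                        # Step 5: accept the step
                break
            lam *= 2.0                                 # otherwise increase lambda
        x = x + d
        lam *= 0.5                                     # Step 6: decrease lambda
    return x, k

print(marquardt([-1.0, 3.0]))                          # converges to (1, 1)
```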
Davidon-Fletcher-Powell (DFP) Method
➢ Davidon (1959), Fletcher and Powell (1963)
➢ Approximate the inverse of the Hessian using only first derivatives:
   ∆x = −αH⁻¹c ≈ −αAc,   A ≈ H⁻¹:  find A by using only first-order information
➢ DFP Procedure:
   ☞ Step 1: k = 0; x(0); ε;  A(0) (= I, ≈ H⁻¹)
   ☞ Step 2: c(k) = ∇f(x(k));  stop if ||c(k)|| < ε
   ☞ Step 3: d(k) = −A(k) c(k)
   ☞ Step 4: compute αk = α to minimize f(x(k) + αd(k))
   ☞ Step 5: x(k+1) = x(k) + αk d(k)
   ☞ Step 6: update A(k):
      A(k+1) = A(k) + B(k) + C(k)
      B(k) = s(k) s(k)ᵀ / (s(k)·y(k)),    C(k) = − z(k) z(k)ᵀ / (y(k)·z(k))
      s(k) = αk d(k)   (change in design)
      y(k) = c(k+1) − c(k)   (change in gradient)
      c(k+1) = ∇f(x(k+1)),    z(k) = A(k) y(k)
   ☞ Step 7: set k = k + 1 and go to Step 2
➢ DFP Properties:
   ☞ The matrix A(k) is always positive definite ⇒ the method always converges to a local minimum if α > 0, since
      d f(x(k) + αd(k)) / dα |α=0 = −c(k)ᵀ A(k) c(k) < 0
   ☞ When applied to a positive definite quadratic form, A(k) converges to the inverse of the Hessian of the quadratic form
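A minimal DFP sketch following the procedure above, assuming scipy's scalar minimizer as the line search in Step 4; it is run on the example function f(x) = 5x1² + 2x1x2 + x2² + 7 from the next slide.

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: 5*x[0]**2 + 2*x[0]*x[1] + x[1]**2 + 7
grad = lambda x: np.array([10*x[0] + 2*x[1], 2*x[0] + 2*x[1]])

def dfp(x0, eps=1e-3, max_iter=50):
    x = np.asarray(x0, dtype=float)
    A = np.eye(len(x))                       # A(0) = I, approximates H^-1
    c = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        d = -A @ c                           # Step 3
        alpha = minimize_scalar(lambda a: f(x + a*d)).x   # Step 4
        s = alpha * d                        # change in design
        x = x + s
        c_new = grad(x)
        y = c_new - c                        # change in gradient
        z = A @ y
        A = A + np.outer(s, s)/(s @ y) - np.outer(z, z)/(y @ z)   # Step 6
        c = c_new
    return x, k

print(dfp([1.0, 2.0]))                       # converges to (0, 0)
```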
DFP Example:  f(x) = 5x1² + 2x1x2 + x2² + 7,   x(0) = (1, 2),   A(0) = I,   k = 0,  ε = 0.001
➢ First iteration:
   1-1. x(0) = (1, 2)
   1-2. c(0) = (10x1 + 2x2, 2x1 + 2x2) = (14, 6),   ||c(0)|| = √(14² + 6²) = 15.232 > ε
   1-3. d(0) = −c(0) = (−14, −6)
   1-4. x(1) = x(0) + αd(0) = (1 − 14α, 2 − 6α)
        f(x(1)) = f(α) = 5(1 − 14α)² + 2(1 − 14α)(2 − 6α) + (2 − 6α)² + 7
        df/dα = 5(2)(−14)(1 − 14α) + 2(−14)(2 − 6α) + 2(−6)(1 − 14α) + 2(−6)(2 − 6α) = 0
        ⇒ α0 = 0.0988,   d²f/dα² = 2348 > 0
   1-5. x(1) = x(0) + α0 d(0) = (1 − 14α0, 2 − 6α0) = (−0.386, 1.407)
   1-6. s(0) = α0 d(0) = (−1.386, −0.593),   c(1) = (−1.046, 2.042)
        y(0) = c(1) − c(0) = (−15.046, −3.958),   z(0) = A(0) y(0) = y(0)
        s(0)·y(0) = 23.20,   y(0)·z(0) = 242.05
        s(0) s(0)ᵀ = [ 1.921  0.822 ;  0.822  0.352 ],    B(0) = [ 0.0828  0.0354 ;  0.0354  0.0152 ]
        z(0) z(0)ᵀ = [ 226.40  59.55 ;  59.55  15.67 ],    C(0) = [ −0.935  −0.246 ;  −0.246  −0.065 ]
        A(1) = A(0) + B(0) + C(0) = [ 0.148  −0.211 ;  −0.211  0.950 ]
➢ Second iteration:
   2-2. ||c(1)|| = √(1.046² + 2.042²) = 2.29 > ε
   2-3. d(1) = −A(1) c(1) = (0.586, −1.719)
   2-4. α1 = 0.776   (minimizes f(x(1) + αd(1)))
   2-5. x(2) = x(1) + α1 d(1) = (−0.386, 1.407) + (0.455, −1.334) = (0.069, 0.073)
   2-6. s(1) = α1 d(1) = (0.455, −1.334),   c(2) = (0.836, 0.284)
        y(1) = c(2) − c(1) = (1.882, −1.758),   z(1) = A(1) y(1) = (0.649, −2.067)
        s(1)·y(1) = 3.201,   y(1)·z(1) = 4.855
        s(1) s(1)ᵀ = [ 0.207  −0.607 ;  −0.607  1.780 ],    B(1) = [ 0.0647  −0.19 ;  −0.19  0.556 ]
        z(1) z(1)ᵀ = [ 0.421  −1.341 ;  −1.341  4.272 ],    C(1) = [ −0.0867  0.276 ;  0.276  −0.880 ]
        A(2) = A(1) + B(1) + C(1) = [ 0.126  −0.125 ;  −0.125  0.626 ]

Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method
➢ Directly update the Hessian using only first derivatives:
   ∆x = −αH⁻¹c  ⇒  H ∆x = −αc  ≈  A ∆x = −αc,   A ≈ H:  find A by using only first-order information
BFGS Procedure:
☞ Step 1: k = 0; x(0); ε;  H(0) (= I, ≈ H)
☞ Step 2: c(k) = ∇f(x(k));  stop if ||c(k)|| < ε
☞ Step 3: solve H(k) d(k) = −c(k) to obtain d(k)
☞ Step 4: compute αk = α to minimize f(x(k) + αd(k))
☞ Step 5: x(k+1) = x(k) + αk d(k)
☞ Step 6: update H(k):
   H(k+1) = H(k) + D(k) + E(k)
   D(k) = y(k) y(k)ᵀ / (y(k)·s(k)),    E(k) = c(k) c(k)ᵀ / (c(k)·d(k))
   s(k) = αk d(k)   (change in design)
   y(k) = c(k+1) − c(k)   (change in gradient)
   c(k+1) = ∇f(x(k+1))
☞ Step 7: set k = k + 1 and go to Step 2
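A minimal sketch of the BFGS procedure above (direct Hessian update, d obtained by solving H d = −c), again assuming scipy's scalar minimizer as the line search, and using the example function from the next slide.

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: 5*x[0]**2 + 2*x[0]*x[1] + x[1]**2 + 7
grad = lambda x: np.array([10*x[0] + 2*x[1], 2*x[0] + 2*x[1]])

def bfgs(x0, eps=1e-3, max_iter=50):
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                        # H(0) = I, approximates the Hessian
    c = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        d = np.linalg.solve(H, -c)            # Step 3
        alpha = minimize_scalar(lambda a: f(x + a*d)).x   # Step 4
        s = alpha * d
        x = x + s
        c_new = grad(x)
        y = c_new - c
        H = H + np.outer(y, y)/(y @ s) + np.outer(c, c)/(c @ d)   # Step 6
        c = c_new
    return x, k

print(bfgs([1.0, 2.0]))                       # converges to (0, 0)
```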
BFGS Example:  f(x) = 5x1² + 2x1x2 + x2² + 7,   x(0) = (1, 2),   H(0) = I,   k = 0,  ε = 0.001
➢ First iteration:
   1-1. x(0) = (1, 2)
   1-2. c(0) = (10x1 + 2x2, 2x1 + 2x2) = (14, 6),   ||c(0)|| = √(14² + 6²) = 15.232 > ε
   1-3. d(0) = −c(0) = (−14, −6)
   1-4. x(1) = x(0) + αd(0) = (1 − 14α, 2 − 6α)
        f(x(1)) = f(α) = 5(1 − 14α)² + 2(1 − 14α)(2 − 6α) + (2 − 6α)² + 7
        df/dα = 5(2)(−14)(1 − 14α) + 2(−14)(2 − 6α) + 2(−6)(1 − 14α) + 2(−6)(2 − 6α) = 0
        ⇒ α0 = 0.0988,   d²f/dα² = 2348 > 0
   1-5. x(1) = x(0) + α0 d(0) = (1 − 14α0, 2 − 6α0) = (−0.386, 1.407)
   1-6. s(0) = α0 d(0) = (−1.386, −0.593),   c(1) = (−1.046, 2.042)
        y(0) = c(1) − c(0) = (−15.046, −3.958)
        y(0)·s(0) = 23.20,   c(0)·d(0) = −232.0
        y(0) y(0)ᵀ = [ 226.40  59.55 ;  59.55  15.67 ],    D(0) = [ 9.760  2.567 ;  2.567  0.675 ]
        c(0) c(0)ᵀ = [ 196  84 ;  84  36 ],    E(0) = [ −0.845  −0.362 ;  −0.362  −0.155 ]
        H(1) = H(0) + D(0) + E(0) = [ 9.915  2.205 ;  2.205  0.520 ]
➢ Second iteration:
   2-2. ||c(1)|| = √(1.046² + 2.042²) = 2.29 > ε
   2-3. solve H(1) d(1) = −c(1)  ⇒  d(1) = (17.20, −76.77)
   2-4. α1 = 0.018455   (minimizes f(x(1) + αd(1)))
   2-5. x(2) = x(1) + α1 d(1) = (−0.386, 1.407) + (0.317, −1.417) = (−0.069, −0.010)
   2-6. s(1) = α1 d(1) = (0.317, −1.417),   c(2) = (−0.706, −0.157)
        y(1) = c(2) − c(1) = (0.340, −2.199)
        y(1)·s(1) = 3.224,   c(1)·d(1) = −174.76
        y(1) y(1)ᵀ = [ 0.1156  −0.748 ;  −0.748  4.836 ],    D(1) = [ 0.036  −0.232 ;  −0.232  1.500 ]
        c(1) c(1)ᵀ = [ 1.094  −2.136 ;  −2.136  4.170 ],    E(1) = [ −0.0063  0.0122 ;  0.0122  −0.0239 ]
        H(2) = H(1) + D(1) + E(1) = [ 9.945  1.985 ;  1.985  1.996 ]