School of Mathematics and Statistics
MATH3161/MATH5165 – Optimization
Prof Jeya Jeyakumar
Topic 01 – Optimization – What is it?
Topic and contents
Optimization: What is optimization? · Optimization in the "real world" · Mathematics of optimization · Variables · Objective · Constraints · Standard formulation
Mathematical background: Vector norms · Cauchy–Schwarz inequality · Optima and optimizers · Existence · Relaxation · Derivatives · Positive definite matrices
Optimization
What is optimization?
What is optimization?
Definition 1 (Optimization/Optimisation)
Optimization is a process that finds the "best" possible solutions from a set of feasible solutions.
When you optimize something, you are "making it best". "Optimization" comes from the same root as "optimal", which means best. But "best" can vary. Meaning of "best" =⇒ concept of ordering.
Business/Economics: Maximize: profit, return, utility, …; Minimize: cost, risk, …
Engineering: Maximize: strength, production, …; Minimize: cost, materials, time, …
Definition 2 (What is an optimization problem?) An optimization problem is a mathematical problem of finding the best possible solution from a set of feasible solutions. It has the form of minimizing (or maximizing) an objective function subject to constraints.
Optimization
Optimization in the "real world"
Optimization in the "Real World"
The applicability of optimization is widespread, reaching into almost every activity in which numerical information is processed. To provide a comprehensive account of all these application areas would therefore be unrealistic, but a selection might include:
Traditional application areas: portfolio management problems in banking and finance; structural design problems in manufacturing and engineering; resource allocation and scheduling problems in commerce; farm planning problems in agriculture; etc.
Emerging scientific areas: sensor network localization problems in wireless communications; data mining problems in machine learning and information sciences; optimization models as decision support tools in medicine (e.g. optimization-based screening algorithms for identifying neurological and other disorders).
See the course webpage on Moodle for recent research papers on these emerging application areas in the folder "Additional resources".
Optimization
Mathematics of optimization
Mathematics of Optimization – Outline
Mathematical model: Variables, Objective, Constraints
Characterising optima ⇐⇒ Optimality principles
  What is an optimum/optimizer? Formulae for identifying/characterizing optima
  Derivatives: I Newton (1642–1726), G W Leibniz (1646–1711)
  Analytic methods for finding optima: Fermat (1601–1665), L Euler (1707–1783), Lagrange (1736–1813)
Finding optima =⇒ Numerical methods
  Newton's method: I Newton (1642–1726), C F Gauss (1777–1855)
  Linear programming: L Kantorovich (1912–1986), G B Dantzig (1914–2005)
  Computer algorithms; Linear algebra
Convexity: R T Rockafellar (1938– )
Duality: J von Neumann (1903–1957)
Maximum principle: L S Pontryagin (1908–1988)
Optimization
Variables
Variables
Decision variables: what you can change.
Finite dimensional: x ∈ R^n, number of variables n, column vector
  x = (x_1, x_2, …, x_n)^T
Univariate: n = 1, x ∈ R. Multivariate: n = 2, 3, 4, … up to millions.
Mathematical background: column vectors, symmetric matrices, matrix operations.
Discrete optimization: x_i ∈ {0, 1}, x_i ∈ N, x_i ∈ Z.
Matrix of variables: X ∈ R^{m×m} =⇒ x ∈ R^{m²}.
Infinite dimensional: the control is a function, e.g. u ∈ C([a, b]) or u ∈ L¹([0, T]).
Optimization
Objective
Objective
A mathematical function of the variables quantifying the idea of "best".
Finite dimensional: variables x ∈ R^n, objective function f : R^n → R
  Linear: f(x) = g^T x, g ∈ R^n fixed
  Affine: f(x) = g^T x + f_0, g ∈ R^n, f_0 ∈ R fixed
  Quadratic: f(x) = (1/2) x^T G x + g^T x + f_0, G ∈ R^{n×n}, g ∈ R^n, f_0 ∈ R fixed
Infinite dimensional: f : C([0, T]) → R, variables u ∈ C([0, T]), for example
  f(u) = ∫_0^T u(t) dt
The co-domain of the objective function must be ordered (total order). If α, β, γ ∈ R, then
  α ≤ β and β ≤ α ⇐⇒ α = β
  α ≤ β and β ≤ γ =⇒ α ≤ γ
  Either α ≤ β or β ≤ α
If u, v ∈ R^n, n ≥ 2, then u ≤ v ⇐⇒ v − u ≥ 0 ⇐⇒ v_i − u_i ≥ 0, i = 1, …, n (componentwise, only a partial order).
Optimization
Constraints
Constraints
Constraints describe restrictions on the allowable values of the variables.
Constraint structure for variables x ∈ R^n:
  Equality constraints: c_i(x) = 0, i = 1, …, m_E
  Inequality constraints: c_i(x) ≤ 0, i = m_E + 1, …, m
Feasible region Ω ⊆ R^n:
  Ω = {x ∈ R^n : c_i(x) = 0, i = 1, …, m_E; c_i(x) ≤ 0, i = m_E + 1, …, m}
Unconstrained problem ⇐⇒ Ω = R^n
Standard formulation: ĉ_i(x) ≥ 0 ⇐⇒ −ĉ_i(x) ≤ 0
Algebraic structure of constraints:
  Simple bounds: ℓ ≤ x ≤ u ⇐⇒ ℓ_i ≤ x_i ≤ u_i, i = 1, …, n
  Linear constraints: c_i(x) = a_i^T x − b_i
  Nonlinear constraints
Optimization
Constraints
Constraint representation
Example 3 (Constraint representation)
What feasible regions do the following constraints represent?
  (x_1 − a_1)² + (x_2 − a_2)² = r²
  (x_1 − a_1)² + (x_2 − a_2)² ≤ r²
  (x_1 − a_1)² + (x_2 − a_2)² ≥ r²
  x_2 = x_1²
  x_2 ≥ x_1²
  x_1² − 1 = 0
Optimization
Standard formulation
Standard formulation
Definition 4 (Standard formulation)
The standard formulation of a continuous finite-dimensional optimization problem is
  Minimize f(x), x ∈ R^n
  subject to c_i(x) = 0, i = 1, …, m_E;
             c_i(x) ≤ 0, i = m_E + 1, …, m.
Conversions:
  Maximize to minimize: max f̂(x) = − min (−f̂(x))
  Constraint right-hand side: ĉ_i(x) = b_i ⇐⇒ ĉ_i(x) − b_i = 0
  To less-than-or-equal inequalities: ĉ_i(x) ≥ 0 ⇐⇒ −ĉ_i(x) ≤ 0
  Strict inequality: ĉ_i(x) < 0 replaced by ĉ_i(x) + ε ≤ 0 for some ε > 0
Optimization
Standard formulation
A simplified farm planning problem Example 5 (Farm Planning) Farmer Jack has 100 acres to devote to wheat and corn and wishes to plan his planting to maximize the expected revenue. Jack has only $800 in capital to apply to planting the crops, and it costs $5 to plant an acre of wheat and $10 for an acre of corn. Their other activities leave the Jack family only 150 days of labour to devote to the crops. Two days will be required for each acre of wheat and one day for an acre of corn. If past experience indicates a return of $40 from each acre of wheat and $30 from each acre of corn, how many acres of each should be planted to maximize his revenue? Pose this as an optimization problem in standard form.
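As a sanity check of the formulation, Jack's problem can be solved numerically. The sketch below uses a brute-force search over integer acre allocations rather than an LP solver; that is my choice, and it works here only because the data happen to put the optimum at an integer vertex. The variable names w (wheat acres) and c (corn acres) are mine.

```python
# Brute-force check of the farm planning LP:
#   maximize 40w + 30c
#   subject to w + c <= 100 (land), 5w + 10c <= 800 (capital),
#              2w + c <= 150 (labour), w, c >= 0.
# An integer grid search suffices here because the LP optimum
# happens to lie at an integer vertex; in general use an LP solver.

def feasible(w, c):
    return (w + c <= 100            # 100 acres of land
            and 5 * w + 10 * c <= 800   # $800 planting capital
            and 2 * w + c <= 150)       # 150 days of labour

best = max((40 * w + 30 * c, w, c)
           for w in range(101) for c in range(101)
           if feasible(w, c))
revenue, w, c = best
print(w, c, revenue)  # 50 acres wheat, 50 acres corn, revenue $3500
```

The optimum is 50 acres of each crop for an expected revenue of $3500, attained where the land and labour constraints are both active.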
Optimization
Standard formulation
Post Office Parcel problem
Example 6 (Post Office Parcel Problem)
At one time the post office regulations were that the length plus the girth of a parcel must not exceed 1.8 metres. What is the parcel of largest volume that could be sent through the post? Pose this as an optimization problem in standard form. Assume that
  the parcel has rectangular sides;
  the length of the parcel is the longest edge;
  the girth is the distance around the parcel perpendicular to the length. For a rectangular box, girth = 2 × (height + depth).
[Figure: a rectangular parcel with edges labelled x_1 and x_3]
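One way to write this in standard form (a sketch; the labelling of x_1 as the length and x_2, x_3 as the cross-section edges is my assumption, based on the figure labels):

```latex
\min_{x \in \mathbb{R}^3} \; -x_1 x_2 x_3
\quad \text{subject to} \quad
\begin{aligned}
  & x_1 + 2(x_2 + x_3) - 1.8 \le 0, && \text{(length plus girth)} \\
  & x_2 - x_1 \le 0, \quad x_3 - x_1 \le 0, && \text{(length is the longest edge)} \\
  & -x_i \le 0, \quad i = 1, 2, 3. && \text{(non-negative dimensions)}
\end{aligned}
```

Maximizing the volume x_1 x_2 x_3 becomes minimizing its negative, matching the conversions of Definition 4.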
Optimization
Standard formulation
Standard formulation – Example 2
Example 7 (Standard formulation)
Maximize −x_1² − (x_2 − 1)²(x_2 − 3)² − x_2/2 on the set
  Ω = {x ∈ R² : x_2 ≥ x_1², x_1 ≤ 1, x_2 ≤ 2 + x_1, x_2 ≤ 4 − x_1}
Write this problem in standard form and plot the feasible region Ω. What is the feasible region if the first constraint becomes x_2 = x_1²?
[Figure: plot of the feasible region Ω]
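In standard form the four constraints of Ω become c_i(x) ≤ 0. A small sketch (the function names are mine) that checks feasibility of sample points:

```python
# Feasible region Omega = {x in R^2 : x2 >= x1^2, x1 <= 1,
#                          x2 <= 2 + x1, x2 <= 4 - x1},
# rewritten in standard form as c_i(x) <= 0.

def constraints(x1, x2):
    return [
        x1**2 - x2,    # x2 >= x1^2    ->  x1^2 - x2     <= 0
        x1 - 1,        # x1 <= 1       ->  x1 - 1        <= 0
        x2 - 2 - x1,   # x2 <= 2 + x1  ->  x2 - 2 - x1   <= 0
        x2 - 4 + x1,   # x2 <= 4 - x1  ->  x2 - 4 + x1   <= 0
    ]

def in_omega(x1, x2):
    return all(c <= 0 for c in constraints(x1, x2))

print(in_omega(0.0, 1.0))  # True: interior point
print(in_omega(2.0, 1.0))  # False: violates x1 <= 1
```

Points where some c_i(x) = 0, such as (1, 1), lie on the boundary of Ω and are still feasible.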
Mathematical background
Vector norms
Measures of size – norms
If α ∈ R then its magnitude (or absolute value) is
  |α| = α if α ≥ 0; |α| = −α if α ≤ 0.
In R^n there are several possible definitions – norms, denoted ‖·‖.
Definition 8 (Vector norm)
A vector norm on R^n is a function ‖·‖ from R^n to R such that
1. ‖x‖ ≥ 0 for all x ∈ R^n, and ‖x‖ = 0 ⇐⇒ x = 0;
2. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ R^n (triangle inequality);
3. ‖αx‖ = |α| ‖x‖ for all α ∈ R, x ∈ R^n.
Example 9 (Vector norms)
The most widely used vector norms are
  1-norm: ‖x‖₁ = Σ_{i=1}^n |x_i|
  2-norm: ‖x‖₂ = (Σ_{i=1}^n x_i²)^{1/2} = (x^T x)^{1/2}
  ∞-norm (maximum norm): ‖x‖_∞ = max_{i=1,…,n} |x_i|
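The three norms of Example 9 are easy to compute directly; a short sketch (the test vector is my choice):

```python
import math

# The three standard vector norms from Example 9, for x in R^n.
def norm1(x):
    return sum(abs(xi) for xi in x)        # sum of absolute values

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))  # Euclidean length

def norm_inf(x):
    return max(abs(xi) for xi in x)        # largest absolute component

x = [3.0, -4.0]
print(norm1(x), norm2(x), norm_inf(x))  # 7.0 5.0 4.0
```

Note ‖x‖_∞ ≤ ‖x‖₂ ≤ ‖x‖₁ for every x, as this example illustrates.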
Mathematical background
Cauchy-Schwarz Inequality
Cauchy–Schwarz Inequality
An important property connecting the dot product of two vectors and their norms is the Cauchy–Schwarz inequality:
  |x^T y| ≤ ‖x‖₂ ‖y‖₂ for any x, y ∈ R^n.
Equality holds if and only if x and y are linearly dependent.
Ex*. Show that the 2-norm, given by f(x) = ‖x‖₂, satisfies (1)–(3) of Definition 8 (vector norm). [Hint: the Cauchy–Schwarz inequality is useful for verifying (2).]
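The inequality can be spot-checked numerically on random vectors, and equality observed for linearly dependent pairs (a numerical illustration only, not a proof):

```python
import math
import random

# Spot-check |x.y| <= ||x||_2 ||y||_2 on random vectors,
# and equality when y is a scalar multiple of x.
random.seed(0)

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm2(x):
    return math.sqrt(dot(x, x))

for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    assert abs(dot(x, y)) <= norm2(x) * norm2(y) + 1e-12

# Equality case: y = 2x, so x and y are linearly dependent.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(abs(dot(x, y)) - norm2(x) * norm2(y))  # ~0 up to rounding
```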
Mathematical background
Optima and optimizers
Local and global minima
Definition 10 (Global minimum): A point x* ∈ Ω is a global minimizer of f(x) over Ω ⊆ R^n ⇐⇒ f(x*) ≤ f(x) for all x ∈ Ω. The global minimum is f(x*).
Definition 11 (Strict global minimum): A point x* ∈ Ω is a strict global minimizer of f(x) over Ω ⊆ R^n ⇐⇒ f(x*) < f(x) for all x ∈ Ω, x ≠ x*.
Definition 12 (Local minimum): A point x* ∈ Ω is a local minimizer of f(x) over Ω ⊆ R^n ⇐⇒ there exists a δ > 0 such that f(x*) ≤ f(x) for all x ∈ Ω with ‖x − x*‖ ≤ δ. Then f(x*) is a local minimum.
Definition 13 (Strict local minimum): A point x* ∈ Ω is a strict local minimizer of f(x) over Ω ⊆ R^n ⇐⇒ there exists a δ > 0 such that f(x*) < f(x) for all x ∈ Ω with 0 < ‖x − x*‖ ≤ δ.
Mathematical background
Optima and optimizers
Local and global maxima
Definition 14 (Global maximum): A point x* ∈ Ω is a global maximizer of f(x) over Ω ⊆ R^n ⇐⇒ f(x*) ≥ f(x) for all x ∈ Ω. The global maximum is f(x*).
Definition 15 (Strict global maximum): A point x* ∈ Ω is a strict global maximizer of f(x) over Ω ⊆ R^n ⇐⇒ f(x*) > f(x) for all x ∈ Ω, x ≠ x*.
Definition 16 (Local maximum): A point x* ∈ Ω is a local maximizer of f(x) over Ω ⊆ R^n ⇐⇒ there exists a δ > 0 such that f(x*) ≥ f(x) for all x ∈ Ω with ‖x − x*‖ ≤ δ. Then f(x*) is a local maximum.
Definition 17 (Strict local maximum): A point x* ∈ Ω is a strict local maximizer of f(x) over Ω ⊆ R^n ⇐⇒ there exists a δ > 0 such that f(x*) > f(x) for all x ∈ Ω with 0 < ‖x − x*‖ ≤ δ.
Mathematical background
Optima and optimizers
Local and global minima and maxima
Example 18 (Local and global minima and maxima)
Ω = [0, 5],
  f(x) = (x − 1)²                   if x ≤ 0.5;
         0.25                       if 0.5 < x ≤ 1;
         1.25 − (x − 2)²            if 1 < x ≤ 2.5;
         x − 1.5                    if 2.5 < x ≤ 3;
         1.5 − 0.25 sin(π(x − 3))   if 3 < x.
[Figure: plot of f(x) on [0, 5], titled "Local and global extrema"]
Mathematical background
Optima and optimizers
Local and global minima and maxima
Solution 19 (Local and global minima and maxima)
Consider f(x) on the interval Ω = [0, 5] and the points 0, 0.5, 2, 2.5, 3, 3.5, 4.5, 5:
  x = 0 is a strict local maximizer with f(0) = 1;
  any point x in the interval [0.5, 1] is a local and global minimizer with f(x) = 0.25 (but not strict, as adjacent points have the same function value);
  x = 2 is a strict local maximizer with f(2) = 1.25;
  x = 2.5 is a strict local minimizer with f(2.5) = 1;
  x = 3 is a strict local maximizer with f(3) = 1.5;
  x = 3.5 is a strict local minimizer with f(3.5) = 1.25;
  x = 4.5 is a strict local and global maximizer with f(4.5) = 1.75;
  x = 5 is a strict local minimizer with f(5) = 1.5.
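The function values used in the analysis above can be checked by coding the piecewise definition directly; a quick sketch:

```python
import math

# The piecewise function of Example 18 on Omega = [0, 5].
def f(x):
    if x <= 0.5:
        return (x - 1) ** 2
    elif x <= 1:
        return 0.25
    elif x <= 2.5:
        return 1.25 - (x - 2) ** 2
    elif x <= 3:
        return x - 1.5
    else:
        return 1.5 - 0.25 * math.sin(math.pi * (x - 3))

# Evaluate f at the points analysed in Solution 19.
for x in [0, 0.5, 2, 2.5, 3, 3.5, 4.5, 5]:
    print(x, round(f(x), 4))
```

Evaluating at the branch boundaries (0.5, 1, 2.5, 3) also confirms that f is continuous on [0, 5].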
Mathematical background
Existence
Existence
Definition 20 (Extrema): The global/local extrema of f over Ω are all the global/local minima and all the global/local maxima.
Proposition 1 (Existence of global extrema): Let Ω be a compact set and let f be continuous on Ω. Then the global extrema of f over Ω exist.
Finite dimensional: Ω ⊆ R^n is compact ⇐⇒ Ω is closed and bounded.
Example 21 (Existence): Find the global extrema, if they exist, for the following problems:
  f(x) = e^{−x} on Ω = [0, 1]
  f(x) = e^{−x} on Ω = [0, ∞)
  f(x) = sin x on Ω = [0, 2π)
Mathematical background
Relaxation
Relaxation
Proposition 2 (Relaxation)
If f : R^n → R and Ω̄ ⊆ Ω, then
  min_{x ∈ Ω} f(x) ≤ min_{x ∈ Ω̄} f(x).
Thus, the minimum value of the relaxation problem ≤ the minimum value of the original problem.
Proof.
Let x* ∈ Ω̄ be the global minimizer and f(x*) the global minimum of f over Ω̄. As Ω̄ ⊆ Ω, x* ∈ Ω̄ =⇒ x* ∈ Ω. Thus, min_{x ∈ Ω} f(x) ≤ f(x*).
If you make the feasible region larger, then the minimum value of the objective function cannot increase.
Mathematical background
Derivatives
Gradients
Definition 22 (Gradient)
Let f : R^n → R be continuously differentiable. The gradient ∇f : R^n → R^n of f at x is
  ∇f(x) = (∂f(x)/∂x_1, ∂f(x)/∂x_2, …, ∂f(x)/∂x_n)^T
The gradient is a column vector with n elements.
The gradient ∇f(x̄) of f at x̄ is orthogonal to the contour {x ∈ R^n : f(x) = f(x̄)}.
Mathematical background
Derivatives
Hessians
Definition 23 (Hessian)
Let f : R^n → R be twice continuously differentiable. The Hessian ∇²f : R^n → R^{n×n} of f at x is the matrix of second partial derivatives

  ∇²f(x) =
  [ ∂²f(x)/∂x_1²       ∂²f(x)/∂x_1∂x_2   ⋯   ∂²f(x)/∂x_1∂x_n
    ∂²f(x)/∂x_2∂x_1    ∂²f(x)/∂x_2²      ⋯   ∂²f(x)/∂x_2∂x_n
    ⋮                   ⋮                 ⋱   ⋮
    ∂²f(x)/∂x_n∂x_1    ∂²f(x)/∂x_n∂x_2   ⋯   ∂²f(x)/∂x_n² ]

The Hessian ∇²f(x) is an n by n matrix.
If f is twice continuously differentiable at x, then
  ∂²f(x)/∂x_i∂x_j = ∂²f(x)/∂x_j∂x_i for all i ≠ j,
that is, the Hessian matrix G = ∇²f(x) is symmetric (G^T = G).
Mathematical background
Derivatives
Gradient and Hessian – Example
Find the gradient and Hessian of f(x) = −2x_1² − 3x_2² + 4x_1x_2 + 2x_1 + 6x_2 + 8.
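Working this out by hand gives ∇f(x) = (−4x_1 + 4x_2 + 2, 4x_1 − 6x_2 + 6)^T and a constant Hessian [[−4, 4], [4, −6]]. A sketch verifying this by central finite differences at a test point (the point and step size h are my choices):

```python
# Verify the hand-computed gradient and Hessian of
# f(x) = -2*x1^2 - 3*x2^2 + 4*x1*x2 + 2*x1 + 6*x2 + 8
# by central finite differences.

def f(x1, x2):
    return -2 * x1**2 - 3 * x2**2 + 4 * x1 * x2 + 2 * x1 + 6 * x2 + 8

def grad(x1, x2):  # analytic gradient
    return (-4 * x1 + 4 * x2 + 2, 4 * x1 - 6 * x2 + 6)

hess = ((-4.0, 4.0), (4.0, -6.0))  # analytic Hessian (constant)

x1, x2, h = 1.0, 2.0, 1e-4
fd_grad = ((f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
           (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h))
fd_h11 = (f(x1 + h, x2) - 2 * f(x1, x2) + f(x1 - h, x2)) / h**2
fd_h22 = (f(x1, x2 + h) - 2 * f(x1, x2) + f(x1, x2 - h)) / h**2
fd_h12 = (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
          - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4 * h**2)

print(grad(x1, x2), fd_grad)  # should agree closely
```

Because f is quadratic, the central differences here are exact up to floating-point rounding.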
Mathematical background
Derivatives
Gradient and Hessian – Exercise
Example 24 (Gradients and Hessians)
Let f(x) = x_1² + (x_2 − 1)²(x_2 − 3)² + x_2/2 and
  c_1(x) = −x_1² + x_2, c_2(x) = −x_1 + 1, c_3(x) = −x_1 + x_2 − 2, c_4(x) = −x_1 − x_2 + 4.
For each function f(x), c_i(x), i = 1, 2, 3, 4:
  find the gradient and Hessian;
  determine whether the function is linear, quadratic or nonlinear.
Mathematical background
Example 1 – Plots
[Figure: contour plot of f(x) = x_1² + (x_2 − 1)²(x_2 − 3)² + x_2/2]
Mathematical background
Derivatives
Linear and Quadratic functions
Example 25 (Linear and Quadratic functions)
Let f_0 ∈ R, g ∈ R^n and G ∈ R^{n×n}, G symmetric, be fixed. Find the gradient ∇f(x) and Hessian ∇²f(x) for the
  Linear function: f(x) = g^T x + f_0
  Quadratic function: f(x) = (1/2) x^T G x + g^T x + f_0
Solution 26
Linear: f(x) = g^T x + f_0 = Σ_{i=1}^n g_i x_i + f_0
  ∇f(x) = g, does not depend on x
  ∇²f(x) = 0 ∈ R^{n×n}, the n by n zero matrix
Quadratic: f(x) = (1/2) x^T G x + g^T x + f_0 = (1/2) Σ_{i=1}^n Σ_{j=1}^n x_i G_{ij} x_j + Σ_{i=1}^n g_i x_i + f_0
  ∇f(x) = Gx + g
  ∇²f(x) = G, does not depend on x
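The formula ∇f(x) = Gx + g for the quadratic case can be verified numerically; a sketch with a small symmetric G of my choosing:

```python
# Check grad f(x) = G x + g for f(x) = 0.5 x^T G x + g^T x + f0
# on a 2x2 example, using central finite differences.

G = [[2.0, 1.0], [1.0, 3.0]]   # symmetric
g = [1.0, -1.0]
f0 = 5.0

def f(x):
    quad = sum(x[i] * G[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad + sum(g[i] * x[i] for i in range(2)) + f0

def analytic_grad(x):
    return [sum(G[i][j] * x[j] for j in range(2)) + g[i] for i in range(2)]

x = [0.5, -2.0]
h = 1e-6
fd = []
for i in range(2):
    xp, xm = x[:], x[:]
    xp[i] += h
    xm[i] -= h
    fd.append((f(xp) - f(xm)) / (2 * h))

print(analytic_grad(x), fd)  # the two gradients should agree
```

Note the symmetry of G is what makes ∇f(x) = Gx + g rather than (1/2)(G + G^T)x + g.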
Mathematical background
Positive definite matrices
Positive definite matrices – Definition
Definition 27
A real square matrix A ∈ R^{n×n} is
  positive definite ⇐⇒ x^T A x > 0 for all x ∈ R^n, x ≠ 0
  positive semi-definite ⇐⇒ x^T A x ≥ 0 for all x ∈ R^n
  negative definite ⇐⇒ x^T A x < 0 for all x ∈ R^n, x ≠ 0
  negative semi-definite ⇐⇒ x^T A x ≤ 0 for all x ∈ R^n
  indefinite ⇐⇒ there exist x_0, y_0 ∈ R^n with x_0^T A x_0 > 0 and y_0^T A y_0 < 0
Generalization of nonnegativity (order) to symmetric matrices: A ⪰ B ⇐⇒ A − B ⪰ 0 ⇐⇒ A − B positive semi-definite.
This is a theoretical definition, not a practical test.
A is negative definite ⇐⇒ −A is positive definite.
Mathematical background
Positive definite matrices
Positive definite matrices – Eigenvalues
A symmetric matrix A ∈ R^{n×n}
  has n real eigenvalues λ_i, i = 1, …, n;
  admits an orthogonal matrix Q (Q^T Q = I) such that A = Q D Q^T, where D = diag(λ_1, …, λ_n) and Q = [v_1 v_2 ⋯ v_n], with v_i an eigenvector of A corresponding to eigenvalue λ_i.
Determinant: det(A) = Π_{i=1}^n λ_i. Trace: trace(A) := Σ_{i=1}^n a_ii = Σ_{i=1}^n λ_i.
Proposition 3
A symmetric matrix A ∈ R^{n×n} is
  positive definite ⇐⇒ λ_i > 0 for all i = 1, …, n
  positive semi-definite ⇐⇒ λ_i ≥ 0 for all i = 1, …, n
  negative definite ⇐⇒ λ_i < 0 for all i = 1, …, n
  negative semi-definite ⇐⇒ λ_i ≤ 0 for all i = 1, …, n
  indefinite ⇐⇒ there exist i, j with λ_i > 0 and λ_j < 0
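For a symmetric 2×2 matrix the eigenvalue test of Proposition 3 can be applied in closed form, since the eigenvalues of [[a, b], [b, c]] are ((a + c) ± √((a − c)² + 4b²))/2. A sketch (the function names are mine, and the semi-definite cases are lumped together for brevity):

```python
import math

# Eigenvalue test of definiteness for a symmetric 2x2 matrix
# [[a, b], [b, c]] via the closed-form eigenvalues.

def eigvals_sym2(a, b, c):
    mean = (a + c) / 2
    r = math.sqrt((a - c) ** 2 + 4 * b * b) / 2
    return mean - r, mean + r    # (smaller, larger), both real

def classify(a, b, c):
    lo, hi = eigvals_sym2(a, b, c)
    if lo > 0:
        return "positive definite"
    if hi < 0:
        return "negative definite"
    if lo < 0 < hi:
        return "indefinite"
    return "semi-definite"       # at least one zero eigenvalue

print(classify(3, -3, 5))   # positive definite
print(classify(-3, 3, -5))  # negative definite
```

These two test matrices are exactly those of Example 29, so this doubles as a cross-check of the principal-minor test on the next slide.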
Mathematical background
Positive definite matrices
Positive definite matrices – Principal Minors
Definition 28
The ith leading principal minor, Δ_i, of a symmetric matrix A ∈ R^{n×n} is the determinant of the leading i × i submatrix of A.
Proposition 4
A symmetric matrix A ∈ R^{n×n} is positive definite if and only if all the leading principal minors Δ_i, i = 1, 2, …, n, of A are positive.
If, however, Δ_i has the sign of (−1)^i for i = 1, 2, …, n (i.e. the values of Δ_i alternate: negative, positive, …), then the matrix A is negative definite.
Example 29
(a) A = [  3  −3
          −3   5 ]
(b) B = [ −3   3
           3  −5 ]
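For the two matrices of Example 29, the leading principal minors are quick to compute by hand; a sketch confirming the classification (for a 2×2 matrix, Δ_1 = a_11 and Δ_2 = det A):

```python
# Leading-principal-minor test for the 2x2 matrices of Example 29.

def minors_2x2(A):
    d1 = A[0][0]                                   # Delta_1
    d2 = A[0][0] * A[1][1] - A[0][1] * A[1][0]     # Delta_2 = det(A)
    return d1, d2

A = [[3, -3], [-3, 5]]   # Example 29(a)
B = [[-3, 3], [3, -5]]   # Example 29(b)

d1, d2 = minors_2x2(A)
print(d1, d2)  # 3 6  -> both positive: A is positive definite

e1, e2 = minors_2x2(B)
print(e1, e2)  # -3 6 -> signs (-1)^1, (-1)^2: B is negative definite
```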