Métodos Matemáticos de Especialidad. Ingeniería Eléctrica

Optimization (minimization) of functions without constraints

José Luis de la Fuente O'Connor
[email protected]
Escuela Técnica Superior de Ingenieros Industriales
Universidad Politécnica de Madrid
Clase_minimi_sincond_11.pdf




Contents

The problem
Conditions for a minimum
Descent direction methods
  - The gradient or steepest-descent method
  - Newton's method
  - Conjugate gradient methods
  - Quasi-Newton methods
  - Trust-region methods


The problem

We want to solve

    minimize f(x),   f : ℝⁿ → ℝ.

The function f is assumed to be continuous on some open subset of ℝⁿ, with continuous partial derivatives up to second order on that open set.


Examples

The function f(x) = (x − x*)² has a single point, x*, at which it attains its minimum.

1. INTRODUCTION

In this lecture note we shall discuss numerical methods for the solution of the optimization problem. For a real function of several real variables we want to find an argument vector which corresponds to a minimal function value:

Definition 1.1. The optimization problem: Find x* = argminₓ f(x), where f : ℝⁿ → ℝ.

The function f is called the objective function or cost function and x* is the minimizer. In some cases we want a maximizer of a function. This is easily determined if we find a minimizer of the function with opposite sign. Optimization plays an important role in many branches of science and applications: economics, operations research, network analysis, optimal design of mechanical or electrical systems, to mention but a few.

Example 1.1. In this example we consider functions of one variable. The function

    f(x) = (x − x*)²

has one, unique minimizer, x*, see Figure 1.1.

[Figure 1.1: y = (x − x*)². One minimizer.]

The function f(x) = −2 cos(x − x*) has infinitely many minimizers: x = x* + 2pπ, where p is an integer; see Figure 1.2.

[Figure 1.2: y = −2 cos(x − x*). Many minimizers.]

The function f(x) = 0.015(x − x*)² − 2 cos(x − x*) has a unique global minimizer, x*. Besides that, it also has several so-called local minimizers, each giving the minimal function value inside a certain region, see Figure 1.3.

[Figure 1.3: y = 0.015(x − x*)² − 2 cos(x − x*). One global minimizer and many local minimizers.]

The ideal situation for optimization computations is that the objective function has a unique minimizer. We call this the global minimizer. In some cases the objective function has several (or even infinitely many) minimizers. In such problems it may be sufficient to find one of these minimizers. In many objective functions from applications we have a global minimizer and several local minimizers. It is very difficult to develop methods which can find the global minimizer with certainty in this situation. Methods for global optimization are outside the scope of this lecture note. The methods described here can find a local minimizer for the objective function. When a local minimizer has been discovered, we do not know whether it is a global minimizer or one of the local minimizers. We cannot even be sure that our optimization method will find the local minimizer closest to the starting point.

The function f(x) = −2 cos(x − x*) has local minima at x = x* + 2pπ, where p is an integer.



The function f(x) = 0.015(x − x*)² − 2 cos(x − x*) has a unique global minimum x* and many local minima.


The goal of any method is to find the global minimum, if it exists, or a local minimum.
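As an illustrative sketch of this distinction (the code and names below are mine, not part of the lecture), a simple fixed-step gradient descent on the third example function, here with x* = 0 for concreteness, ends in the global minimum or in a nearby local minimum depending on the starting point:

```python
import math

def f(x):
    # f(x) = 0.015 (x - x*)^2 - 2 cos(x - x*), with x* = 0 assumed
    return 0.015 * x ** 2 - 2.0 * math.cos(x)

def fprime(x):
    return 0.03 * x + 2.0 * math.sin(x)

def gradient_descent(x, step=0.1, iters=2000):
    # naive fixed-step descent: converges to the minimum of the basin it starts in
    for _ in range(iters):
        x -= step * fprime(x)
    return x

x_global = gradient_descent(1.0)   # starts in the global basin around 0
x_local = gradient_descent(7.0)    # starts in the neighbouring basin near 2*pi
print(x_global, x_local)
print(f(x_local) > f(x_global))    # the local minimum has a higher value
```

Both runs satisfy the descent idea f(x_{k+1}) < f(x_k), yet only the first reaches the global minimizer; this is exactly the caveat in the excerpt above.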


Conditions for a minimum

Definition. A function f : ℝⁿ → ℝ is said to be convex if it satisfies

    f(αx + βy) ≤ αf(x) + βf(y)

for all x, y ∈ ℝⁿ and all α, β ∈ ℝ with α + β = 1, α ≥ 0, β ≥ 0.

[Fig. 7.3: Convex and nonconvex functions. Panels (a) and (b) show convex functions, panel (c) a nonconvex one.]
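The defining inequality can be checked numerically on sample points (an illustrative sketch; the function and grid choices are mine): it holds for a convex function such as t² and fails for a nonconvex one such as cos t.

```python
import math

def is_convex_on_samples(f, xs, ys, alphas, tol=1e-12):
    """Check f(a*x + b*y) <= a*f(x) + b*f(y) for a + b = 1, a, b >= 0."""
    for x in xs:
        for y in ys:
            for a in alphas:
                b = 1.0 - a
                if f(a * x + b * y) > a * f(x) + b * f(y) + tol:
                    return False
    return True

pts = [i / 4.0 - 2.0 for i in range(17)]   # grid on [-2, 2]
weights = [i / 10.0 for i in range(11)]    # alpha in [0, 1]

print(is_convex_on_samples(lambda t: t * t, pts, pts, weights))   # True
print(is_convex_on_samples(math.cos, pts, pts, weights))          # False
```

Of course a finite sample can only refute convexity, never prove it; the point here is just to make the inequality concrete.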


Theorem. First-order conditions for convexity. If f : ℝⁿ → ℝ is differentiable (its gradient, ∇f(x), exists for all x ∈ ℝⁿ), it is convex if and only if for all x, y ∈ ℝⁿ

    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x).

[Figure 3.2: If f is convex and differentiable, then f(x) + ∇f(x)ᵀ(y − x) ≤ f(y) for all x, y ∈ dom f.]


3.1.3 First-order conditions

Suppose f is differentiable (i.e., its gradient ∇f exists at each point in dom f, which is open). Then f is convex if and only if dom f is convex and

    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)    (3.2)

holds for all x, y ∈ dom f. This inequality is illustrated in figure 3.2. The affine function of y given by f(x) + ∇f(x)ᵀ(y − x) is, of course, the first-order Taylor approximation of f near x. The inequality (3.2) states that for a convex function, the first-order Taylor approximation is in fact a global underestimator of the function. Conversely, if the first-order Taylor approximation of a function is always a global underestimator of the function, then the function is convex.

The inequality (3.2) shows that from local information about a convex function (i.e., its value and derivative at a point) we can derive global information (i.e., a global underestimator of it). This is perhaps the most important property of convex functions, and explains some of the remarkable properties of convex functions and convex optimization problems. As one simple example, the inequality (3.2) shows that if ∇f(x) = 0, then for all y ∈ dom f, f(y) ≥ f(x), i.e., x is a global minimizer of the function f.
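The global-underestimator property can be made concrete with a small numerical sketch (the function and sample points are my choice): for a smooth convex f, the tangent value f(x) + ∇f(x)ᵀ(y − x) never exceeds f(y).

```python
import math

def f(x1, x2):
    # a smooth convex function: log-sum-exp of the two coordinates
    return math.log(math.exp(x1) + math.exp(x2))

def grad(x1, x2):
    e1, e2 = math.exp(x1), math.exp(x2)
    s = e1 + e2
    return (e1 / s, e2 / s)

def first_order_bound(x, y):
    # value of the first-order Taylor approximation of f, taken at x, evaluated at y
    g = grad(*x)
    return f(*x) + g[0] * (y[0] - x[0]) + g[1] * (y[1] - x[1])

points = [(-1.0, 0.5), (0.0, 0.0), (2.0, -1.5), (0.3, 0.7)]
ok = all(first_order_bound(x, y) <= f(*y) + 1e-12
         for x in points for y in points)
print(ok)  # True: the first-order Taylor approximation underestimates f
```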


Theorem. Second-order conditions for convexity. If f : ℝⁿ → ℝ has second-order partial derivatives (its Hessian matrix, ∇²f(x), exists for all x ∈ ℝⁿ), it is convex if for every x ∈ ℝⁿ and every d ∈ ℝⁿ, d ≠ 0, it satisfies

    dᵀ∇²f(x)d ≥ 0,

that is, the Hessian is positive semidefinite.

Example: the function f : ℝ² → ℝ, f(x, y) = x²/y, y > 0, is convex.

[Figure 3.3: Graph of f(x, y) = x²/y.]

Negative entropy. x log x (either on ℝ₊₊, or on ℝ₊, defined as 0 for x = 0) is convex.

Convexity or concavity of these examples can be shown by verifying the basic inequality (3.1), or by checking that the second derivative is nonnegative or nonpositive. For example, with f(x) = x log x we have

    f′(x) = log x + 1,   f″(x) = 1/x,

so that f″(x) > 0 for x > 0. This shows that the negative entropy function is (strictly) convex.

We now give a few interesting examples of functions on ℝⁿ.

Norms. Every norm on ℝⁿ is convex.

Max function. f(x) = max{x₁, …, xₙ} is convex on ℝⁿ.

Quadratic-over-linear function. The function f(x, y) = x²/y, with

    dom f = ℝ × ℝ₊₊ = {(x, y) ∈ ℝ² | y > 0},

is convex (figure 3.3).

Log-sum-exp. The function f(x) = log(e^{x₁} + ··· + e^{xₙ}) is convex on ℝⁿ. This function can be interpreted as a differentiable (in fact, analytic) approximation of the max function, since

    max{x₁, …, xₙ} ≤ f(x) ≤ max{x₁, …, xₙ} + log n

for all x. (The second inequality is tight when all components of x are equal.) Figure 3.4 shows f for n = 2.
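For the quadratic-over-linear example, the second-order condition can be sketched directly (illustrative code; the closed-form Hessian below is worked out by hand from f(x, y) = x²/y): its 2×2 Hessian is positive semidefinite whenever y > 0.

```python
def hessian_x2_over_y(x, y):
    # Hessian of f(x, y) = x^2 / y, valid for y > 0:
    # f_xx = 2/y, f_xy = -2x/y^2, f_yy = 2x^2/y^3
    return [[2.0 / y, -2.0 * x / y ** 2],
            [-2.0 * x / y ** 2, 2.0 * x ** 2 / y ** 3]]

def is_psd_2x2(H):
    # a symmetric 2x2 matrix is PSD iff trace >= 0 and det >= 0
    trace = H[0][0] + H[1][1]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    scale = 1.0 + abs(H[0][0] * H[1][1])   # tolerance scaled to the data
    return trace >= -1e-9 * scale and det >= -1e-9 * scale

samples = [(x, y) for x in (-3.0, -0.5, 0.0, 1.0, 4.0) for y in (0.1, 1.0, 7.0)]
print(all(is_psd_2x2(hessian_x2_over_y(x, y)) for x, y in samples))  # True
```

In fact the determinant is identically zero here, since the Hessian equals (2/y³) v vᵀ with v = (y, −x), a rank-one positive semidefinite matrix.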


Theorem. First-order necessary conditions for a local minimum. If x* is a local minimum of f : ℝⁿ → ℝ, then

    ∇f(x*) = 0.

A point x that satisfies ∇f(x) = 0 is called a stationary point of f(x).

Theorem. Second-order sufficient conditions for a local minimum. If x* is a stationary point and ∇²f(x*) is positive definite, then x* is a local minimum.

IMPORTANT: Any local minimum of a convex function is a global minimum.


Different cases of local minima can occur.

Theorem 1.5. Necessary condition for a local minimum. If x* is a local minimizer for f : ℝⁿ → ℝ, then

    ∇f(x*) = 0.

The local minimizers are among the points with ∇f(x) = 0. They have a special name.

Definition 1.6. Stationary point. If ∇f(xₛ) = 0, then xₛ is said to be a stationary point for f.

The stationary points are the local maximizers, the local minimizers and the rest. To distinguish between them, we need one extra term in the Taylor expansion. Provided that f has continuous third derivatives, then

    f(x + h) = f(x) + hᵀ∇f(x) + ½hᵀ∇²f(x)h + O(‖h‖³),   (1.7)

where the Hessian ∇²f(x) of the function f is a matrix containing the second partial derivatives of f:

    ∇²f(x) = [ ∂²f/∂xᵢ∂xⱼ (x) ].   (1.8)

Note that this is a symmetric matrix. For a stationary point, (1.7) takes the form

    f(xₛ + h) = f(xₛ) + ½hᵀ∇²f(xₛ)h + O(‖h‖³).   (1.9)

If the second term is positive for all h, we say that the matrix ∇²f(xₛ) is positive definite (cf. Appendix A, which also gives tools for checking definiteness). Further, we can take ‖h‖ so small that the remainder term is negligible, and it follows that xₛ is a local minimizer.

Theorem 1.10. Sufficient condition for a local minimum. Assume that xₛ is a stationary point and that ∇²f(xₛ) is positive definite. Then xₛ is a local minimizer.

The Taylor expansion (1.7) is also the basis of the proof of the following corollary.

Corollary 1.11. Assume that xₛ is a stationary point and that ∇²f(x) is positive semidefinite when x is in a neighbourhood of xₛ. Then xₛ is a local minimizer.

The local maximizers and the rest, which we call saddle points, can be characterized by the following corollary, also derived from (1.7).

Corollary 1.12. Assume that xₛ is a stationary point and that ∇²f(xₛ) ≠ 0. Then
1) if ∇²f(xₛ) is positive definite: see Theorem 1.10,
2) if ∇²f(xₛ) is positive semidefinite: xₛ is a local minimizer or a saddle point,
3) if ∇²f(xₛ) is negative definite: xₛ is a local maximizer,
4) if ∇²f(xₛ) is negative semidefinite: xₛ is a local maximizer or a saddle point,
5) if ∇²f(xₛ) is neither definite nor semidefinite: xₛ is a saddle point.

If ∇²f(xₛ) = 0, then we need higher-order terms in the Taylor expansion in order to find the local minimizers among the stationary points.

Example 1.3. We consider functions of two variables. Below we show the variation of the function value near a local minimizer (Figure 1.5a), a local maximizer (Figure 1.5b) and a saddle point (Figure 1.5c). It is a characteristic of a saddle point that there exists one line through xₛ with the property that if we follow the variation of the f-value along the line, this looks like a local minimum, whereas there exists another line through xₛ indicating a local maximizer.

[Figure 1.5: With a 2-dimensional x we see surfaces z = f(x) near a stationary point. a) minimum, b) maximum, c) saddle point.]

1. The first case is a local minimum, and ∇²f(x*) is positive definite.

2. The second is a local maximum, and ∇²f(x*) is negative definite.

3. The third, a saddle point, arises when ∇²f(x*) is positive semidefinite, negative semidefinite or indefinite.

If ∇²f(x*) = 0, more information, from third-order partial derivatives, is needed to determine the local minima.
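These cases can be sketched with a small classifier (illustrative code; the function name and the three test Hessians are mine) applied to x² + y², −x² − y² and x² − y², all of which have their stationary point at the origin:

```python
def classify_stationary_2x2(H, tol=1e-12):
    """Classify a stationary point from its symmetric 2x2 Hessian H."""
    trace = H[0][0] + H[1][1]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # eigenvalues of a symmetric 2x2 matrix: (trace +- sqrt(trace^2 - 4 det)) / 2
    disc = max(trace * trace - 4.0 * det, 0.0) ** 0.5
    lmin, lmax = (trace - disc) / 2.0, (trace + disc) / 2.0
    if lmin > tol:
        return "local minimum"      # positive definite
    if lmax < -tol:
        return "local maximum"      # negative definite
    if lmin < -tol and lmax > tol:
        return "saddle point"       # indefinite
    return "inconclusive"           # semidefinite: need more information

print(classify_stationary_2x2([[2, 0], [0, 2]]))    # x^2 + y^2
print(classify_stationary_2x2([[-2, 0], [0, -2]]))  # -x^2 - y^2
print(classify_stationary_2x2([[2, 0], [0, -2]]))   # x^2 - y^2
```

The "inconclusive" branch corresponds exactly to the semidefinite cases of Corollary 1.12, where the Hessian alone cannot decide.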


Descent direction methods

These follow an iterative procedure that decreases the value of the function at the successive points x_k of the process, that is,

    f(x_{k+1}) < f(x_k),

by computing a descent direction along which to move with this aim.

[Figure: from the point x^(k), a step α_k p^(k) along the descent direction p^(k) leads to the new point x^(k+1) = x^(k) + α_k p^(k).]


The algorithmic scheme is this.

    Given x := x₀ and a tolerance tol. Set found := false
    while (not found) and (k < kmax)
        Compute a descent direction p
        if (no such p exists) or (‖∇f(x)‖ < tol)
            found := true
        else
            α := step length(x, p)
            x := x + αp
        end
        k := k + 1
    end

The methods differ from one another in how p is computed.
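The scheme can be sketched generically (illustrative Python, names mine), with the direction rule and the step-length rule passed in as functions, so that each method of this lecture corresponds to a different `direction` argument:

```python
def descent(f, grad, x, direction, step_length, tol=1e-8, kmax=1000):
    """Generic descent loop: the individual methods differ only in `direction`."""
    for _ in range(kmax):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:   # ||grad|| < tol: found
            break
        p = direction(x, g)
        a = step_length(f, x, p, g)
        x = [xi + a * pi for xi, pi in zip(x, p)]
    return x

# steepest descent on f(x, y) = x^2 + 4 y^2 with a fixed small step
f = lambda x: x[0] ** 2 + 4.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 8.0 * x[1]]
x = descent(f, grad, [2.0, 1.0],
            direction=lambda x, g: [-gi for gi in g],   # p = -grad f(x)
            step_length=lambda f, x, p, g: 0.1)         # constant step
print(x)  # both coordinates close to the minimizer (0, 0)
```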


The gradient or steepest-descent method

Consider the Taylor series expansion of f(x) up to first order,

    f(x + p) = f(x) + ∇f(x)ᵀp + O(‖p‖²).

The direction p at x is a descent direction if

    ∇f(x)ᵀp < 0.

The relative decrease of the function along p is

    (f(x) − f(x + p)) / ‖p‖ ≈ −∇f(x)ᵀp / ‖p‖ = −‖∇f(x)‖ cos θ,

where θ is the angle between p and ∇f(x). It is maximal when θ = π.


That is, the steepest-descent direction is

    p = −∇f(x).

3 Line search methods

Iteration: x_{k+1} = x_k + α_k p_k, where α_k > 0 is the step length (how far to move along p_k) and p_k is the search direction.

[Figure: contours of f with an iterate x_k and its search direction p_k.]

Descent direction: p_kᵀ∇f_k = ‖p_k‖ ‖∇f_k‖ cos θ_k < 0 (angle θ_k < π/2 with −∇f_k). Guarantees that f can be reduced along p_k (for a sufficiently small step):

    f(x_k + αp_k) = f(x_k) + αp_kᵀ∇f_k + O(α²)   (Taylor's theorem)
                  < f(x_k)   for all sufficiently small α > 0.

The steepest-descent direction, i.e., the direction along which f decreases most rapidly, is p_k = −∇f_k. Proof: for any p and α: f(x_k + αp) = f(x_k) + αpᵀ∇f_k + O(α²), so the rate of change in f along p at x_k is pᵀ∇f_k (the directional derivative) = ‖p‖ ‖∇f_k‖ cos θ. Then min_p pᵀ∇f_k subject to ‖p‖ = 1 is achieved when cos θ = −1, i.e., p = −∇f_k / ‖∇f_k‖.

This direction is orthogonal to the contours of f. Proof: take x + p on the same contour line as x. Then, by Taylor's theorem,

    f(x + p) = f(x) + pᵀ∇f(x) + ½pᵀ∇²f(x + εp)p,   ε ∈ (0, 1),

so cos∠(p, ∇f(x)) = −½pᵀ∇²f(x + εp)p / (‖p‖ ‖∇f(x)‖) → 0 as ‖p‖ → 0, but p → 0 along the contour line means p/‖p‖ is parallel to its tangent at x.

The Newton direction is p_k = −(∇²f_k)⁻¹∇f_k. This corresponds to assuming f is locally quadratic and jumping directly to its minimum. Proof: by Taylor's theorem,

    f(x_k + p) ≈ f_k + pᵀ∇f_k + ½pᵀ∇²f_k p = m_k(p),

which is minimized (take derivatives with respect to p) by the Newton direction if ∇²f_k is positive definite. (Question: what happens if we assume f is locally linear, i.e., of order 1?) In a line search the Newton direction has a natural step length of 1.

For most algorithms, p_k = −B_k⁻¹∇f_k, where B_k is symmetric nonsingular.


Once the direction p at a point of the process has been computed, what step should be taken along it? How far should we move?

The answer is to minimize the function

    φ(α) = f(x + αp).

This can be done exactly, or inexactly at a lower computational cost. In either case, some indicator must guarantee that

    f(x + αp) < f(x),

that is, that the function decreases sufficiently.


Computing a step length α_k: one challenge in finding a good α_k is avoiding a step length that is too long. On the function x², for example, the directions p_k = (−1)^{k+1} and the steps α_k = 2 + 3/2^{k+1} produce the following effect.

[Figure: the objective function f(x) = x² and the iterates x_{k+1} = x_k + α_k p_k generated by the descent directions p_k = (−1)^{k+1} and steps α_k = 2 + 3/2^{k+1} from x₀ = 2.]


Steps that are too short must also be avoided: again on x², the directions p_k = −1 and the steps α_k = 1/2^{k+1} produce the following effect.

[Figure: the objective function f(x) = x² and the iterates x_{k+1} = x_k + α_k p_k generated by the descent directions p_k = −1 and steps α_k = 1/2^{k+1} from x₀ = 2.]
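Both failure modes can be reproduced numerically (a sketch of the two examples above): the function values f(x_k) = x_k² decrease at every step, and yet the iterates stall at |x| = 1 instead of reaching the minimizer x = 0.

```python
def iterates_too_long(n):
    # p_k = (-1)^(k+1), alpha_k = 2 + 3/2^(k+1): the steps overshoot the minimum
    x, xs = 2.0, [2.0]
    for k in range(n):
        x += (2.0 + 3.0 / 2.0 ** (k + 1)) * (-1.0) ** (k + 1)
        xs.append(x)
    return xs

def iterates_too_short(n):
    # p_k = -1, alpha_k = 1/2^(k+1): the steps shrink too fast
    x, xs = 2.0, [2.0]
    for k in range(n):
        x -= 1.0 / 2.0 ** (k + 1)
        xs.append(x)
    return xs

long_run, short_run = iterates_too_long(30), iterates_too_short(30)
# f(x) = x^2 decreases monotonically along both sequences...
print([round(v * v, 4) for v in long_run[:5]])
# ...but both sequences stall at |x| = 1, far from the minimizer 0
print(abs(long_run[-1]), short_run[-1])
```

This is why a step-length rule must demand not just any decrease, but a sufficient decrease, which is exactly what the criteria introduced next formalize.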


Let us make some observations in order to establish those indicators or criteria.

The derivative of the function f(x + αp) with respect to α is

    f(x + αp)′ = ∂f(x + αp)/∂x₁ · p₁ + ∂f(x + αp)/∂x₂ · p₂ + ··· + ∂f(x + αp)/∂xₙ · pₙ
               = ∇f(x + αp)ᵀp.

At α = 0:  f(x + αp)|_{α=0} = f(x)  and  f(x + αp)′|_{α=0} = ∇f(x)ᵀp.

The linear approximation of f(x + αp) at α = 0 is therefore

    f(x + αp)|_{α=0} + α f(x + αp)′|_{α=0} = f(x) + α∇f(x)ᵀp.


[Figure 9.1: Backtracking line search. The curve shows f restricted to the line over which we search. The lower dashed line shows the linear extrapolation of f, and the upper dashed line has a slope a factor α smaller. The backtracking condition is that f lies below the upper dashed line, i.e., 0 ≤ t ≤ t₀.]

The line search is called backtracking because it starts with unit step size and then reduces it by the factor β until the stopping condition f(x + tΔx) ≤ f(x) + αt∇f(x)ᵀΔx holds. Since Δx is a descent direction, we have ∇f(x)ᵀΔx < 0, so for small enough t we have

    f(x + tΔx) ≈ f(x) + t∇f(x)ᵀΔx < f(x) + αt∇f(x)ᵀΔx,

which shows that the backtracking line search eventually terminates. The constant α can be interpreted as the fraction of the decrease in f predicted by linear extrapolation that we will accept. (The reason for requiring α to be smaller than 0.5 will become clear later.)

The backtracking condition is illustrated in figure 9.1. This figure suggests, and it can be shown, that the backtracking exit inequality f(x + tΔx) ≤ f(x) + αt∇f(x)ᵀΔx holds for t ≥ 0 in an interval (0, t₀]. It follows that the backtracking line search stops with a step length t that satisfies

    t = 1,  or  t ∈ (βt₀, t₀].

The first case occurs when the step length t = 1 satisfies the backtracking condition, i.e., 1 ≤ t₀. In particular, we can say that the step length obtained by backtracking line search satisfies

    t ≥ min{1, βt₀}.

When dom f is not all of ℝⁿ, the condition f(x + tΔx) ≤ f(x) + αt∇f(x)ᵀΔx in the backtracking line search must be interpreted carefully. By our convention that f is infinite outside its domain, the inequality implies that x + tΔx ∈ dom f. In a practical implementation, we first multiply t by β until x + tΔx ∈ dom f.

[Figure: the curve f(x + αp) together with the lines f(x) + α∇f(x)ᵀp and f(x) + ρα∇f(x)ᵀp, plotted for α ≥ 0.]

The Armijo criterion establishes that α must satisfy

    f(x + αp) ≤ f(x) + ρα∇f(x)ᵀp,

where ρ ∈ (0, 1) is a constant to be chosen. It guarantees that the function decreases reasonably.


The Goldstein criterion requires that

    ∇f(x + αp)ᵀp ≥ σ∇f(x)ᵀp,

where σ ∈ (ρ, 1). That is, the qualitative decrease foreseeable at the new point must be greater than a given fraction of that at the current one.

The most widespread inexact method for computing the step length is known as backtracking.

It starts with a full step, α = 1, and reduces it by a constant factor β ∈ (0, 1) until the Armijo criterion is satisfied.

It works only if f(x + αp)′|_{α=0} = ∇f(x)ᵀp < 0 (a descent direction).
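Backtracking with the Armijo criterion can be sketched as follows (illustrative Python; `rho` and `beta` play the roles of ρ and β above, and the quadratic test function is my choice):

```python
def backtracking(f, x, p, g, rho=0.1, beta=0.5, max_halvings=30):
    """Shrink alpha from 1 by the factor beta until the Armijo condition holds."""
    gTp = sum(gi * pi for gi, pi in zip(g, p))
    assert gTp < 0, "p must be a descent direction"
    alpha, fx = 1.0, f(x)
    for _ in range(max_halvings):
        x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
        if f(x_new) <= fx + rho * alpha * gTp:   # Armijo: sufficient decrease
            break
        alpha *= beta
    return alpha

# on f(x) = 50 x1^2 + x2^2 the full step alpha = 1 along -grad f overshoots
# badly, so backtracking must cut the step down
f = lambda x: 50.0 * x[0] ** 2 + x[1] ** 2
x = [1.0, 1.0]
g = [100.0 * x[0], 2.0 * x[1]]
p = [-gi for gi in g]
alpha = backtracking(f, x, p, g)
print(alpha)  # an alpha well below 1 that satisfies the Armijo condition
```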


Steepest-descent algorithm to minimize f(x).

    Given x := x₀ and a tolerance tol. Set found := false
    while (not found) and (k < kmax)
        Compute the descent direction p = −g = −∇f(x)
        if (no such p exists) or (‖g‖ < tol)
            found := true
        else
            α := step length(x, p) by backtracking
            x := x + αp
        end
        k := k + 1
    end

Its convergence is linear.


Let us solve

    minimize f(x) = e^{x₁+3x₂−0.1} + e^{x₁−3x₂−0.1} + e^{−x₁−0.1}.

function [x f] = Maxima_pendiente_unc(fun,x0)
% Maxima_pendiente_unc: steepest-descent method
rho = 0.1; beta = 0.5;                % parameters and starting point
f1 = 0; maxit = 100; x = x0;
for i = 1:maxit                       % iterative process
    [f g] = fun(x);
    if (abs(f-f1) < 1e-10), break, end
    p = -g; alpha = 1;
    for k = 1:10                      % backtracking for the step length
        xnew = x + alpha*p; fxnew = fun(xnew);
        if fxnew < f + alpha*rho*g'*p
            break
        else
            alpha = alpha*beta;
        end
    end
    x = x + alpha*p; f1 = f;
    fprintf('%4.0f %13.8e %13.8e %13.8e %13.8e\n',i,x,f,alpha);
end

function [f g] = objfun_min1(x)
A = [1 3; 1 -3; -1 0]; b = -0.1*[1; 1; 1];
f = sum(exp(A*x+b));
if nargout > 1, g = A'*exp(A*x+b); end


>> [x f]=Maxima_pendiente_unc(@objfun_min1,[-1;1])
   1 -1.26517900e+000 -2.50497831e-001 9.16207023e+000 6.25000000e-002
   2 -6.29000734e-001  6.51924176e-002 3.86828053e+000 2.50000000e-001
   3 -4.50514899e-001 -7.72284882e-002 2.68052760e+000 2.50000000e-001
   4 -4.21089848e-001  2.38719166e-002 2.60419641e+000 1.25000000e-001
   5 -3.97610304e-001 -8.05335008e-003 2.56942254e+000 1.25000000e-001
   6 -3.65030711e-001  1.39821003e-002 2.56295544e+000 2.50000000e-001
   7 -3.59263955e-001 -5.78404615e-003 2.56080796e+000 1.25000000e-001
   8 -3.55227870e-001  2.43800662e-003 2.55966300e+000 1.25000000e-001
   9 -3.52463501e-001 -1.04150522e-003 2.55939647e+000 1.25000000e-001
  10 -3.48696568e-001  1.93956544e-003 2.55931730e+000 2.50000000e-001
  11 -3.48020112e-001 -8.46703041e-004 2.55929408e+000 1.25000000e-001
  12 -3.47557873e-001  3.70439501e-004 2.55927350e+000 1.25000000e-001
  13 -3.47243091e-001 -1.62316050e-004 2.55926873e+000 1.25000000e-001
  14 -3.47028931e-001  7.11957301e-005 2.55926742e+000 1.25000000e-001
  15 -3.46737604e-001 -1.33695923e-004 2.55926699e+000 2.50000000e-001
  16 -3.46685147e-001  5.87394995e-005 2.55926683e+000 1.25000000e-001
  17 -3.46649462e-001 -2.58117184e-005 2.55926673e+000 1.25000000e-001
  18 -3.46625190e-001  1.13436901e-005 2.55926671e+000 1.25000000e-001
  19 -3.46592176e-001 -2.13150939e-005 2.55926670e+000 2.50000000e-001
  20 -3.46586231e-001  9.36927894e-006 2.55926670e+000 1.25000000e-001
  21 -3.46582187e-001 -4.11844758e-006 2.55926670e+000 1.25000000e-001
  22 -3.46579437e-001  1.81036718e-006 2.55926670e+000 1.25000000e-001
  23 -3.46577566e-001 -7.95799575e-007 2.55926670e+000 1.25000000e-001

x =
  -0.346577566436640
  -0.000000795799575

f =
   2.559266696682093


The following graphs show the behaviour of the iterative process when the step length is computed inexactly by backtracking and exactly by the bisection method.

[Figure: error versus iteration k for the inexact line search (backtracking) and the exact line search (bisection).]


Using Matlab's fminunc routine, a session would look like the following.

>> x0=[-1;1];
>> options = optimset('Display','iter','GradObj','on','LargeScale','off');
>> [x,fval,exitflag,output] = fminunc(@objfun_min1,x0,options)

                                                   First-order
 Iteration  Func-count       f(x)        Step-size   optimality
     0           1         9.16207                       20
     1           2         3.57914      0.0499801        2.5
     2           3         3.31537          1            2.11
     3           4         2.60267          1            0.548
     4           5         2.56573          1            0.349
     5           6         2.55954          1            0.0613
     6           7         2.55928          1            0.011
     7           8         2.55927          1            0.000144
     8           9         2.55927          1            1.88e-007

Optimization terminated: relative infinity-norm of gradient less than options.TolFun.
x =
   -0.3466
   -0.0000

fval =
    2.5593

exitflag =
     1

output =
       iterations: 8


        funcCount: 9
         stepsize: 1
    firstorderopt: 1.8773e-007
        algorithm: 'medium-scale: Quasi-Newton line search'
          message: [1x85 char]

>>

The objective function and its gradient are passed to the routine in the script objfun_min1, which is as follows.

function [f g] = objfun_min1(x)
%
% f(x) = sum(exp(A*x+b))
A = [1 3; 1 -3; -1 0]; b = -0.1*[1; 1; 1];
f = sum(exp(A*x+b));
g = A'*exp(A*x+b);
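For comparison, an illustrative Python sketch of the same computation (not from the lecture; the loop below is steepest descent with the Armijo backtracking rule of the previous slides) reaches the same optimum, f ≈ 2.5593 at x ≈ (−0.3466, 0):

```python
import math

A = [(1.0, 3.0), (1.0, -3.0), (-1.0, 0.0)]

def f(x):
    # f(x) = exp(x1+3*x2-0.1) + exp(x1-3*x2-0.1) + exp(-x1-0.1)
    return sum(math.exp(a1 * x[0] + a2 * x[1] - 0.1) for a1, a2 in A)

def grad(x):
    ws = [math.exp(a1 * x[0] + a2 * x[1] - 0.1) for a1, a2 in A]
    return [sum(w * a[j] for w, a in zip(ws, A)) for j in range(2)]

x, rho, beta = [-1.0, 1.0], 0.1, 0.5
for _ in range(200):                         # steepest-descent iterations
    g = grad(x)
    gnorm2 = g[0] ** 2 + g[1] ** 2
    if gnorm2 < 1e-12:                       # gradient almost zero: stop
        break
    alpha = 1.0
    for _k in range(40):                     # backtracking (Armijo)
        xn = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
        if f(xn) < f(x) - rho * alpha * gnorm2:
            break
        alpha *= beta
    x = xn

print(round(f(x), 4))   # about 2.5593, at x close to (-0.3466, 0)
```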


Newton's method

Consider now the Taylor series expansion of f(x) up to second order:

    f(x + p) = f(x) + ∇f(x)ᵀp + ½pᵀ∇²f(x)p + O(‖p‖³),

where g = ∇f(x) is the gradient vector and

    H = ∇²f(x) = [ ∂²f(x)/∂x₁²      ∂²f(x)/∂x₁∂x₂   ···   ∂²f(x)/∂x₁∂xₙ
                   ∂²f(x)/∂x₂∂x₁    ∂²f(x)/∂x₂²     ···   ∂²f(x)/∂x₂∂xₙ
                   ⋮                 ⋮                ⋱    ⋮
                   ∂²f(x)/∂xₙ∂x₁    ∂²f(x)/∂xₙ∂x₂   ···   ∂²f(x)/∂xₙ² ]

is the Hessian matrix.


The necessary optimality condition, ∇f(x) = 0, considering terms up to second derivatives, leads to the linear system

    ∇f(x) + ∇²f(x)p = g + Hp = 0,

whose solution is the Newton direction towards the optimum.

If the matrix H = ∇²f(x) is positive definite (xᵀHx > 0 for any x ≠ 0), the Newton direction is a descent direction, since

    0 < pᵀHp = −pᵀg,

which satisfies the criterion ∇f(x)ᵀp < 0 deduced earlier.
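A sketch of full Newton iterations on the example function of this lecture (illustrative Python, names mine; the 2×2 system Hp = −g is solved in closed form with Cramer's rule):

```python
import math

A = [(1.0, 3.0), (1.0, -3.0), (-1.0, 0.0)]

def fgh(x):
    """Value, gradient and Hessian of f(x) = sum_i exp(a_i'x - 0.1)."""
    ws = [math.exp(a1 * x[0] + a2 * x[1] - 0.1) for a1, a2 in A]
    fval = sum(ws)
    g = [sum(w * a[j] for w, a in zip(ws, A)) for j in range(2)]
    H = [[sum(w * a[i] * a[j] for w, a in zip(ws, A)) for j in range(2)]
         for i in range(2)]
    return fval, g, H

x = [-1.0, 1.0]
for _ in range(10):                   # full Newton steps (alpha = 1)
    fval, g, H = fgh(x)
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    p = [(-g[0] * H[1][1] + g[1] * H[0][1]) / det,   # solve H p = -g
         (-g[1] * H[0][0] + g[0] * H[1][0]) / det]
    x = [x[0] + p[0], x[1] + p[1]]

print(round(fgh(x)[0], 6))   # about 2.559267, matching the runs above
```

Note how few iterations are needed compared with steepest descent; this is the quadratic convergence discussed below.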


Newton's algorithm to minimize f(x):

    Given x := x₀ and a tolerance tol. Set found := false
    while (not found) and (k < kmax)
        Compute the descent direction: solve Hp = −g
        if (no such p exists) or (‖g‖ < tol)
            found := true
        else
            α := step length(x, p) by backtracking
            x := x + αp
        end
        k := k + 1
    end


The Matlab script that implements it is this.

function [x f] = Newton_unc(fun,x)
% Newton's method
rho = 0.1; beta = 0.5;               % line search parameters
maxit = 100;
for i = 1:maxit                      % iterative process
    [f g H] = fun(x);
    p = -H\g;
    if abs(p'*g) < 1e-8, break, end
    alpha = 1;
    for k = 1:10                     % backtracking for the step length
        xnew = x + alpha*p;
        fxnew = fun(xnew);
        if fxnew < f + alpha*rho*g'*p, break
        else alpha = alpha*beta;
        end
    end
    x = x + alpha*p;
    fprintf('%4.0f %13.8e %13.8e %13.8e %13.8e\n',i,x,f,alpha);
end

function [f g H] = objfun_min2(x)
A = [1 3; 1 -3; -1 0]; b = -0.1*[1; 1; 1];
f = sum(exp(A*x+b));
if nargout > 1, g = A'*exp(A*x+b); end
if nargout > 2, H = A'*diag(exp(A*x+b))*A; end

  • 33/76

    To solve once more f(x) = e^(x₁+3x₂−0.1) + e^(x₁−3x₂−0.1) + e^(−x₁−0.1):

    >> [x f]=Newton_unc(@objfun_min2,[-1;1])
    1 -5.23625188e-002 3.53998022e-001 9.16207023e+000 1.00000000e+000
    2 -1.05634526e-001 1.05820897e-001 3.73378771e+000 1.00000000e+000
    3 -3.18485379e-001 2.52139713e-002 2.71665315e+000 1.00000000e+000
    4 -3.45138214e-001 7.18724132e-004 2.56404324e+000 1.00000000e+000
    5 -3.46572427e-001 1.03191597e-006 2.55927231e+000 1.00000000e+000

    x =
      -0.346572427027644
       0.000001031915967

    f =
       2.559266696666079

    >>

  • 34/76

    The convergence of the method and the error per iteration are shown below.

    [Figure: error versus iteration count k, on a semilogarithmic scale.]

    If the Hessian matrix is positive definite, the convergence of this method is quadratic.

    It works especially well in the vicinity of the optimum or of a stationary point.

  • 35/76

    Combining steepest descent and Newton

    The Newton direction is always a descent direction if the Hessian matrix is positive definite.

    A hybrid steepest-descent/Newton algorithm, for distant starting points where that condition may not hold, can improve the performance of Newton's method:

    if H(x) is positive definite
        p = p_N
    else
        p = p_sd
    end
    x := x + αp
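The hybrid rule above can be sketched in a few lines. This Python sketch (the function name and NumPy-based setup are illustrative, not from the slides) uses a Cholesky factorization to test positive definiteness, just as the Matlab listing on the next slide does:

```python
import numpy as np

def hybrid_direction(g, H):
    """Newton direction if H is positive definite, else steepest descent."""
    try:
        # Cholesky succeeds exactly when H is positive definite
        L = np.linalg.cholesky(H)
        # Solve H p = -g using the factor: H = L L^T
        p = np.linalg.solve(L.T, np.linalg.solve(L, -g))
    except np.linalg.LinAlgError:
        p = -g  # fall back to the steepest-descent direction
    return p
```

Either branch returns a descent direction, so pᵀg < 0 holds in both cases.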

  • 36/76

    function [x f i] = Newton_mp(fun,x)
    % Hybrid Newton / steepest-descent method
    global h
    rho = 0.1; beta = 0.5;          % Line search parameters
    maxit = 100; h=sqrt(eps);
    for i=1:maxit                   % Iterative process
        [f g H] = fun(x);
        [R npd] = chol(H);          % Cholesky checks whether H is positive definite
        if ~npd
            p = -R\(R'\g);          % Newton direction
        else
            p = -g;                 % Steepest-descent direction
        end
        if abs(p'*g)

  • 37/76

    If we use it to solve the Rosenbrock function starting from a distant point:

    >> [x f k] = Newton_mp(@objfun_min3,[0;15])
     1 1.56250e-002 -8.43750e+000 2.25010e+004 7.81250e-003 1
     2 1.62080e-002  2.62358e-004 7.12052e+003 1.00000e+000 0
     3 2.62139e-001  8.23454e-003 9.67847e-001 2.50000e-001 0
     4 3.18480e-001  9.82550e-002 9.10251e-001 1.00000e+000 0
     5 5.26915e-001  2.32607e-001 4.65478e-001 5.00000e-001 0
     6 5.74193e-001  3.27462e-001 4.26601e-001 1.00000e+000 0
     7 7.21323e-001  4.97541e-001 1.81811e-001 5.00000e-001 0
     8 7.71508e-001  5.92706e-001 1.29485e-001 1.00000e+000 0
     9 8.47484e-001  7.11197e-001 5.28430e-002 5.00000e-001 0
    10 9.10865e-001  8.25658e-001 2.82056e-002 1.00000e+000 0
    11 9.60290e-001  9.19713e-001 9.55884e-003 1.00000e+000 0
    12 9.86967e-001  9.73391e-001 2.17363e-003 1.00000e+000 0
    13 9.98376e-001  9.96624e-001 2.20516e-004 1.00000e+000 0
    14 9.99959e-001  9.99915e-001 4.33199e-006 1.00000e+000 0

    x =
       0.999958778201435
       0.999915052848790

    f =
       2.326866095161381e-009

    k =

  • 38/76

    Damped Newton and trust-region methods

    Suppose we have a model M of the behaviour of the function f in the neighbourhood of a point x such that

    f(x + p) ≈ M(p) ≡ f(x) + pᵀc + ½pᵀBp,

    where c ∈ ℝⁿ and the matrix B ∈ ℝⁿˣⁿ is symmetric. Such a model can be the Taylor series expansion up to second derivatives, or a suitable approximation.

    The idea is to use these models to compute descent directions, with step lengths α = 1, inside adjustable neighbourhoods where the model is trusted.

  • 39/76

    The damped Newton step is computed as

    p_dn = argmin_p { M(p) + ½μpᵀp },

    where μ is the damping parameter. The term ½μpᵀp = ½μ‖p‖² penalizes long steps.

    The trust-region step is computed as

    p_tr = argmin_{‖p‖≤Δ} M(p),

    assuming that a number Δ is known such that the model represents the function sufficiently well inside the sphere of radius Δ.

    If the function decreases adequately along either of these directions, we set x := x + p and adapt μ or Δ. If it does not decrease, x stays where it is, and μ or Δ must be modified so that the next iteration has better luck and improves the function.

  • 40/76

    The following figure illustrates the idea.

    [Figure 4.1 of Nocedal & Wright, "Trust-region and line search steps": contours of the model M(p) and of f(x), the trust region, the Newton (line search) step and the trust-region step. If the region is too small the algorithm misses substantial steps; if it is too large the model minimizer may be far from the function minimizer, so in practice the radius is adapted according to the performance on previous iterations.]

  • 41/76

    The quality of the chosen model is evaluated through the gain

    ρ = (f(x) − f(x + p)) / (M(0) − M(p)),

    that is, the ratio between the actual decrease of the function from x and the decrease the model predicted.

    With the trust-region model, if ρ < ¼ the gain is small and the steps should be reduced, for example by half, whereas if ρ > ¾ they can be increased:

    if ρ < 0.25
        Δ := Δ/2
    elseif ρ > 0.75
        Δ := max{Δ, 3‖p‖}
    end

    It is important that the factors 2 and 3 of this strategy do not make Δ oscillate too much.
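As a minimal sketch, the radius update just described can be written as a small function (the name update_radius is ours, not from the slides):

```python
def update_radius(rho, Delta, p_norm):
    """Trust-region radius update: halve on a poor gain ratio,
    allow growth up to 3*||p|| on a very good one."""
    if rho < 0.25:
        return Delta / 2.0
    elif rho > 0.75:
        return max(Delta, 3.0 * p_norm)
    return Delta
```

For intermediate gains (¼ ≤ ρ ≤ ¾) the radius is simply left unchanged, which is what keeps Δ from oscillating.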

  • 42/76

    If with the damped Newton model ρ is small, the damping parameter μ should be increased, thereby raising the penalty on taking large steps. A large ρ, on the contrary, indicates that M(p) is a good approximation of f(x + p) and μ can be reduced:

    if ρ < 0.25
        μ := 2μ
    elseif ρ > 0.75
        μ := μ/3
    end

    Another fairly common strategy is:

    if ρ > 0
        μ := μ · max{1/3, 1 − (2ρ − 1)³}
    else
        μ := 2μ
    end

  • 43/76

    Damped Newton. Computing the direction

    It is computed by solving

    ψ(p) = min_p { M(p) + ½μpᵀp }.

    The necessary condition of minimum, ∇ψ(p) = 0, makes the direction p_dn the solution of

    ∇M(p) + μp = 0,

    which, according to the definition of M(p), is equivalent to

    (B + μI)p_dn = −c.

    If μ is sufficiently large, the symmetric matrix B + μI is positive definite, so p_dn is a minimum of the model M(p).

  • 44/76

    In damped Newton, B = H and c = g, and the system is

    (H + μI)p_dn = −g.

    If μ is very large,

    p_dn ≈ −(1/μ)g,

    so the direction is close to that of steepest descent. If μ is very small, the direction is almost Newton's.
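These two limits are easy to verify numerically. A small Python check (the matrix and vector values are ours, chosen only for illustration):

```python
import numpy as np

H = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite Hessian
g = np.array([1.0, 1.0])

def damped_step(mu):
    # Solve (H + mu*I) p = -g
    return np.linalg.solve(H + mu * np.eye(2), -g)

p_newton   = np.linalg.solve(H, -g)
p_small_mu = damped_step(1e-10)   # approaches the Newton direction
p_large_mu = damped_step(1e10)    # approaches -(1/mu) g, i.e. steepest descent
```

As μ sweeps from 0 to ∞, the step interpolates continuously between the Newton direction and an ever shorter steepest-descent step.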

  • 45/76

    Damped Newton algorithm for minimizing f(x):

    Given an x := x0 and tol. Set μ := 1 and found := false
    while (not found) and (k < kmax)
        Solve for the descent direction (H + μI)p = −g
        if (p does not exist) or (‖g‖ < tol)
            found := true
        else
            Compute ρ = (f(x) − f(x + p)) / (M(0) − M(p))
            if ρ > 0
                μ := μ · max{1/3, 1 − (2ρ − 1)³}
                x := x + p
            else
                μ := 2μ
            end
        end
        k := k + 1
    end
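Before the Matlab listing, here is a compact Python version of this algorithm, applied to the test function of the following slides, f(x) = 0.5x₁²(x₁²/6 + 1) + x₂ arctan(x₂) − 0.5 ln(x₂² + 1). This is a sketch, not the author's implementation; the gradient and (diagonal) Hessian below were derived by hand from that formula:

```python
import numpy as np

def fgH(x):
    """f, gradient and Hessian of the test function of the next slides."""
    x1, x2 = x
    f = 0.5*x1**2*(x1**2/6 + 1) + x2*np.arctan(x2) - 0.5*np.log(x2**2 + 1)
    g = np.array([x1**3/3 + x1, np.arctan(x2)])
    H = np.array([[x1**2 + 1, 0.0], [0.0, 1/(1 + x2**2)]])
    return f, g, H

def damped_newton(x, mu=1.0, tol=1e-8, kmax=100):
    f, g, H = fgH(x)
    for _ in range(kmax):
        if np.linalg.norm(g, np.inf) < tol:
            break
        p = np.linalg.solve(H + mu*np.eye(len(x)), -g)
        fn, gn, Hn = fgH(x + p)
        # Gain: actual decrease over the decrease predicted by the model
        pred = -(g @ p) - 0.5 * (p @ H @ p)
        rho = (f - fn) / pred
        if rho > 0:                  # accept the step, relax the damping
            x, f, g, H = x + p, fn, gn, Hn
            mu *= max(1/3, 1 - (2*rho - 1)**3)
        else:                        # reject it, double the damping
            mu *= 2
    return x, f
```

Note that pred = ½pᵀHp + μpᵀp > 0 whenever p ≠ 0, so the gain ratio is always well defined.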

  • 46/76

    function [x f k] = Newton_amortiguado(fun,x)
    % Damped Newton method with full step
    k = 1; kmax = 500; eps1=1e-8; eps2=1e-12; n=length(x);
    [f g H] = fun(x); ng = norm(g,inf); mu = 1; found = ng

  • 47/76

    In Matlab, to solve the function

    f(x) = 0.5x₁²(x₁²/6 + 1) + x₂ arctan(x₂) − 0.5 ln(x₂² + 1):

    >> [x f k] =Newton_amortiguado(@fun5_2,[2;2])
    2 1.22222222e+000 1.07737607e+000 1.43392805e+000 3.33333333e-001
    3 5.74640142e-001 4.41028668e-002 1.75164488e-001 1.11111111e-001
    4 1.32066691e-001 4.36656594e-003 8.75568971e-003 3.70370370e-002
    5 6.09495146e-003 1.55898129e-004 1.85864837e-005 1.23456790e-002
    6 7.44750479e-005 1.90119424e-006 2.77507370e-009 4.11522634e-003
    7 3.05225879e-007 7.79177968e-009 4.66421303e-014 1.37174211e-003
    8 4.18117642e-010 1.06736708e-011 8.75251085e-020 4.57247371e-004

    x =
       1.0e-009 *
       0.418117641894218
       0.010673670797581

    f =
       8.752510847988606e-020

    k =
       8

  • 48/76

    Trust region. Computing the direction

    Theorem. The vector p* is the solution of

    p_tr = argmin_{‖p‖≤Δ} M(p) ≡ f(x) + gᵀp + ½pᵀBp

    if and only if p* is feasible and there exists a scalar λ ≥ 0 such that the following conditions hold:

    (B + λI)p* = −g,
    λ(Δ − ‖p*‖) = 0 and
    (B + λI) is positive semidefinite.

    B is the Hessian matrix, ∇²f(x), or an approximation of it.

    The second condition, of complementarity, states that either λ = 0 or ‖p*‖ = Δ.

  • 49/76

    [Figure 4.2 of Nocedal & Wright: solution of the trust-region subproblem for different radii Δ₁, Δ₂, Δ₃, with contours of the model M(p).]

    It indicates that when p* lies strictly inside the trust region (as is the case in the figure for Δ = Δ₁), it must hold that λ = 0 and therefore Bp* = −g, with B positive semidefinite.

    When Δ = Δ₂ and Δ = Δ₃, we have ‖p*‖ = Δ and λ may take a value > 0.

  • 50/76

    From the first condition it follows that

    λp* = −Bp* − g = −∇M(p*).

    That is, when λ > 0, the solution p* is collinear with the negative of the gradient (steepest descent) of M and perpendicular to its contours, as in the figure.

    The solution of the subproblem M(p) does not have to be exact. Ways of approximating it:

    Cauchy point: minimum along p = −g, bounded to Δ
    Dogleg: if B is positive definite
    Minimization in a subspace of dimension 2: if B is indefinite
    Steihaug: if B is symmetric
    Others…

  • 51/76

    Cauchy point

    It is the solution that minimizes the linear model of M_k(p) at an iteration k. Concretely,

    p_c_k = −τ_k (Δ_k/‖g_k‖) g_k,

    where

    τ_k = 1,                                  if g_kᵀB_k g_k ≤ 0;
          min{‖g_k‖³/(Δ_k g_kᵀB_k g_k), 1},   if g_kᵀB_k g_k > 0.

    [Figure 4.3 of Nocedal & Wright, "The Cauchy point": contours of the model M_k(p), the trust region, −g_k and the Cauchy point p_c_k, which lies strictly inside the region in the example and requires no matrix factorization to compute.]
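A Python sketch of the formula above (the function name is ours):

```python
import numpy as np

def cauchy_point(g, B, Delta):
    """Cauchy step p_c = -tau * (Delta / ||g||) * g, with tau as above."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0   # the model decreases all the way to the boundary
    else:
        tau = min(gnorm**3 / (Delta * gBg), 1.0)
    return -tau * (Delta / gnorm) * g
```

When gᵀBg > 0 the model is convex along −g, so τ is either the unconstrained minimizer of that quadratic or the boundary value 1, whichever comes first.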

  • 52/76

    Dogleg

    We have seen that if the matrix B of the model M(p) is positive definite, its minimum is p_B = −B⁻¹g. That is, p(Δ) = p_B whenever Δ ≥ ‖p_B‖.

    If Δ is small with respect to ‖p_B‖, the condition ‖p‖ ≤ Δ makes the quadratic term of the model M(p) have little effect on its solution. In this case one can approximate

    p(Δ) ≈ −Δ g/‖g‖.

    For intermediate values of Δ, the solution p(Δ) follows a curvilinear trajectory like the one in the figure that follows.

  • 53/76

    [Figure 4.4 of Nocedal & Wright, "Exact trajectory and dogleg approximation": the trust region, the optimal trajectory of p(Δ), p_B (full step), p_U (unconstrained minimizer along −g) and the dogleg path.]

    The dogleg method replaces the curvilinear trajectory with two straight-line segments. The first runs to the minimum of M(p) along the steepest-descent direction,

    p_U = −(gᵀg/(gᵀBg)) g.

  • 54/76

    The second runs from p_U to p_B. Together, this trajectory is denoted p̃(τ), with τ ∈ [0, 2], where

    p̃(τ) = τp_U,                      0 ≤ τ ≤ 1;
            p_U + (τ − 1)(p_B − p_U),  1 ≤ τ ≤ 2.

    The intersection of this trajectory with the trust-region boundary is the point sought; concretely, the τ that solves the equation

    ‖p_U + (τ − 1)(p_B − p_U)‖² = Δ².

    If the matrix B is not positive definite, it is better to explore other methods.
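Putting the pieces together, a full dogleg step can be sketched in Python as follows (assuming B positive definite; the names are ours). The quadratic in τ comes from expanding the boundary equation above, written here in the shifted variable t = τ − 1:

```python
import numpy as np

def dogleg_step(g, B, Delta):
    """Dogleg approximation to the trust-region subproblem
    (B is assumed positive definite)."""
    pB = np.linalg.solve(B, -g)                  # full Newton step
    if np.linalg.norm(pB) <= Delta:
        return pB                                # it already fits
    pU = -(g @ g) / (g @ B @ g) * g              # steepest-descent minimizer
    if np.linalg.norm(pU) >= Delta:
        return -(Delta / np.linalg.norm(g)) * g  # clipped steepest descent
    # Intersection of the segment pU + t*(pB - pU) with ||p|| = Delta
    d = pB - pU
    a = d @ d
    b = 2 * (pU @ d)
    c = pU @ pU - Delta**2
    t = (-b + np.sqrt(b*b - 4*a*c)) / (2*a)
    return pU + t * d
```

The three branches mirror the three cases of the trajectory: Newton step inside the region, boundary hit on the first segment, or boundary hit on the second.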

  • 55/76

    Algorithm for minimizing a function by the dogleg trust-region method with a Newton model:

    Given an x := x0 and tol. Set Δ := 1 and found := false
    while (not found) and (k < kmax)
        Solve Hp_N = −g
        if ‖p_N‖ ≤ Δ, x := x + p_N
        else p_sd = −(gᵀg/gᵀHg)g
            if ‖p_sd‖ > Δ, x := x + p_c
            else
                Compute the dogleg direction and the intersection τ: x := x + p_sd + τ(p_N − p_sd)
            end
        end
        Compute ρ = (f(x) − f(x + p)) / (M(0) − M(p))
        if ρ < 0.25, Δ := Δ/2
        elseif ρ > 0.75, Δ := min{2Δ, Δmax}
        end
        k := k + 1; if found, exit
    end

  • 56/76

    function [xc fc] = Dogleg_UBC_yo(fun,x0)
    % dogleg.m -- Trust-region method with a Newton model
    eps1=100*sqrt(eps); eps2=sqrt(eps); Kmax=200; Delta=0.5; Dmax=100;
    xstring = '';
    for jj=1:length(x0)
        xstring = [xstring,'x_k(',int2str(jj),') '];
    end
    fprintf(['\n k %s f(x_k) Delta |dx|/Del ro ',...
             'e_1 e_2\n'],xstring);
    xc = x0; [fc gc Hc] = fun(xc);
    disp([' 0', sprintf(' %12.4e',[xc',fc]),sprintf(' %10.2e',Delta)]);
    for k=1:Kmax          % Newton model: -(gc'*pN + 0.5*pN'*Hc*pN)
        pN = -Hc\gc;                          % Newton direction
        pNlen = (pN'*pN)^0.5; gHg = gc'*Hc*gc;
        pSD = -(gc'*gc)/gHg*gc;               % Steepest-descent minimizer
        if pNlen <= Delta                     % Newton step inside the region: use it
            xt = xc + pN;
        elseif norm(pSD) >= Delta             % Newton and sd outside the region: use Cauchy point
            xt = xc - Delta*gc/(gc'*gc)^0.5;
        else                                  % sd inside and Newton outside: use dogleg,
                                              % cutting at the boundary the line joining them
            pN_pSD = pN-pSD;
            a = pN_pSD'*pN_pSD; b = 2*pN_pSD'*pSD; c = pSD'*pSD - Delta^2;
            t = (-b+(b^2-4*a*c)^0.5)/2/a;
            xt = xc + pSD + t*pN_pSD;
        end
        [fn gn Hn] = fun(xt);                 % New point obtained in the region
        dx = xt - xc; df = fn - fc;
        redfun = -df;                         % Reduction obtained in the function
        repred = -gc'*dx-0.5*dx'*Hc*dx;       % Reduction predicted by the Newton model
        rho = redfun/repred;                  % Gain
        e1 = max(abs(pN)./max([abs(xt);eps2*ones(size(xt))]));
        e2 = max((abs(gn).*abs(xt))/max([abs(fn),eps2]));
        if rho < 0.25, Delta = Delta/2;
        elseif rho > 0.75 && norm(dx) > ((1-eps)*Delta)
            Delta = min([2*Delta;Dmax]);
        end
        disp([sprintf('%3d',k), sprintf(' %12.4e',[xt',fn]),...
              sprintf(' %10.2e',Delta),sprintf(' %6.4f',norm(dx)/Delta),...
              sprintf(' %7.3f',rho),sprintf(' %8.1e',e1),...
              sprintf(' %8.1e',e2)]);
        xc = xt; fc = fn; gc = gn; Hc = Hn;
        if e1 < eps1, break, end
    end

  • 57/76

    In Matlab, to solve the function

    f(x) = 0.5x₁²(x₁²/6 + 1) + x₂ arctan(x₂) − 0.5 ln(x₂² + 1):

    >> [x f]=Dogleg_UBC_yo(@fun5_2,[2;2]);

    k  x_k(1)       x_k(2)       f(x_k)      Delta     |dx|/Del ro    e_1      e_2
    0  2.0000e+000  2.0000e+000  4.7429e+000 5.00e-001
    1  1.5135e+000  1.8846e+000  2.8658e+000 1.00e+000 0.5000 1.040 2.8e+000 1.4e+000
    2  5.8687e-001  1.5086e+000  1.0754e+000 2.00e+000 0.5000 1.233 2.6e+000 1.4e+000
    3 -4.1104e-001 -2.2464e-001  1.1188e-001 4.00e+000 0.5000 0.781 2.1e+000 1.6e+000
    4 -3.9606e-002  7.4824e-003  8.1253e-004 4.00e+000 0.1095 1.045 1.0e+000 1.9e+000
    5 -4.1354e-005 -2.7927e-007  8.5513e-010 4.00e+000 0.0101 1.001 1.0e+000 1.1e-001
    6 -4.7149e-014  1.4505e-020  1.1115e-027 4.00e+000 0.0000 1.000 1.0e+000 1.5e-019

    Optimum:  x = -4.7148815071357958e-014  1.4505439221729893e-020.
    Function: f(x) = 1.1115053813167661e-027.

  • 58/76

    Conjugate gradient method

    The idea is to extend the method we saw for minimizing convex quadratic functions to nonlinear functions in general. It was proposed by R. Fletcher and C. Reeves in 1964.

    The descent direction of this method is

    p = −∇f(x) + βp_pr.

    The new direction p and the previous one, p_pr, must be conjugate with respect to the Hessian matrix of the function at each point of the iterative process.

  • 59/76

    The parameter β is determined so as to minimize a quadratic form that approximates the function at a point of the iterative process, since sufficiently close to the optimum continuous functions are well approximated by quadratic forms.

    The first formula for it was suggested by Fletcher and Reeves:

    β = ∇f(x)ᵀ∇f(x) / (∇f(x_pr)ᵀ∇f(x_pr)).

    That of Polak and Ribière came later, in 1971:

    β = (∇f(x) − ∇f(x_pr))ᵀ∇f(x) / (∇f(x_pr)ᵀ∇f(x_pr)).

  • 60/76

    Conjugate gradient algorithm for minimizing f(x):

    Given the function f(x), a starting point x0 and a tolerance.
    Compute p_pr = −∇f(x0)
    Repeat while the approximation to the solution > tolerance:
        1. Compute the descent direction: p = −∇f(x) + βp_pr,
           with β = ∇f(x)ᵀ∇f(x)/(∇f(x_pr)ᵀ∇f(x_pr))
           or   β = (∇f(x) − ∇f(x_pr))ᵀ∇f(x)/(∇f(x_pr)ᵀ∇f(x_pr)).
        2. Compute the step length, α, in that direction.
        3. Compute the new point x := x + αp.

    Neither of the two formulas for β requires computing the Hessian matrix.
    If the function to minimize is quadratic, they coincide.

  • 61/76

    Let us solve with Matlab the Rosenbrock function:

    f(x) = 100(x₂ − x₁²)² + (1 − x₁)².

    [Figure: contour plot of Rosenbrock's banana function.]

  • 62/76

    function [x f] = Grad_Conjugados_unc(fun,x,par)
    % Conjugate gradient method to minimize f(x)
    rho = 0.01; beta = 0.1;         % Line search parameters
    maxit = 1000;
    [f g]= fun(x); ppr=-g; gpr=g; pbeta=0;  % Start with steepest descent
    for i=1:maxit                   % Iterative process
        if i>1
            [f g] = fun(x);
            if par==1, pbeta=(g'*g)/(gpr'*gpr);     % Fletcher-Reeves
            else pbeta=((g-gpr)'*g)/(gpr'*gpr);     % Polak-Ribiere
            end
        end
        p = -g+pbeta*ppr;
        if (norm(g,inf) < 1e-6), break, end  % Minimum condition
        alpha = 1;
        for k=1:10                  % Backtracking of the step length
            xnew = x+alpha*p; fxnew = fun(xnew);
            if fxnew < f+alpha*rho*g'*p, break
            else alpha = alpha*beta;
            end
        end
        x = xnew; gpr = g; ppr = p;
    end

  • 63/76

    We start from the point x = [−1.2, 1]ᵀ. The optimum is x* = [1, 1]ᵀ. The function and the path toward the optimum are those of the figure.

    [Figure: Polak-Ribière method with soft line search applied to Rosenbrock's function; the iterates x_k follow the valley along the parabola x₂ = x₁², and f(x_k) and ‖f′(x_k)‖ are shown on a logarithmic axis.]

    With the tolerance and the parameters of the listing, the convergence is that of the table.

    Method           Number of iterations
    Polak-Ribière    103
    Fletcher-Reeves  514

  • 64/76

    Try:

    >> [x f] = Grad_Conjugados_unc(@objfun_min3,[-1.2;1],1)
      1 -9.84400000e-001 1.08800000e+000 2.42000000e+001 1.00000000e-003
      2 -1.01771362e+000 1.06810987e+000 5.35291158e+000 1.00000000e-003
    513  1.00000141e+000 1.00000283e+000 2.00451765e-012 1.00000000e-004

    x =
       1.000001414780674
       1.000002834445620

    f =
       2.003988012156511e-012

    >> [x f] = Grad_Conjugados_unc(@objfun_min3,[-1.2;1],2)
      1 -9.84400000e-001 1.08800000e+000 2.42000000e+001 1.00000000e-003
      2 -9.84399999e-001 1.08800000e+000 5.35291158e+000 1.00000000e-010
    102  1.00000121e+000 1.00000243e+000 1.47631065e-012 1.00000000e-003

    x =
       1.000001212679899
       1.000002430217752

    f =
       1.472951080761591e-012

  • 65/76

    Quasi-Newton methods

    They follow the strategy of Newton's method but use recurrence formulas to obtain the Hessian matrix of the function, or its inverse.

    Where the Newton descent direction was

    p = −∇²f(x)⁻¹∇f(x),

    one uses

    p = −H∇f(x),

    where H is an approximation of the inverse of the Hessian matrix.

  • 66/76

    The most used formulas to obtain H are:

    DFP, due to Davidon, Fletcher and Powell:

    H_{k+1} = H_k − (H_k y_k y_kᵀ H_k)/(y_kᵀ H_k y_k) + (s_k s_kᵀ)/(y_kᵀ s_k).

    BFGS, due to Broyden, Fletcher, Goldfarb and Shanno:

    H_{k+1} = H_k + (1 + (y_kᵀ H_k y_k)/(y_kᵀ s_k)) · (s_k s_kᵀ)/(y_kᵀ s_k) − (s_k y_kᵀ H_k + H_k y_k s_kᵀ)/(y_kᵀ s_k),

    where y_k = ∇f(x_{k+1}) − ∇f(x_k) and s_k = x_{k+1} − x_k = α_k p_k.
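As a sketch, the BFGS update of the inverse-Hessian approximation translates directly into code; the secant condition H_{k+1} y_k = s_k is a quick sanity check (the function name is ours):

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update of the inverse-Hessian approximation H (formula above)."""
    sy = s @ y            # curvature y_k' s_k, assumed > 0
    Hy = H @ y
    return (H
            + (1 + (y @ Hy) / sy) * np.outer(s, s) / sy
            - (np.outer(s, Hy) + np.outer(Hy, s)) / sy)
```

As long as sᵀy > 0, the update keeps H symmetric and positive definite, which is why the resulting p = −H∇f(x) remains a descent direction.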

  • 67/76

    Quasi-Newton algorithm for minimizing f(x).

    Given f(x), an x0, a tolerance and an H_0.
    Repeat while the solution tolerance is not satisfied:
        1. Determine the descent direction:
           obtain p_k = −H_k∇f(x_k).
        2. Compute the step length, α_k, in that direction.
        3. Compute the new point x_{k+1} := x_k + α_k p_k.
        4. Compute H_{k+1}.

  • 68/76

    In Matlab:

    function [x i f nfeval] = quasi_newton_1(fun,x,metodo)
    % Quasi-Newton method
    rho = 0.01; beta = 0.1;         % Line search parameters
    [f g] = fun(x); H=eye(length(x)); eps1=1e-5; eps2=1e-8;
    maxit = 1000; nfeval = 1;
    for i=1:maxit
        xp = x; gp = g;
        p = -H*g; nh = norm(p);     % New direction: quasi-Newton
        if norm(g,inf)

  • To obtain the solution, one would have a session like the one that follows.

    >> [x i f nf]=quasi_newton_1(@objfun_min3,[-1.2;1],1)
    x =
       0.999986600819581
       0.999973154300098
    i =
       247
    f =
       1.797638376605058e-010
    nf =
       506
    >>
    >> [x i f nf]=quasi_newton_1(@objfun_min3,[-1.2;1],2)
    x =
       0.999999873170081
       0.999999758419957
    i =
        40
    f =
       3.067793488101831e-014
    nf =
        88
    >>

  • The objective function and its gradient are passed in through the routine objfun_min3 that we have already used.

    The session with the Matlab script fminunc would be the following.

    >> options = optimset('Display','iter','GradObj','on','LargeScale','off', ...
       'HessUpdate','bfgs');
    >> x0=[-1.2;1];
    >> [x,fval,exitflag,output] = fminunc(@objfun_min3,x0,options)

                                                     First-order
     Iteration  Func-count       f(x)     Step-size  optimality
         0           1          24.2                    216
         1           3       4.28049   0.000861873     15.2
         2           4       4.12869         1            3
         3           5       4.12069         1          1.5
         4           6        4.1173         1         1.62
         5           7       4.08429         1         5.72
         6           8       4.02491         1         10.4
         7           9        3.9034         1         17.4
         8          10        3.7588         1         20.1
         9          11       3.41694         1         19.9
        10          12       2.88624         1         11.9
        11          13        2.4428         1         9.78
        12          14       1.93707         1         3.01
        13          17       1.64357    0.141994       5.54
        14          18       1.52561         1         7.57
        15          19       1.17013         1         4.53


        16          20      0.940886         1         3.17
        17          21      0.719844         1         5.15
        18          22      0.409164         1         5.73
        19          25      0.259747   0.0842377       5.01

                                                     First-order
     Iteration  Func-count       f(x)     Step-size  optimality
        20          26      0.238901         1         1.06
        21          27        0.2102         1         1.22
        22          29       0.18249    0.584225       3.33
        23          30       0.15856         1         2.91
        24          31     0.0893558         1        0.756
        25          33     0.0726393    0.438248       3.18
        26          34     0.0413887         1          2.3
        27          35     0.0221877         1        0.491
        28          37     0.0126281    0.405833       1.98
        29          38    0.00703352         1         1.35
        30          39    0.00203299         1        0.194
        31          41    0.00109124       0.5        0.874
        32          42    8.941e-005         1         0.14
        33          43  7.16329e-006         1       0.0804
        34          44  4.44047e-007         1       0.0222
        35          45  1.49386e-008         1      0.00263
        36          46  9.03834e-013         1    1.65e-005

    Local minimum found.

    Optimization completed because the size of the gradient is less than
    the default value of the function tolerance.

    x =
       0.999999124019815
       0.999998284985335

    fval =
       9.038341201889023e-013

    exitflag =
         1

    output =
           iterations: 36
            funcCount: 46
             stepsize: 1
        firstorderopt: 1.652992217925215e-005
            algorithm: 'medium-scale: Quasi-Newton line search'
              message: [1x438 char]

    >>

  • With the quasi-Newton method:

    >> [x i f nf]=quasi_newton_1(@fun5_2,[2;2],2)
    x =
       1.0e-005 *
      -0.466633393077767
       0.043763046313533
    i =
        10
    f =
       1.098304435204189e-011
    nf =
        20
    >>

  • With the conjugate gradient method:

    >> [x f] = Grad_Conjugados_unc(@fun5_2,[2;2],2)
     1  1.53333333e+000  1.88928513e+000 4.74291181e+000 1.00000000e-001
     2 -1.24817638e-001  1.06078252e+000 2.92446293e+000 1.00000000e+000
     3 -2.60972425e-002  2.32534554e-001 4.95278202e-001 1.00000000e+000
     4 -1.98278513e-002  1.70462426e-001 2.71382019e-002 1.00000000e+000
     5 -1.20583962e-003  1.35873959e-002 1.46557482e-002 1.00000000e+000
     6 -1.37365260e-003  1.15727268e-002 9.30328488e-005 1.00000000e+000
     7  2.08186271e-005  2.50443071e-004 6.79059687e-005 1.00000000e+000
     8 -2.88173500e-005  2.33979887e-004 3.15775731e-008 1.00000000e+000
     9  1.90328536e-006  6.31282977e-007 2.77885134e-008 1.00000000e+000
    10 -4.91063765e-008  3.73003496e-007 2.01048075e-012 1.00000000e+000
    x =
       1.0e-006 *
      -0.049106376456762
       0.373003496268343

    f =
       7.072634268876572e-014

    >>

  • With Newton's method:

    >> [x f]=Newton_unc(@fun5_2,[2;2])
     1  1.53333333e+000 -7.67871794e-001 4.74291181e+000 5.00000000e-001
     2  7.17182434e-001  2.73081655e-001 1.90728360e+000 1.00000000e+000
     3  1.62394466e-001 -1.33801795e-002 3.16058360e-001 1.00000000e+000
     4  2.78174708e-003  1.59690475e-006 1.33334500e-002 1.00000000e+000
     5  1.43502115e-008 -2.71495235e-018 3.86906467e-006 1.00000000e+000

    x =
       1.0e-007 *
       0.143502114982597
      -0.000000000027150

    f =
       1.029642850223918e-016
    >>

  • With the steepest descent method:

    >> [x f]=Maxima_pendiente_unc(@fun5_2,[2;2])
     1 -3.33333333e-001  1.44642564e+000 4.74291181e+000 5.00000000e-001
     2  1.23456790e-002  4.80532665e-001 8.89243872e-001 1.00000000e+000
     3 -6.27225474e-007  3.25798596e-002 1.11454631e-001 1.00000000e+000
     4  8.22680750e-020  1.15199317e-005 5.30629777e-004 1.00000000e+000
     5  0.00000000e+000  5.09598738e-016 6.63544598e-011 1.00000000e+000

    x =
       1.0e-015 *
       0
       0.509598737990710

    f =
       2.596908737617245e-031

    >>