For the sake of simplicity, we start by considering the equality constrained case:
which can be dealt with by the classical Lagrange multiplier method.
THEOREM 12.10 Assume that the functions f and hj in problem (12.60) meet some differentiability requirements, that the point x* is feasible, and that the constraints satisfy a suitable regularity property in x*. Then, a necessary condition for local optimality of x* is that there exist numbers , j = 1, …, m, called Lagrange multipliers, such that
The reader has undoubtedly noticed that we have been very loose in stating the conditions of the theorem. In fact, our aim is just to appreciate the concept of Lagrange multiplier and its economical interpretation, and we can do it without getting bogged down in too many technicalities. However, it is important to realize that the theorem is somewhat weak. It holds under technical conditions,28 which we do not describe in detail; furthermore, it is only a necessary (hence, not sufficient) condition for local (hence, not global) optimality. The good news is that it can be shown that the condition of the theorem is necessary and sufficient for a convex optimization problem, assuming differentiability of the involved functions.
To interpret the condition above, we may observe that it generalizes the stationarity condition; the trick is requiring stationarity not for the objective function, but for the following Lagrangian function:
In practice, the “recipe” requires us to augment the objective function by the constraints, which are multiplied by the Lagrange multipliers, and to enforce stationarity both with respect to the decision variables x:
and with respect to the multipliers, which actually boils down to the original equality constraints:
The mechanism can be best clarified by an example, but it is important to note that the conditions above yield a sensible system of equations. We have n decision variables and m equality constraints (m < n); Eqs. (12.62) and (12.63) yield a system of n + m (possibly) nonlinear equations to find the n values and the m multipliers .
Example 12.15 Consider the quadratic programming problem:
Since this quadratic form is convex, we may use Theorem 12.10 to find the global optimum. We associate the constraint with a multiplier λ, and form the Lagrangian function:
The stationarity conditions,
are just a system of linear equations, whose solution yields and λ* = −4. We may notice that the equality constraint can also be written as 4 − x1 − x2 = 0; if we do so, we have only a change in the sign of the multiplier, which is inconsequential.
Proving Theorem 12.10 rigorously would call for some additional concepts beyond the scope but we may at least get a better feeling for it by trying a justification based on Taylor’s expansions. How can we characterize the optimal solution x*? It should be a point such that there is no way of improving it, without violating constraints. If we want to improve on x*, we should look for a displacement , such that
This condition characterizes as a descent direction. Of course, the new solution must be feasible, i.e., for each equality constraint we must have . If we apply Taylor’s expansion to constraints, we obtain
where is a shorthand for the gradient at the optimal point, . But since x* is feasible, this boils down to
This condition characterizes feasible directions and states that the displacement must be orthogonal to the gradient of the constraints. If we also apply Taylor’s expansion to the objective function, we may rewrite (12.66) as
The two conditions (12.67) and (12.68) are not compatible if the gradient of the objective at x*, ∇f*, is a linear combination of gradients of the constraints:
In fact, if we take the inner product of both sides of Eq. (12.69) with , we find:
If Eq. (12.68) holds, then cannot be a descent direction. But Eq. (12.69) is exactly the requirement of Theorem 12.10. It is important to stress that this is no rigorous proof at all, but only a heuristic justification. In fact, it may give the impression that the stationarity of the Lagangian is a sufficient condition for local optimality, whereas it is actually a necessary condition, assuming regularity of constraints. A correct proof requires the implicit function theorem, which is beyond the scope of this book. However, this justification does provide us with some value, as shown in the next example.
Fig. 12.16 A quadratic programming example: geometric interpretation of Lagrange conditions.
Example 12.16 We may get an intuitive feeling for the Lagrange conditions by taking a look at Fig. 12.16, where we see the level curves of the objective function (12.64) of Example 12.15, a set of concentric circles, and the feasible region corresponding to Eq. (12.65), a line. From a geometric perspective, the problem calls for finding the closest point to the origin on the line x1 + x2 = 4. We note that the optimizer is where this line is tangent to the level curve associated with the lowest value of the objective. From an analytical viewpoint, the gradient of the objective function is
This gradient, changed in sign, is a vector pointing toward the origin, which is the steepest-descent direction for the objective. At point x* = (2, 2) the gradient is [4, 4]T. The gradient of the constraint h(x) = x1 + x2 − 4 is
Note that this vector is orthogonal to the feasible region and is parallel to the gradient of the objective at the optimizer. If we multiply the gradient of the constraint by λ* = −4 and we add the result to the gradient of the objective, we get the null vector, as required. Actually, all of this boils down to requiring that the negative of the gradient, −∇f*, which is the steepest descent direction, be orthogonal to the constraints at the optimizer; this means that further improvements could be obtained only by going out of the feasible region, which is forbidden. The last condition is what characterizes the optimizer.
If we consider point (1, 3) on the line, the negative of the gradient there is [−2, −6]T which points towards the origin but is not a feasible direction. However, we may decompose this vector as
The first vector is parallel to the gradient of the constraint, and it is an infeasible direction; the second vector is orthogonal to the gradient of the constraint, and it is the feasible component of the objective gradient, representing a step along the line. This decomposition in a feasible and infeasible component of the desired displacement are illustrated in Fig. 12.16 as well. At the optimal solution, there is no feasible component of the (negative) objective gradient.
As we pointed out, the justification we offered yields a very useful interpretation, but it is not quite a proof. For instance, can we always express the gradient of the objective as a linear combination of the gradient of the constraints? The following example shows a pathological case where we are in trouble.
Example 12.17 (The role of constraint qualification) To understand the issue behind the constraint qualification condition, consider the problem
It is easy to see that the feasible set is the single point (0, 0), which is the (trivial) optimal solution. Let us ignore this fact and build the Lagrangian function
The stationarity conditions yield the system
Unfortunately, this system of equations has no solution; the first equation requires x1 ≠ 0, which is not compatible with the last two equations. This is due to the fact that the gradients of the two constraints are parallel at the origin:
and they are not a basis able to express the gradient of f:
We say that the origin is not a regular point, as the constraint qualification conditions do not hold there.
The reader is urged to draw a diagram for the last example, in order to visualize the issues involved; constraint qualification conditions ensure that such difficulties do not arise.
Leave a Reply