Convexity can be easily generalized to functions by applying the idea of convexity for sets to the epigraph of the function. For functions of a single variable, which can be plotted on a plane, the epigraph of the function is just the set of points lying above the function graph. The idea generalizes to an arbitrary number of dimensions. Figure 2.25 illustrates the concept for functions of one variable. Function (a) is obviously convex, whereas function (b) is not. We see immediately that function (a) is easy to minimize, since there is one point where the first-order derivative is zero, and it is a global minimum Function (b) is not that easy, as it features local maxima and minima. Function (c) may look a bit weird, as it is not differentiable; nevertheless, it is a convex function. We also see that the epigraph of function (c) is a polyhedron; this is why a function like that is called a polyhedral convex function. We will see some tricks that allow us to minimize a polyhedral convex function quite easily, despite lack of differentiability.
There is an alternative way to regard convexity, which is better suited to emphasize the essential features of a convex function. If a function is convex and we take two arbitrary points on its graph, the line segment joining them will always stay above the function graph. This allows us to define convex functions as follows.
Fig. 2.26 A schematic illustration of the definition of a convex function.
DEFINITION 2.17 (Convex and concave functions) A function f, defined on a domain S, is convex on S if, given any pair of points y and z in S, the following condition holds for any real number λ in the interval [0, 1];
A function f is concave if (−f) is convex.
This definition is illustrated in Fig. 2.26. In this case the set S is a portion of the real line, within which we consider an interval [y, z]; if we take any point x = λy + (1 − λ)z within that interval, the true function value at x is less than the linearly interpolated value λf(y) + (1 − λ)f(z). A concave function is just a convex function flipped upside down. If the inequality is strict for y ≠ z and λ in the open interval (0, 1), we have a strictly convex function. Figure 2.27 illustrates the difference between a convex and a strictly convex function.
Intuition suggests that for a convex function we do not have trouble with local minima, whereas for a concave function we do not have trouble with local maxima. If convexity is not strict, we may have multiple but equivalent optima. It is also important to realize that the definition above does not rely on differentiability of function f and applies to functions defined on any n-dimensional space. If we restrict our attention to differentiable functions of one variable, then the following properties hold for a function of one variable.25
PROPERTY 2.18 If f is a convex differentiable function of one variable, on a domain , then the following condition holds, for any x0 and x in S:
Fig. 2.27 (a) Convex vs. (b) strictly convex functions.
The inequality is reversed for a concave function.
PROPERTY 2.19 If f is a convex differentiable function of one variable, on a domain , then the following condition holds, for any x in S:
The inequality is reversed for a concave function.
Property 2.18 essentially says that if we draw the tangent line y = f(x0) + f′(x0)(x − x0) at any point with coordinates [x0, f(x0)] on the graph of a convex (differentiable) function, then the linear approximation (first-order Taylor expansion) will consistently underestimate the true value of the function (see Fig. 2.28). Property 2.19 states that the slope of tangent lines to f, drawn at different points x, is always increasing with respect to x, since the second-order derivative is always positive. Again, this property assumes suitable differentiability conditions, and is illustrated in Fig. 2.29. The following example shows the connection between convex functions and sets.
Example 2.34 We show that the region S described by the inequality g(x) ≤ 0 is a convex set if g is a convex function. If x ∈ S, then g(x) ≤ 0; by the same token, if y ∈ S, then g(y) ≤ 0. What we want to prove is that λx + (1 − λ)y ∈ S, for all λ ∈ [0, 1], i.e., that g(λx + (1 − λ)y) ≤ 0. But, since g is convex, we have
where the last inequality depends on the fact that we are summing non-positive terms, which are obtained by multiplying a nonpositive quantity, g(), by nonnegative coefficients λ and (1 − λ). This proves that λx + (1 − λ)y is in S. From a practical point of view, this property plays an important role when inequalities are used to define the feasible set of an optimization problem.
Fig. 2.28 Tangent lines always underestimate a convex function.
Fig. 2.29 Slopes of tangent lines are increasing for a convex function.
Leave a Reply