In Section 2.7 we defined the derivative of a function of a single variable as the limit of an increment ratio:
\[
f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}
\]
If we have a function of several variables, we may readily extend the concept above by considering a point and perturbing one variable at a time. We obtain the concept of a partial derivative with respect to a single variable xi:
\[
\frac{\partial f}{\partial x_i}(\mathbf{x}_0) = \lim_{h \to 0} \frac{f(x_1^0, \ldots, x_i^0 + h, \ldots, x_n^0) - f(x_1^0, \ldots, x_i^0, \ldots, x_n^0)}{h}
\]
As in the single-variable case, we should not take for granted that the limit above exists, but we will not dwell on such technicalities. In practice, the limit is readily computed by applying the usual rules for derivatives, considering one variable at a time and keeping the other variables fixed. For a function of n variables, we have n (first-order) partial derivatives at a point x0. They can be grouped into a column vector in ℝ^n, which is called the gradient of f at point x0:
\[
\nabla f(\mathbf{x}_0) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(\mathbf{x}_0) \\ \vdots \\ \dfrac{\partial f}{\partial x_n}(\mathbf{x}_0) \end{bmatrix}
\]
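Since each partial derivative is a one-dimensional limit obtained by perturbing a single coordinate, the definition can be checked numerically. The following Python sketch approximates the gradient by central finite differences; the helper name gradient_fd and the step h are our own illustrative choices, not part of the text:

```python
import numpy as np

def gradient_fd(f, x0, h=1e-6):
    """Approximate the gradient of f at x0 by central finite differences,
    perturbing one variable at a time, as in the definition above."""
    x0 = np.asarray(x0, dtype=float)
    g = np.zeros_like(x0)
    for i in range(x0.size):
        e = np.zeros_like(x0)
        e[i] = h
        g[i] = (f(x0 + e) - f(x0 - e)) / (2 * h)
    return g

# For f(x1, x2) = x1**2 + x2**2 the gradient at (1, 2) is (2, 4)
print(gradient_fd(lambda x: x[0]**2 + x[1]**2, [1.0, 2.0]))
```

For smooth functions the central difference is accurate to order h², which is why it is preferred over the one-sided ratio in numerical work.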
Example 3.16 Consider the quadratic form
\[
f(x_1, x_2) = x_1^2 + x_2^2
\]
We may compute the following partial derivatives:
\[
\frac{\partial f}{\partial x_1} = 2x_1, \qquad \frac{\partial f}{\partial x_2} = 2x_2
\]
When computing the partial derivative with respect to x1, we treat x2 as a constant, which explains why x2 does not contribute to the first partial derivative. The two partial derivatives can be grouped into the gradient
\[
\nabla f(x_1, x_2) = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}
\]
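The same calculation is easily replicated symbolically; a minimal sketch using sympy:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + x2**2

# Differentiating with respect to x1 treats x2 as a constant, and vice versa
grad = [sp.diff(f, v) for v in (x1, x2)]
print(grad)  # [2*x1, 2*x2]
```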
As a more contrived example, consider the following function:
Fig. 3.12 The gradient gives the direction of maximum ascent at each point.
We invite the reader to verify that its gradient is
How can we interpret the gradient? We know from single-variable calculus that a stationary point is a natural candidate for a minimum or a maximum of a function; at such a point, the tangent line is horizontal. In the multivariable case, we can say something similar by referring to the gradient. A stationary point (or point of stationarity) is a point x* such that the gradient is the zero vector: ∇f(x*) = 0. For instance, the origin is a stationary point of the quadratic form in the example above. The geometric interpretation is difficult to visualize for more than two variables, but in the case of two variables the function is graphed in three dimensions, and a stationary point is characterized by a horizontal tangent plane. In general, it can be shown that the gradient is a vector pointing in the direction of maximum ascent of the function.
Example 3.17 Consider again the quadratic form f(x1, x2) = x1² + x2² and its gradient. The function is just the squared distance of each point from the origin, and its level curves are concentric circles, as shown in Fig. 3.12. The figure also shows the gradient vector at a few points on the level curves. We see that the gradient ∇f(x1, x2) at each point is a vector pointing away from the origin, and along that direction the function has its steepest ascent. If we change the sign of the gradient, we obtain a vector pointing toward the origin, identifying the direction of steepest descent.
It is easy to understand that this feature of the gradient is relevant to function maximization and minimization.
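As a small illustration of this idea, repeatedly stepping against the gradient of the quadratic form from Example 3.17 drives the point toward the minimizer at the origin. The step size 0.1 and the iteration count below are arbitrary choices of ours for this sketch:

```python
import numpy as np

def grad_f(x):
    # Gradient of f(x1, x2) = x1**2 + x2**2 from Example 3.17
    return 2 * x

x = np.array([3.0, -4.0])
for _ in range(50):
    x = x - 0.1 * grad_f(x)   # move along the steepest-descent direction
print(x)  # very close to [0, 0], the stationary point and minimizer
```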
The gradient vector collects the first-order partial derivatives of a function with respect to all of the independent variables. We may also consider second-order derivatives by repeated application of partial differentiation. If we differentiate function f twice with respect to the same variable xi, we obtain the second-order partial derivative, which is denoted by
\[
\frac{\partial^2 f}{\partial x_i^2}
\]
We may also take the derivative with respect to two different variables, which yields a mixed derivative; if we first take the partial derivative with respect to xi, and then with respect to xj, we obtain
\[
\frac{\partial^2 f}{\partial x_j \, \partial x_i} = \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)
\]
An immediate question is whether the order in which derivatives are taken matters. The answer is that, when suitable continuity conditions are met, the order in which we take derivatives is inconsequential:
\[
\frac{\partial^2 f}{\partial x_j \, \partial x_i} = \frac{\partial^2 f}{\partial x_i \, \partial x_j}
\]
If we group the second-order partial derivatives into a matrix, we obtain the Hessian matrix:
\[
\nabla^2 f(\mathbf{x}) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \, \partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \, \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
\]
Since the order of variables in mixed terms is not relevant, the Hessian matrix is symmetric.
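The equality of mixed derivatives, and hence the symmetry of the Hessian, is easy to verify symbolically; in the following sketch the test function is an arbitrary smooth choice of ours:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**3 * x2 + x1 * x2**2   # an arbitrary smooth test function

# The two mixed second-order partials coincide
d12 = sp.diff(f, x1, x2)       # first x1, then x2
d21 = sp.diff(f, x2, x1)       # first x2, then x1
print(sp.simplify(d12 - d21))  # 0

H = sp.hessian(f, (x1, x2))
print(H == H.T)                # True: the Hessian is symmetric
```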
Example 3.18 Let us find the Hessian matrices of the quadratic forms f and g considered above, where f(x1, x2) = x1² + x2².
First, we calculate all of the relevant second-order partial derivatives of f:
\[
\frac{\partial^2 f}{\partial x_1^2} = 2, \qquad \frac{\partial^2 f}{\partial x_2^2} = 2, \qquad \frac{\partial^2 f}{\partial x_1 \, \partial x_2} = \frac{\partial^2 f}{\partial x_2 \, \partial x_1} = 0
\]
The Hessian matrix of f is diagonal:
\[
\nabla^2 f = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}
\]
The reader is invited to verify the corresponding Hessian matrix in the case of g.
From the example, we immediately notice that this Hessian matrix is just twice the matrix associated with the quadratic form. Indeed, if we write a quadratic form as
\[
f(\mathbf{x}) = \frac{1}{2}\, \mathbf{x}^{\mathsf{T}} \mathbf{A} \mathbf{x}, \tag{3.20}
\]
where A is a symmetric matrix, we see that matrix A is its Hessian. Another useful result concerning quadratic forms written in the form of Eq. (3.20) is
\[
\nabla f(\mathbf{x}) = \mathbf{A} \mathbf{x}
\]
The result is easy to check directly.
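Both facts can also be confirmed numerically; in this quick sketch the matrix A and the point x are arbitrary illustrative choices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 4.0]])          # a symmetric matrix, chosen arbitrarily
x = np.array([1.0, -2.0])

f = lambda z: 0.5 * z @ A @ z       # quadratic form in the sense of Eq. (3.20)

# Central finite differences recover the gradient A @ x
h = 1e-6
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(grad_fd)   # approximately [0, -7]
print(A @ x)     # exactly      [0, -7]
```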
3.9.2 Taylor’s expansion for multivariable functions
Using the gradient and the Hessian matrix, we may generalize Taylor's expansion to functions of multiple variables. The second-order expansion around point x0 is as follows:
\[
f(\mathbf{x}) \approx f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^{\mathsf{T}} (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2} (\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}} \, \nabla^2 f(\mathbf{x}_0) \, (\mathbf{x} - \mathbf{x}_0)
\]
We see that this approximation boils down to Eq. (2.13) when considering a function of a single variable. If we stop the expansion at the first-order terms, we get a linear approximation, which is just a plane when working with functions of two variables. Such a tangent plane is illustrated in Fig. 3.13. The inclusion of the second-order terms means that the approximation involves a quadratic form. The approximation can be convex, concave, or neither, depending on the definiteness of the quadratic form corresponding to the Hessian matrix. Hence, the eigenvalues of the Hessian matrix are useful in analyzing convexity/concavity and in checking optimality conditions.
Fig. 3.13 First-order Taylor expansion yields a tangent plane.
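The expansion translates directly into code; here is a minimal Python sketch, where the helper name taylor2 and the test point are our own illustrative choices:

```python
import numpy as np

def taylor2(f0, grad, hess, x0, x):
    """Second-order Taylor approximation of f around x0, given the
    value f0, gradient, and Hessian of f at x0."""
    d = np.asarray(x) - np.asarray(x0)
    return f0 + grad @ d + 0.5 * d @ hess @ d

# For the quadratic form f(x1, x2) = x1**2 + x2**2 around x0 = (1, 1):
x0 = np.array([1.0, 1.0])
grad = 2 * x0                # gradient of f at x0
hess = 2 * np.eye(2)         # (constant) Hessian of f
x = np.array([1.2, 0.9])
print(taylor2(2.0, grad, hess, x0, x))  # 2.25, exact since f is quadratic
```

For a quadratic form the second-order expansion is exact; for a general function it is accurate only for small displacements from x0.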
Example 3.19 Let us compute Taylor's expansion of the function
\[
f(x_1, x_2) = \sqrt{x_1 x_2}
\]
in a neighborhood of point (1, 1). The partial derivatives are
\[
\frac{\partial f}{\partial x_1} = \frac{1}{2}\sqrt{\frac{x_2}{x_1}}, \qquad
\frac{\partial f}{\partial x_2} = \frac{1}{2}\sqrt{\frac{x_1}{x_2}},
\]
\[
\frac{\partial^2 f}{\partial x_1^2} = -\frac{1}{4}\,\frac{\sqrt{x_2}}{x_1^{3/2}}, \qquad
\frac{\partial^2 f}{\partial x_2^2} = -\frac{1}{4}\,\frac{\sqrt{x_1}}{x_2^{3/2}}, \qquad
\frac{\partial^2 f}{\partial x_1 \, \partial x_2} = \frac{1}{4\sqrt{x_1 x_2}}
\]
Evaluating the gradient and the Hessian at (1, 1) yields
\[
\nabla f(1,1) = \begin{bmatrix} \tfrac{1}{2} \\[2pt] \tfrac{1}{2} \end{bmatrix}, \qquad
\nabla^2 f(1,1) = \begin{bmatrix} -\tfrac{1}{4} & \tfrac{1}{4} \\[2pt] \tfrac{1}{4} & -\tfrac{1}{4} \end{bmatrix}
\]
Note that, unlike the case of a quadratic form, Taylor's expansion of f depends on the point at which it is taken. Considering small displacements δ1 and δ2 around x1 = 1 and x2 = 1, we get
\[
f(1+\delta_1,\, 1+\delta_2) \approx 1 + \frac{1}{2}(\delta_1 + \delta_2) - \frac{1}{8}(\delta_1 - \delta_2)^2
\]
Fig. 3.14 Surface plot of function f(x1, x2) = √(x1 x2).
The eigenvalues of the Hessian matrix are λ1 = −0.5 and λ2 = 0. One of the eigenvalues is zero, and indeed it is easy to see that the Hessian matrix is singular. Moreover, the eigenvalues are both nonpositive, suggesting that the function f is locally concave, but not strictly so. A surface plot of the function is shown in Fig. 3.14 for positive values of x1 and x2; from the figure, we see the concavity of the function in this region. A closer look at the surface explains why the function is not strictly concave there. For x1 = x2, we have f(x1, x2) = √(x1²) = x1. Hence, the function is linear along that direction.
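The eigenvalue computation can be checked symbolically; a brief sketch using the function from this example:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
f = sp.sqrt(x1 * x2)

# Hessian of f evaluated at the point (1, 1)
H = sp.hessian(f, (x1, x2)).subs({x1: 1, x2: 1})
print(H)               # Matrix([[-1/4, 1/4], [1/4, -1/4]])
print(H.eigenvals())   # {-1/2: 1, 0: 1}: eigenvalues -0.5 and 0, so H is singular
```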