3.9.1 Partial derivatives: gradient and Hessian matrix

In Section 2.7 we defined the derivative of a function of a single variable as the limit of an increment ratio:

\[
f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
\]

If we have a function of several variables, we may readily extend the concept above by considering a point x0 and perturbing one variable at a time. We obtain the concept of a partial derivative with respect to a single variable xi:

\[
\frac{\partial f}{\partial x_i}(\mathbf{x}^0) = \lim_{h \to 0} \frac{f(x_1^0, \ldots, x_i^0 + h, \ldots, x_n^0) - f(x_1^0, \ldots, x_i^0, \ldots, x_n^0)}{h}
\]

As in the single-variable case, we should not take for granted that the limit above exists, but we will not consider too many technicalities. In practice, the limit above is readily computed by applying the usual rules for derivatives, considering one variable at a time and keeping the other variables fixed. For a function of n variables, we have n (first-order) partial derivatives at a point x0. They can be grouped into a column vector in ℝⁿ, which is called the gradient of f at point x0:

\[
\nabla f(\mathbf{x}^0) =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1}(\mathbf{x}^0) \\
\vdots \\
\dfrac{\partial f}{\partial x_n}(\mathbf{x}^0)
\end{bmatrix}
\]

Example 3.16 Consider the quadratic form f(x1, x2) = x1² + x2². We may compute the following partial derivatives:

\[
\frac{\partial f}{\partial x_1} = 2x_1, \qquad \frac{\partial f}{\partial x_2} = 2x_2
\]

When computing the partial derivative with respect to x1, we consider x2 as a constant, and this explains why x2 does not contribute to the first partial derivative. The two partial derivatives can be grouped into the gradient

\[
\nabla f(x_1, x_2) = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}
\]
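As a quick numerical sanity check, the gradient can be approximated component by component with central finite differences, perturbing one variable at a time exactly as in the definition above. The sketch below is our own illustration (the helper name num_grad is not from the text); it reproduces ∇f(x1, x2) = (2x1, 2x2)ᵀ for the quadratic form of Example 3.16.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at point x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h                      # perturb one variable at a time
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

f = lambda x: x[0]**2 + x[1]**2       # quadratic form of Example 3.16

print(num_grad(f, [1.0, 2.0]))        # ~ [2., 4.], i.e. (2*x1, 2*x2)
```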

To see a more contrived example, consider the following function:

images
Fig. 3.12 The gradient gives the direction of maximum ascent at each point.

We invite the reader to verify that its gradient is

images

How can we interpret the gradient? We know from single-variable calculus that a stationary point is a natural candidate to be a minimum or a maximum of a function; at that point, the tangent line is horizontal. In the multivariable case, we can say something similar by referring to the gradient. A stationary point (or point of stationarity) is a point x* such that the gradient is the zero vector: ∇f(x*) = 0. For instance, the origin is a stationary point of the quadratic form in the example above. The geometric interpretation is hard to visualize for several variables, but in the case of two variables the graph of the function is a surface in three dimensions, and a stationary point is characterized by a horizontal tangent plane. In general, it can be shown that the gradient is a vector pointing in the direction of maximum ascent of the function.

Example 3.17 Consider again the quadratic form f(x1, x2) = x1² + x2² and its gradient. The function is just the squared distance of a point from the origin, and its level curves are concentric circles, as shown in Fig. 3.12. The figure also shows the gradient vector at a few points on the level curves. We see that the gradient ∇f(x1, x2) at each point is a vector pointing radially away from the origin, and along that direction the function has the steepest ascent. If we change the sign of the gradient, we obtain a vector pointing toward the origin, which identifies the direction of steepest descent.

It is easy to understand that this feature of the gradient is relevant to function maximization and minimization.
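To make this concrete, here is a minimal steepest-descent sketch (our own illustration, not taken from the text): repeatedly stepping along −∇f for the quadratic form f(x1, x2) = x1² + x2² drives the iterate toward the origin, its stationary point and minimizer. The step size alpha is an arbitrary choice made for the illustration.

```python
import numpy as np

grad = lambda x: 2 * x               # gradient of f(x1, x2) = x1**2 + x2**2

x = np.array([3.0, -2.0])            # arbitrary starting point
alpha = 0.1                          # step size (assumed for illustration)
for _ in range(50):
    x = x - alpha * grad(x)          # step along the direction of steepest descent

print(x)                             # close to the origin, the minimizer
```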

The gradient vector collects the first-order partial derivatives of a function with respect to all of the independent variables. We may also consider second-order derivatives by repeated application of partial differentiation. If we take a derivative of function f twice with respect to the same variable xi, we have the second-order partial derivative, which is denoted as

\[
\frac{\partial^2 f}{\partial x_i^2}
\]

We may also take the derivative with respect to two different variables, which yields a mixed derivative; if we take the partial derivative first with respect to xi and then with respect to xj, we obtain

\[
\frac{\partial^2 f}{\partial x_j \, \partial x_i}
\]

An immediate question is whether the order in which derivatives are taken is relevant. The answer is that, when suitable continuity conditions are met, the order in which we take derivatives is inconsequential:19

\[
\frac{\partial^2 f}{\partial x_i \, \partial x_j} = \frac{\partial^2 f}{\partial x_j \, \partial x_i}
\]

If we group second-order partial derivatives into a matrix, we obtain the Hessian matrix:

\[
\nabla^2 f(\mathbf{x}^0) =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \, \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \, \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \, \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \, \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \, \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \, \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
\]

where all of the derivatives are evaluated at point x0.

Since the order of variables in mixed terms is not relevant, the Hessian matrix is symmetric.
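As with the gradient, the Hessian can be approximated numerically. The sketch below (again our own illustration, with an arbitrarily chosen smooth test function) estimates each second-order partial derivative by finite differences and confirms that the resulting matrix is symmetric up to rounding error, i.e., that the mixed derivatives commute.

```python
import numpy as np

def num_hessian(f, x, h=1e-4):
    """Finite-difference approximation of the Hessian of f at point x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            # central difference for the mixed derivative d2f / (dxi dxj)
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

f = lambda x: x[0]**2 * x[1] + np.sin(x[0] * x[1])   # arbitrary smooth test function

H = num_hessian(f, [0.5, 1.0])
print(np.allclose(H, H.T, atol=1e-4))                # True: mixed derivatives commute
```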

Example 3.18 Let us find the Hessian matrix for the quadratic forms:

images

First, we calculate all of the relevant partial derivatives for f:

images

The Hessian matrix of f is diagonal:

images

The reader is invited to verify that, in the case of g, we obtain

images

From the example, we immediately notice that this Hessian matrix is just twice the matrix associated with the quadratic form. Indeed, if we write a quadratic form as

\[
f(\mathbf{x}) = \frac{1}{2}\, \mathbf{x}^{\mathsf{T}} \mathbf{A}\, \mathbf{x} = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, x_i x_j \qquad (3.20)
\]

we see that matrix A is its Hessian. Another useful result concerning quadratic forms written in the form of Eq. (3.20) is

\[
\nabla f(\mathbf{x}) = \mathbf{A}\mathbf{x}
\]

The result is easy to check directly.
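It can also be verified numerically. In the sketch below (our own illustration, with an arbitrary symmetric matrix A chosen for the example), the finite-difference gradient of f(x) = ½ xᵀAx matches Ax at a test point; an analogous second-difference check recovers A as the Hessian.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                      # arbitrary symmetric matrix

f = lambda x: 0.5 * x @ A @ x                   # quadratic form as in Eq. (3.20)

x0 = np.array([1.0, -2.0])
h = 1e-6
grad = np.array([(f(x0 + h*e) - f(x0 - h*e)) / (2*h)
                 for e in np.eye(2)])           # central differences, one variable at a time

print(np.allclose(grad, A @ x0))                # True: the gradient is A x
```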

3.9.2 Taylor’s expansion for multivariable functions

Using the gradient and Hessian matrix, we may generalize Taylor’s expansion to functions of multiple variables. The second-order expansion around point x0 is as follows:

\[
f(\mathbf{x}) \approx f(\mathbf{x}^0) + \nabla f(\mathbf{x}^0)^{\mathsf{T}} (\mathbf{x} - \mathbf{x}^0) + \frac{1}{2} (\mathbf{x} - \mathbf{x}^0)^{\mathsf{T}}\, \nabla^2 f(\mathbf{x}^0)\, (\mathbf{x} - \mathbf{x}^0)
\]

We see that this approximation boils down to Eq. (2.13) when considering a function of a single variable. If we stop the expansion at the first-order terms, we get a linear approximation, which is just a plane when working with functions of two variables. Such a tangent plane is illustrated in Fig. 3.13. Including the second-order terms means that the approximation involves a quadratic form. The approximation can be convex, concave, or neither, depending on the definiteness of the quadratic form associated with the Hessian matrix. Hence, the eigenvalues of the Hessian matrix are useful in analyzing convexity/concavity and in checking optimality conditions.

Fig. 3.13 First-order Taylor expansion yields a tangent plane.
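The quality of the second-order approximation is easy to probe numerically. The sketch below (our own illustration, using a simple polynomial test function whose gradient and Hessian are known in closed form) compares f(x0 + δ) with the quadratic Taylor model and shows the error shrinking like the cube of the displacement.

```python
import numpy as np

f = lambda x: x[0]**2 * x[1]                    # simple test function (our choice)
x0 = np.array([1.0, 1.0])

g = np.array([2*x0[0]*x0[1], x0[0]**2])         # exact gradient at x0
H = np.array([[2*x0[1], 2*x0[0]],
              [2*x0[0], 0.0]])                  # exact Hessian at x0

def taylor2(d):
    """Second-order Taylor approximation of f at x0 + d."""
    return f(x0) + g @ d + 0.5 * d @ H @ d

for step in (0.5, 0.1, 0.01):
    d = np.array([step, step])
    print(step, abs(f(x0 + d) - taylor2(d)))    # error shrinks like step**3
```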

Example 3.19 Let us compute Taylor’s expansion of function

\[
f(x_1, x_2) = \sqrt{x_1 x_2}
\]

in a neighborhood of point (1, 1). The partial derivatives are

\[
\frac{\partial f}{\partial x_1} = \frac{1}{2}\sqrt{\frac{x_2}{x_1}}, \qquad
\frac{\partial f}{\partial x_2} = \frac{1}{2}\sqrt{\frac{x_1}{x_2}},
\]
\[
\frac{\partial^2 f}{\partial x_1^2} = -\frac{1}{4}\,\frac{\sqrt{x_2}}{x_1^{3/2}}, \qquad
\frac{\partial^2 f}{\partial x_2^2} = -\frac{1}{4}\,\frac{\sqrt{x_1}}{x_2^{3/2}}, \qquad
\frac{\partial^2 f}{\partial x_1\, \partial x_2} = \frac{1}{4\sqrt{x_1 x_2}}
\]

Evaluating gradient and Hessian at (1, 1) yields

\[
\nabla f(1, 1) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}, \qquad
\nabla^2 f(1, 1) = \begin{bmatrix} -0.25 & 0.25 \\ 0.25 & -0.25 \end{bmatrix}
\]

Note that, unlike a quadratic form, Taylor's expansion of f depends on the point at which it is taken. Considering small displacements δ1 and δ2 around x1 = 1 and x2 = 1, we get

\[
f(1 + \delta_1,\, 1 + \delta_2) \approx 1 + \frac{1}{2}(\delta_1 + \delta_2) - \frac{1}{8}(\delta_1 - \delta_2)^2
\]

Fig. 3.14 Surface plot of function f(x1, x2) = √(x1 x2).

The eigenvalues of the Hessian matrix are λ1 = −0.5 and λ2 = 0. One of the eigenvalues is zero, and indeed it is easy to see that the Hessian matrix is singular. Moreover, the eigenvalues are both nonpositive, suggesting that the function f is locally concave, but not strictly. A surface plot of the function is illustrated in Fig. 3.14, for positive values of x1 and x2; from the figure, we see the concavity of the function in this region. A closer look at the surface explains why the function is not strictly concave there. For x1 = x2 > 0, we have f(x1, x2) = √(x1 · x1) = x1. Hence, the function is linear along that direction.
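These numbers are easy to confirm with a short numerical check (our own sketch, using f(x1, x2) = √(x1 x2) as written above): a finite-difference Hessian at (1, 1) reproduces the matrix found analytically, numpy.linalg.eigvalsh returns its eigenvalues, and evaluating f along x1 = x2 exposes the linear behavior.

```python
import numpy as np

f = lambda x: np.sqrt(x[0] * x[1])               # function of Example 3.19
x0 = np.array([1.0, 1.0])
h = 1e-4

# finite-difference Hessian at (1, 1)
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei = np.zeros(2); ei[i] = h
        ej = np.zeros(2); ej[j] = h
        H[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                   - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * h**2)

print(np.round(H, 3))                            # ~ [[-0.25  0.25] [ 0.25 -0.25]]
print(np.round(np.linalg.eigvalsh(H), 3))        # ~ [-0.5  0. ]
print(f(np.array([2.5, 2.5])))                   # 2.5: f is linear along x1 = x2
```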

