3.9.1 Partial derivatives: gradient and Hessian matrix

In Section 2.7 we defined the derivative of a function of a single variable as the limit of an increment ratio:

\[
f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
\]

If we have a function of several variables, we may readily extend the concept above by considering a point x0 and perturbing one variable at a time. We obtain the concept of a partial derivative with respect to a single variable xi:

\[
\frac{\partial f}{\partial x_i}(\mathbf{x}^0) = \lim_{h \to 0} \frac{f(x_1^0, \ldots, x_i^0 + h, \ldots, x_n^0) - f(x_1^0, \ldots, x_i^0, \ldots, x_n^0)}{h}
\]

As in the single-variable case, we should not take for granted that the limit above exists, but we will not consider too many technicalities. In practice, the limit above is readily computed by applying the usual rules for derivatives, considering one variable at a time and keeping the other variables fixed. For a function of n variables, we have n (first-order) partial derivatives at a point x0. They can be grouped into a column vector in ℝⁿ, which is called the gradient of f at point x0:

\[
\nabla f(\mathbf{x}^0) =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1}(\mathbf{x}^0) \\
\vdots \\
\dfrac{\partial f}{\partial x_n}(\mathbf{x}^0)
\end{bmatrix}
\]

Example 3.16 Consider the quadratic form f(x1, x2) = x1² + x2². We may compute the following partial derivatives:

\[
\frac{\partial f}{\partial x_1} = 2x_1, \qquad \frac{\partial f}{\partial x_2} = 2x_2
\]

When computing the partial derivative with respect to x1, we consider x2 as a constant, and this explains why x2 does not contribute to the first partial derivative. The two partial derivatives can be grouped into the gradient

\[
\nabla f(x_1, x_2) = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}
\]
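As a quick numerical sanity check, the gradient can be approximated component by component with central finite differences, perturbing one variable at a time exactly as in the definition above. The sketch below is our own illustration (the helper name num_grad is not from the text); it reproduces ∇f(x1, x2) = (2x1, 2x2)ᵀ for the quadratic form of Example 3.16.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at point x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h                      # perturb one variable at a time
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

f = lambda x: x[0]**2 + x[1]**2       # quadratic form of Example 3.16

print(num_grad(f, [1.0, 2.0]))        # ~ [2., 4.], i.e. (2*x1, 2*x2)
```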

To see a more contrived example, consider the following function:

images
Fig. 3.12 The gradient gives the direction of maximum ascent at each point.

We invite the reader to verify that its gradient is

images

How can we interpret the gradient? We know from single-variable calculus that a stationary point is a natural candidate to be a minimum or a maximum of a function; at that point, the tangent line is horizontal. In the multivariable case, we can say something similar by referring to the gradient. A stationary point (or point of stationarity) is a point x* such that the gradient is the zero vector: ∇f(x*) = 0. For instance, the origin is a stationary point of the quadratic form in the example above. The geometric interpretation is hard to visualize for several variables, but in the case of two variables the graph of the function is a surface in three dimensions, and a stationary point is characterized by a horizontal tangent plane. In general, it can be shown that the gradient is a vector pointing in the direction of maximum ascent of the function.

Example 3.17 Consider again the quadratic form f(x1, x2) = x1² + x2² and its gradient. The function is just the squared distance of a point from the origin, and its level curves are concentric circles, as shown in Fig. 3.12. The figure also shows the gradient vector at a few points on the level curves. We see that the gradient ∇f(x1, x2) at each point is a vector pointing radially away from the origin, and along that direction the function has the steepest ascent. If we change the sign of the gradient, we obtain a vector pointing toward the origin, which identifies the direction of steepest descent.

It is easy to understand that this feature of the gradient is relevant to function maximization and minimization.
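To make this concrete, here is a minimal steepest-descent sketch (our own illustration, not taken from the text): repeatedly stepping along −∇f for the quadratic form f(x1, x2) = x1² + x2² drives the iterate toward the origin, its stationary point and minimizer. The step size alpha is an arbitrary choice made for the illustration.

```python
import numpy as np

grad = lambda x: 2 * x               # gradient of f(x1, x2) = x1**2 + x2**2

x = np.array([3.0, -2.0])            # arbitrary starting point
alpha = 0.1                          # step size (assumed for illustration)
for _ in range(50):
    x = x - alpha * grad(x)          # step along the direction of steepest descent

print(x)                             # close to the origin, the minimizer
```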

The gradient vector collects the first-order partial derivatives of a function with respect to all of the independent variables. We may also consider second-order derivatives by repeated application of partial differentiation. If we take a derivative of function f twice with respect to the same variable xi, we have the second-order partial derivative, which is denoted as

\[
\frac{\partial^2 f}{\partial x_i^2}
\]

We may also take the derivative with respect to two different variables, which yields a mixed derivative; if we take the partial derivative first with respect to xi and then with respect to xj, we obtain

\[
\frac{\partial^2 f}{\partial x_j \, \partial x_i}
\]

An immediate question is whether the order in which derivatives are taken is relevant. The answer is that, when suitable continuity conditions are met, the order in which we take derivatives is inconsequential:19

\[
\frac{\partial^2 f}{\partial x_i \, \partial x_j} = \frac{\partial^2 f}{\partial x_j \, \partial x_i}
\]

If we group second-order partial derivatives into a matrix, we obtain the Hessian matrix:

\[
\nabla^2 f(\mathbf{x}^0) =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \, \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \, \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \, \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \, \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \, \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \, \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
\]

where all of the derivatives are evaluated at point x0.

Since the order of variables in mixed terms is not relevant, the Hessian matrix is symmetric.
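As with the gradient, the Hessian can be approximated numerically. The sketch below (again our own illustration, with an arbitrarily chosen smooth test function) estimates each second-order partial derivative by finite differences and confirms that the resulting matrix is symmetric up to rounding error, i.e., that the mixed derivatives commute.

```python
import numpy as np

def num_hessian(f, x, h=1e-4):
    """Finite-difference approximation of the Hessian of f at point x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            # central difference for the mixed derivative d2f / (dxi dxj)
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

f = lambda x: x[0]**2 * x[1] + np.sin(x[0] * x[1])   # arbitrary smooth test function

H = num_hessian(f, [0.5, 1.0])
print(np.allclose(H, H.T, atol=1e-4))                # True: mixed derivatives commute
```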

Example 3.18 Let us find the Hessian matrix for the quadratic forms:

images

First, we calculate all of the relevant partial derivatives for f:

images

The Hessian matrix of f is diagonal:

images

The reader is invited to verify that, in the case of g, we obtain

images

From the example, we immediately notice that this Hessian matrix is just twice the matrix associated with the quadratic form. Indeed, if we write a quadratic form as

\[
f(\mathbf{x}) = \frac{1}{2}\, \mathbf{x}^{\mathsf{T}} \mathbf{A}\, \mathbf{x} = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, x_i x_j \qquad (3.20)
\]

we see that matrix A is its Hessian. Another useful result concerning quadratic forms written in the form of Eq. (3.20) is

\[
\nabla f(\mathbf{x}) = \mathbf{A}\mathbf{x}
\]

The result is easy to check directly.
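It can also be verified numerically. In the sketch below (our own illustration, with an arbitrary symmetric matrix A chosen for the example), the finite-difference gradient of f(x) = ½ xᵀAx matches Ax at a test point; an analogous second-difference check recovers A as the Hessian.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                      # arbitrary symmetric matrix

f = lambda x: 0.5 * x @ A @ x                   # quadratic form as in Eq. (3.20)

x0 = np.array([1.0, -2.0])
h = 1e-6
grad = np.array([(f(x0 + h*e) - f(x0 - h*e)) / (2*h)
                 for e in np.eye(2)])           # central differences, one variable at a time

print(np.allclose(grad, A @ x0))                # True: the gradient is A x
```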

3.9.2 Taylor’s expansion for multivariable functions

Using the gradient and Hessian matrix, we may generalize Taylor’s expansion to functions of multiple variables. The second-order expansion around point x0 is as follows:

\[
f(\mathbf{x}) \approx f(\mathbf{x}^0) + \nabla f(\mathbf{x}^0)^{\mathsf{T}} (\mathbf{x} - \mathbf{x}^0) + \frac{1}{2} (\mathbf{x} - \mathbf{x}^0)^{\mathsf{T}}\, \nabla^2 f(\mathbf{x}^0)\, (\mathbf{x} - \mathbf{x}^0)
\]

We see that this approximation boils down to Eq. (2.13) when considering a function of a single variable. If we stop the expansion at the first-order terms, we get a linear approximation, which is just a plane when working with functions of two variables. Such a tangent plane is illustrated in Fig. 3.13. Including the second-order terms means that the approximation involves a quadratic form. The approximation can be convex, concave, or neither, depending on the definiteness of the quadratic form associated with the Hessian matrix. Hence, the eigenvalues of the Hessian matrix are useful in analyzing convexity/concavity and in checking optimality conditions.

Fig. 3.13 First-order Taylor expansion yields a tangent plane.
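The quality of the second-order approximation is easy to probe numerically. The sketch below (our own illustration, using a simple polynomial test function whose gradient and Hessian are known in closed form) compares f(x0 + δ) with the quadratic Taylor model and shows the error shrinking like the cube of the displacement.

```python
import numpy as np

f = lambda x: x[0]**2 * x[1]                    # simple test function (our choice)
x0 = np.array([1.0, 1.0])

g = np.array([2*x0[0]*x0[1], x0[0]**2])         # exact gradient at x0
H = np.array([[2*x0[1], 2*x0[0]],
              [2*x0[0], 0.0]])                  # exact Hessian at x0

def taylor2(d):
    """Second-order Taylor approximation of f at x0 + d."""
    return f(x0) + g @ d + 0.5 * d @ H @ d

for step in (0.5, 0.1, 0.01):
    d = np.array([step, step])
    print(step, abs(f(x0 + d) - taylor2(d)))    # error shrinks like step**3
```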

Example 3.19 Let us compute Taylor’s expansion of function

\[
f(x_1, x_2) = \sqrt{x_1 x_2}
\]

in a neighborhood of point (1, 1). The partial derivatives are

\[
\frac{\partial f}{\partial x_1} = \frac{1}{2}\sqrt{\frac{x_2}{x_1}}, \qquad
\frac{\partial f}{\partial x_2} = \frac{1}{2}\sqrt{\frac{x_1}{x_2}},
\]
\[
\frac{\partial^2 f}{\partial x_1^2} = -\frac{1}{4}\,\frac{\sqrt{x_2}}{x_1^{3/2}}, \qquad
\frac{\partial^2 f}{\partial x_2^2} = -\frac{1}{4}\,\frac{\sqrt{x_1}}{x_2^{3/2}}, \qquad
\frac{\partial^2 f}{\partial x_1\, \partial x_2} = \frac{1}{4\sqrt{x_1 x_2}}
\]

Evaluating gradient and Hessian at (1, 1) yields

\[
\nabla f(1, 1) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}, \qquad
\nabla^2 f(1, 1) = \begin{bmatrix} -0.25 & 0.25 \\ 0.25 & -0.25 \end{bmatrix}
\]

Note that, unlike a quadratic form, Taylor's expansion of f depends on the point at which it is taken. Considering small displacements δ1 and δ2 around x1 = 1 and x2 = 1, we get

\[
f(1 + \delta_1,\, 1 + \delta_2) \approx 1 + \frac{1}{2}(\delta_1 + \delta_2) - \frac{1}{8}(\delta_1 - \delta_2)^2
\]

Fig. 3.14 Surface plot of function f(x1, x2) = √(x1 x2).

The eigenvalues of the Hessian matrix are λ1 = −0.5 and λ2 = 0. One of the eigenvalues is zero, and indeed it is easy to see that the Hessian matrix is singular. Moreover, the eigenvalues are both nonpositive, suggesting that the function f is locally concave, but not strictly. A surface plot of the function is illustrated in Fig. 3.14, for positive values of x1 and x2; from the figure, we see the concavity of the function in this region. A closer look at the surface explains why the function is not strictly concave there. For x1 = x2 > 0, we have f(x1, x2) = √(x1 · x1) = x1. Hence, the function is linear along that direction.
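These numbers are easy to confirm with a short numerical check (our own sketch, using f(x1, x2) = √(x1 x2) as written above): a finite-difference Hessian at (1, 1) reproduces the matrix found analytically, numpy.linalg.eigvalsh returns its eigenvalues, and evaluating f along x1 = x2 exposes the linear behavior.

```python
import numpy as np

f = lambda x: np.sqrt(x[0] * x[1])               # function of Example 3.19
x0 = np.array([1.0, 1.0])
h = 1e-4

# finite-difference Hessian at (1, 1)
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei = np.zeros(2); ei[i] = h
        ej = np.zeros(2); ej[j] = h
        H[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                   - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * h**2)

print(np.round(H, 3))                            # ~ [[-0.25  0.25] [ 0.25 -0.25]]
print(np.round(np.linalg.eigvalsh(H), 3))        # ~ [-0.5  0. ]
print(f(np.array([2.5, 2.5])))                   # 2.5: f is linear along x1 = x2
```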

