Multivariable Calculus
The chain rule
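One way of describing the single-variable chain rule is that the derivative of a composition is the product of the derivatives: $(f \circ g)'(t) = f'(g(t))\,g'(t)$.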
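The chain rule in multivariable calculus works similarly. If we compose a differentiable function $\mathbf{r}: \mathbb{R} \to \mathbb{R}^2$ with a differentiable function $f: \mathbb{R}^2 \to \mathbb{R}$, we get a function whose derivative is
$$(f \circ \mathbf{r})'(t) = (\nabla f)(\mathbf{r}(t)) \cdot \mathbf{r}'(t).$$
Note that the right-hand side can also be written as $\frac{\partial f}{\partial \mathbf{x}}(\mathbf{r}(t))\,\mathbf{r}'(t)$, since $\frac{\partial f}{\partial \mathbf{x}}$ is a row vector, and the product of a row vector and a column vector is the same as the dot product of the corresponding vectors.
We visualize $\mathbf{r}$ by drawing the points $\mathbf{r}(t)$, which trace out a curve in the plane. We visualize $f$ only by showing the direction of its gradient at the point $\mathbf{r}(t)$. For nearby points on the curve, the change in $f$ from one point to the other is approximately the dot product of the change in position and the gradient.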
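The rule can be sanity-checked numerically. The following is a minimal sketch, not part of the original lesson: the curve `r` (the unit circle) and the function `f(x, y) = x**2 * y` are hypothetical choices made only for illustration. It compares the dot product of the gradient with the velocity of the curve against a central finite difference of the composition.

```python
import numpy as np

def r(t):
    """Hypothetical curve in the plane (assumption): the unit circle."""
    return np.array([np.cos(t), np.sin(t)])

def r_prime(t):
    """Derivative of r with respect to t."""
    return np.array([-np.sin(t), np.cos(t)])

def f(p):
    """Hypothetical scalar function (assumption): f(x, y) = x**2 * y."""
    x, y = p
    return x**2 * y

def grad_f(p):
    """Gradient of f: (2xy, x**2)."""
    x, y = p
    return np.array([2 * x * y, x**2])

t, h = 0.7, 1e-6

# Chain rule: gradient of f at r(t), dotted with r'(t)
chain_rule = grad_f(r(t)) @ r_prime(t)

# Central finite difference of the composition t -> f(r(t))
finite_difference = (f(r(t + h)) - f(r(t - h))) / (2 * h)

chain_rule_matches = np.isclose(chain_rule, finite_difference)
print(chain_rule, finite_difference, chain_rule_matches)
```

The two numbers should agree to several decimal places; the small remaining gap comes from the finite step size and floating-point roundoff.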
Exercise
Suppose that , that , and that  and . Find the derivative of the function  at the point .
Solution. The chain rule implies that the derivative of is
Exercise
Find the derivative with respect to  of the function  by writing the function as  where  and  and .
Solution. Let where and . We have that and . Since both derivatives of and with respect to are 1, the chain rule implies that
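The technique used in this solution, writing a one-variable function as a two-variable function evaluated along a curve whose components both have derivative 1, can be tried numerically. The sketch below uses a function chosen purely for illustration, $g(t) = t^t$ written as $f(x(t), y(t))$ with $f(x, y) = x^y$ and $x(t) = y(t) = t$; this choice is an assumption and is not necessarily the function intended by the exercise.

```python
import math

def f_x(x, y):
    """Partial derivative of the illustrative f(x, y) = x**y with respect to x."""
    return y * x ** (y - 1)

def f_y(x, y):
    """Partial derivative of the illustrative f(x, y) = x**y with respect to y (x > 0)."""
    return x ** y * math.log(x)

def g(t):
    """The composition f(x(t), y(t)) with x(t) = y(t) = t."""
    return t ** t

def g_prime_chain(t):
    """Chain rule: f_x * x'(t) + f_y * y'(t), where x'(t) = y'(t) = 1."""
    return f_x(t, t) * 1.0 + f_y(t, t) * 1.0

t, h = 1.5, 1e-6

# Central finite difference of g for comparison
g_prime_numeric = (g(t + h) - g(t - h)) / (2 * h)

print(g_prime_chain(t), g_prime_numeric)
```

For this illustrative function the chain rule expression simplifies to $t^t(1 + \ln t)$, which the numerical derivative should match closely.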
Exercise
Suppose that $g(\mathbf{x}) = A\mathbf{x}$ for some matrix $A$, and suppose that $f$ is the componentwise squaring function (in other words, $f([y_1, \ldots, y_n]) = [y_1^2, \ldots, y_n^2]$). Find the derivative of $f \circ g$.
Note: you might find it convenient to express your answer using the function diag which maps a vector to a matrix with that vector along the diagonal.
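Solution. The derivative matrix of $f$ is diagonal, since the derivative of $y_i^2$ with respect to $y_j$ is zero unless $i = j$. The diagonal entries are $2y_i$. The derivative of $g$ is $A$, as we saw in the section on matrix differentiation. Therefore, the derivative of the composition is
$$\frac{\partial (f \circ g)}{\partial \mathbf{x}} = 2\operatorname{diag}(A\mathbf{x})\,A.$$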
We can check this exercise numerically:
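import numpy as np

# Random 5×5 matrix, base point, and a small perturbation
A = np.random.random_sample((5,5))
x = np.random.random_sample(5)
Δx = 1e-6 * np.random.random_sample(5)

def f(y):
    "Square y componentwise"
    return y**2

def g(x):
    "Multiply A by x"
    return A @ x

# Derivative of the composition, from the chain rule
derivative = 2 * np.diag(A @ x) @ A
# The change in f(g(x)) should be close to derivative @ Δx
np.allclose(f(g(x + Δx)) - f(g(x)), derivative @ Δx)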
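The check above tests the derivative along a single random direction. A slightly stronger check, sketched here under the assumption that a central finite difference with step `h = 1e-6` is accurate enough, builds the whole derivative matrix one column at a time and compares it with the chain rule formula. The helper `numerical_jacobian` and the seeded generator are introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is reproducible
A = rng.random((5, 5))
x = rng.random(5)

def composition(x):
    """Componentwise square of A @ x."""
    return (A @ x) ** 2

def numerical_jacobian(func, x, h=1e-6):
    """Approximate the derivative matrix of func at x by central differences.

    Column j holds the rate of change of func along the j-th coordinate.
    """
    n = x.size
    columns = []
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        columns.append((func(x + step) - func(x - step)) / (2 * h))
    return np.column_stack(columns)

analytic = 2 * np.diag(A @ x) @ A
numeric = numerical_jacobian(composition, x)

jacobians_agree = np.allclose(analytic, numeric, atol=1e-6)
print(jacobians_agree)
```

Because the composition is quadratic in `x`, the central difference has no truncation error here, so any disagreement would come from roundoff or from a mistake in the formula.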