On the different notions of derivative¶
The derivative is one of the core concepts of mathematical analysis, and it is essential whenever a linear approximation of a function in some point is required. Since the notion of derivative has different meanings in different contexts, this guide introduces the derivative concepts used in ODL.
In short, the notions of derivative discussed here are:
- Derivative. When we write “derivative” in ODL code and documentation, we mean the derivative of an Operator $A : X \to Y$ with respect to a disturbance in $x$, i.e. a linear approximation of $A(x + h)$ for small $h$. The derivative in a point $x$ is an Operator $A'(x) : X \to Y$.
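- Gradient. If the operator $A$ is a functional, i.e. $A : X \to \mathbb{R}$, then the gradient is the direction in which $A$ increases the most. The gradient in a point $x$ is a vector $[\nabla A](x)$ in $X$ such that $A'(x)(y) = \langle [\nabla A](x), y \rangle$. The gradient operator is the operator $x \mapsto [\nabla A](x)$.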
- Hessian. The Hessian in a point $x$ is the derivative operator of the gradient operator, i.e. $H(x) = [\nabla A]'(x)$.
- Spatial Gradient. The spatial gradient is only defined for spaces $\mathcal{F}(\Omega, \mathbb{F})$ whose elements are functions over some domain $\Omega \subset \mathbb{R}^n$ taking values in $\mathbb{R}$ or $\mathbb{C}$. It can be seen as a vectorized version of the usual gradient, taken in each point in $\Omega$.
- Subgradient. The subgradient extends the notion of derivative to any convex functional and is used in some optimization solvers where the objective function is not differentiable.
Derivative¶
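The derivative is usually introduced for functions $f : \mathbb{R} \to \mathbb{R}$ via the limit
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$
Here we say that the derivative of $f$ in $x$ is $f'(x)$.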
This limit makes sense in one dimension, but once we start considering functions in higher dimension we get into trouble.
Consider $f : \mathbb{R}^n \to \mathbb{R}^m$: what would dividing by $h$ mean in this case?
An extension is the concept of a directional derivative.
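The derivative of $f$ in $x$ in direction $d$ is $f'(x)(d)$:
$$f'(x)(d) = \lim_{h \to 0} \frac{f(x + h d) - f(x)}{h}.$$
Here we see (as implied by the notation) that $f'(x)$ is actually an operator
$$f'(x) : \mathbb{R}^n \to \mathbb{R}^m.$$
We can rewrite this using the explicit requirement that $f'(x)$ is a linear approximation of $f$ at $x$, i.e.
$$\lim_{\|d\| \to 0} \frac{\| f(x + d) - f(x) - f'(x)(d) \|}{\|d\|} = 0.$$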
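This notion naturally extends to an Operator $f : X \to Y$ between Banach spaces $X$ and $Y$ with norms $\| \cdot \|_X$ and $\| \cdot \|_Y$, respectively.
Here $f'(x)$ is defined as the linear operator (if it exists) that satisfies
$$\lim_{\|d\|_X \to 0} \frac{\| f(x + d) - f(x) - f'(x)(d) \|_Y}{\|d\|_X} = 0.$$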
This definition of the derivative is called the Fréchet derivative.
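As a short worked example of the Fréchet definition, consider the squared norm on a real Hilbert space. The choice of functional and the candidate derivative $d \mapsto 2\langle x, d \rangle$ are assumptions made for illustration only; they are not taken from the ODL documentation.

```latex
% Assumption: X is a real Hilbert space and f(x) = \|x\|^2.
% Candidate derivative: f'(x)(d) = 2 \langle x, d \rangle, which is linear in d.
f(x + d) - f(x) - 2\langle x, d \rangle
  = \|x\|^2 + 2\langle x, d \rangle + \|d\|^2 - \|x\|^2 - 2\langle x, d \rangle
  = \|d\|^2,
\qquad\text{hence}\qquad
\frac{| f(x + d) - f(x) - 2\langle x, d \rangle |}{\|d\|} = \|d\| \to 0
\quad\text{as } \|d\| \to 0.
```

The remainder vanishes faster than $\|d\|$, so the candidate satisfies the defining limit and is the Fréchet derivative of this $f$ at $x$.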
The Gateaux derivative¶
The concept of directional derivative can also be extended to Banach spaces, giving the Gateaux derivative. The Gateaux derivative is more general than the Fréchet derivative, but is not always a linear operator. An example of a function that is Gateaux differentiable in this general sense but not Fréchet differentiable is the absolute value function at the origin, where the one-sided directional derivative in direction $d$ is $|d|$, which is not linear in $d$. For this reason, when we write “derivative” in ODL, we generally mean the Fréchet derivative, but in some cases the Gateaux derivative can be used via duck-typing.
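The absolute value example can be checked numerically. The sketch below assumes the one-sided limit $h \to 0^+$ in the directional derivative and uses plain Python rather than ODL; the helper name `one_sided_directional_derivative` is hypothetical.

```python
def one_sided_directional_derivative(f, x, d, h=1e-8):
    """Approximate lim_{h -> 0+} (f(x + h*d) - f(x)) / h using a small h > 0."""
    return (f(x + h * d) - f(x)) / h


# At x = 0, the one-sided directional derivative of abs in direction d is |d|.
deriv_pos = one_sided_directional_derivative(abs, 0.0, 1.0)   # direction +1
deriv_neg = one_sided_directional_derivative(abs, 0.0, -1.0)  # direction -1

# A linear map L satisfies L(-d) == -L(d), i.e. L(d) + L(-d) == 0.
# Here both values are close to 1, so the map d -> |d| is not linear.
is_linear_in_d = abs(deriv_pos + deriv_neg) < 1e-6
```

Since no linear map reproduces these directional derivatives, there is no Fréchet derivative at the origin.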
Rules for the Fréchet derivative¶
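Many of the usual rules for derivatives also hold for the Fréchet derivative, for example:
- Linearity: $(a f + b g)'(x) = a f'(x) + b g'(x)$
- Chain rule: $(g \circ f)'(x) = g'(f(x)) \circ f'(x)$
- Linear operators are their own derivatives: if $f$ is linear, then $f'(x) = f$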
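The chain rule can be sanity-checked with finite differences. The following is a minimal sketch in plain Python (no ODL); the maps `f` and `g`, their hand-written derivatives, and the helper `central_difference` are all assumptions chosen for illustration.

```python
import math


def f(x):
    """f : R^2 -> R^2."""
    return (x[0] ** 2, x[0] * x[1])


def f_derivative(x):
    """Return the linear map d -> f'(x)(d)."""
    def apply(d):
        return (2 * x[0] * d[0], x[1] * d[0] + x[0] * d[1])
    return apply


def g(y):
    """g : R^2 -> R."""
    return math.sin(y[0]) + y[1] ** 2


def g_derivative(y):
    """Return the linear map e -> g'(y)(e)."""
    def apply(e):
        return math.cos(y[0]) * e[0] + 2 * y[1] * e[1]
    return apply


def chain_rule_derivative(x):
    """(g o f)'(x) = g'(f(x)) o f'(x), returned as a linear map in d."""
    def apply(d):
        return g_derivative(f(x))(f_derivative(x)(d))
    return apply


def central_difference(func, x, d, h=1e-6):
    """Approximate the directional derivative of func at x in direction d."""
    x_plus = tuple(xi + h * di for xi, di in zip(x, d))
    x_minus = tuple(xi - h * di for xi, di in zip(x, d))
    return (func(x_plus) - func(x_minus)) / (2 * h)


x = (0.5, -1.5)
d = (1.0, 2.0)
analytic = chain_rule_derivative(x)(d)
numeric = central_difference(lambda z: g(f(z)), x, d)
```

The two values should agree up to the finite-difference error, which is small for this smooth example.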
Implementations in ODL¶
- The derivative is implemented in ODL for Operator objects via the Operator.derivative method.
- It can be numerically computed using the NumericalDerivative operator.
- Many of the operator arithmetic classes implement the usual rules for the Fréchet derivative, such as the chain rule, distributivity over addition etc.
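The idea that the derivative in a point is itself an operator can be sketched without ODL. The class `PointwiseSquare` and the function `numerical_derivative` below are hypothetical stand-ins written for this illustration; they only mimic the pattern and are not ODL classes or the ODL implementation.

```python
class PointwiseSquare:
    """Hypothetical operator A(x) = x**2, applied elementwise to a list."""

    def __call__(self, x):
        return [xi ** 2 for xi in x]

    def derivative(self, x):
        """Return the linear map h -> A'(x)(h) = 2 * x * h (elementwise)."""
        def linear_map(h):
            return [2 * xi * hi for xi, hi in zip(x, h)]
        return linear_map


def numerical_derivative(op, x, step=1e-6):
    """Forward-difference approximation of the map h -> A'(x)(h).

    The result is only approximately linear in h.
    """
    def approx(h):
        shifted = [xi + step * hi for xi, hi in zip(x, h)]
        return [(a - b) / step for a, b in zip(op(shifted), op(x))]
    return approx


op = PointwiseSquare()
x = [1.0, 2.0, 3.0]
h = [0.5, -1.0, 2.0]
exact = op.derivative(x)(h)               # analytic derivative applied to h
approx = numerical_derivative(op, x)(h)   # finite-difference counterpart
```

Both calls follow the same two-step pattern: first fix the point `x`, then apply the resulting linear map to a direction `h`.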
Gradient¶
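In the classical setting of functionals $f : \mathbb{R}^n \to \mathbb{R}$, the gradient is the vector
$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix}.$$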
This can be generalized to the setting of functionals $f : X \to \mathbb{R}$ mapping elements in some Banach space $X$ to the real numbers by noting that the Fréchet derivative can be written as
$$f'(x)(d) = \langle d, [\nabla f](x) \rangle,$$
where $[\nabla f](x)$ lies in the dual space of $X$, denoted $X^*$. For most spaces in ODL, the spaces are Hilbert spaces where $X \cong X^*$ by the Riesz representation theorem and hence $[\nabla f](x) \in X$.
We call the (possibly nonlinear) operator $x \mapsto [\nabla f](x)$ the Gradient operator of $f$.
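The relation between the derivative and the gradient, $f'(x)(d) = \langle d, [\nabla f](x) \rangle$, can be illustrated in $\mathbb{R}^3$ with the standard inner product. This is a sketch in plain Python under those assumptions; the functional `f` and all helper names are hypothetical and not part of ODL.

```python
import math


def f(x):
    """Example functional f : R^3 -> R (chosen for illustration)."""
    return x[0] ** 2 + 3.0 * x[0] * x[1] + math.sin(x[2])


def gradient(x):
    """Analytic gradient of f, i.e. the vector of partial derivatives."""
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0], math.cos(x[2])]


def numerical_gradient(func, x, h=1e-6):
    """Central-difference approximation of the gradient, one axis at a time."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad


def inner(u, v):
    """Standard inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def directional_derivative(func, x, d, h=1e-6):
    """Central-difference approximation of f'(x)(d)."""
    x_plus = [xi + h * di for xi, di in zip(x, d)]
    x_minus = [xi - h * di for xi, di in zip(x, d)]
    return (func(x_plus) - func(x_minus)) / (2 * h)


x = [1.0, -2.0, 0.5]
d = [0.3, 0.1, -0.7]
grad_exact = gradient(x)
grad_numeric = numerical_gradient(f, x)
lhs = directional_derivative(f, x, d)  # f'(x)(d)
rhs = inner(d, grad_exact)             # <d, grad f(x)>
```

Up to finite-difference error, `lhs` and `rhs` coincide, which is the defining property of the gradient stated above.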
Implementations in ODL¶
- The gradient is implemented in ODL for Functional objects via the Functional.gradient property.
- It can be numerically computed using the NumericalGradient operator.
Hessian¶
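In the classical setting of functionals $f : \mathbb{R}^n \to \mathbb{R}$, the Hessian in a point $x$ is the matrix $H(x)$ such that
$$H(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix},$$
with the derivatives evaluated in the point $x$.
It has the property that the second-order expansion of $f$ is
$$f(x + d) = f(x) + \langle d, [\nabla f](x) \rangle + \frac{1}{2} \langle d, [H(x)](d) \rangle + o(\|d\|^2),$$
but also that the derivative of the gradient operator is given by
$$[\nabla f](x + d) = [\nabla f](x) + [H(x)](d) + o(\|d\|).$$
If we take this second property as the definition of the Hessian, it can easily be generalized to the setting of functionals $f : X \to \mathbb{R}$ mapping elements in some Hilbert space $X$ to the real numbers.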
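The characterization of the Hessian as the derivative of the gradient operator can be checked on a small example. The sketch below uses the standard two-dimensional Rosenbrock function $f(x) = (1 - x_1)^2 + 100 (x_2 - x_1^2)^2$; this form, the hand-derived gradient and Hessian, and the helper names are assumptions for illustration and are not taken from ODL.

```python
def rosenbrock_gradient(x):
    """Gradient of f(x) = (1 - x0)^2 + 100 * (x1 - x0^2)^2."""
    return [
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
        200.0 * (x[1] - x[0] ** 2),
    ]


def rosenbrock_hessian_apply(x, d):
    """Apply the analytic Hessian H(x) to a direction d."""
    h00 = 2.0 - 400.0 * (x[1] - x[0] ** 2) + 800.0 * x[0] ** 2
    h01 = -400.0 * x[0]
    h11 = 200.0
    return [h00 * d[0] + h01 * d[1], h01 * d[0] + h11 * d[1]]


def gradient_derivative_numeric(grad, x, d, h=1e-6):
    """Central-difference approximation of [grad]'(x)(d)."""
    x_plus = [xi + h * di for xi, di in zip(x, d)]
    x_minus = [xi - h * di for xi, di in zip(x, d)]
    return [(gp - gm) / (2 * h) for gp, gm in zip(grad(x_plus), grad(x_minus))]


x = [-1.2, 1.0]
d = [1.0, 0.5]
hessian_times_d = rosenbrock_hessian_apply(x, d)
numeric = gradient_derivative_numeric(rosenbrock_gradient, x, d)
```

Differentiating the gradient numerically reproduces $[H(x)](d)$ up to finite-difference error, without the Hessian matrix ever being formed in the numerical path.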
Implementations in ODL¶
The Hessian is not explicitly implemented anywhere in ODL. Instead it can be used in the form of the derivative of the gradient operator. This is however not implemented for all functionals.
- For an example of a functional whose gradient has a derivative, see RosenbrockFunctional.
- It can be computed by taking the NumericalDerivative of the gradient, which can in turn be computed using the NumericalGradient.
Spatial Gradient¶
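The spatial gradient of a function $f \in \mathcal{F}(\Omega, \mathbb{R}) = \{ f : \Omega \to \mathbb{R} \}$ with $\Omega \subset \mathbb{R}^n$ is an element $\operatorname{grad} f$ in the function space $\mathcal{F}(\Omega, \mathbb{R}^n)$ such that for any $x \in \Omega$ and any direction $d \in \mathbb{R}^n$,
$$\lim_{h \to 0} \frac{| f(x + h d) - f(x) - \langle h d, [\operatorname{grad} f](x) \rangle |}{|h|} = 0.$$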
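On a sampled function, the spatial gradient is typically approximated by finite differences along each axis. The following sketch uses forward differences on a uniform two-dimensional grid in plain Python; the zero value at the last row and column is a boundary convention assumed here, and none of the helper names come from ODL.

```python
def sample(func, xs, ys):
    """Sample func on the tensor grid xs x ys; values[i][j] = func(xs[i], ys[j])."""
    return [[func(x, y) for y in ys] for x in xs]


def forward_difference_gradient(values, dx, dy):
    """Return (df/dx, df/dy) as two grids of the same shape as values.

    Entries that would need a point outside the grid are left at zero
    (an assumed boundary convention).
    """
    nx, ny = len(values), len(values[0])
    grad_x = [[0.0] * ny for _ in range(nx)]
    grad_y = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            if i + 1 < nx:
                grad_x[i][j] = (values[i + 1][j] - values[i][j]) / dx
            if j + 1 < ny:
                grad_y[i][j] = (values[i][j + 1] - values[i][j]) / dy
    return grad_x, grad_y


xs = [0.0, 0.5, 1.0, 1.5]
ys = [0.0, 0.5, 1.0]
values = sample(lambda x, y: x ** 2 + 3.0 * y, xs, ys)
grad_x, grad_y = forward_difference_gradient(values, dx=0.5, dy=0.5)
```

The result has one component grid per axis, matching the description of the spatial gradient as a vector-valued function over the same domain.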
Implementations in ODL¶
- The spatial gradient is implemented in ODL in the
Gradient
operator. - Several related operators such as the
PartialDerivative
andLaplacian
are also available.
Subgradient¶
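The subgradient (also subderivative or subdifferential) of a convex function $f : X \to \mathbb{R}$, mapping a Banach space $X$ to $\mathbb{R}$, is defined as the set-valued function $\partial f : X \to 2^{X^*}$ whose values are
$$[\partial f](x_0) = \{ c \in X^* : f(x) - f(x_0) \geq \langle c, x - x_0 \rangle \ \text{for all } x \in X \}.$$
For functions that are differentiable in the usual sense, this reduces to the set containing only the usual gradient.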
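The subgradient inequality can be tested on the absolute value function on $\mathbb{R}$, whose subdifferential at the origin is the interval $[-1, 1]$. The sketch below checks the inequality on a finite set of sample points only, so it can refute a candidate but not prove membership; the helper `is_subgradient` is hypothetical and not an ODL function.

```python
def is_subgradient(f, c, x0, test_points, tol=1e-12):
    """Check f(x) - f(x0) >= c * (x - x0) on the given sample points."""
    return all(f(x) - f(x0) >= c * (x - x0) - tol for x in test_points)


# Sample points in [-5, 5] with spacing 0.1.
test_points = [k / 10.0 for k in range(-50, 51)]

# At the kink x0 = 0, every slope c in [-1, 1] passes the check.
inside = [is_subgradient(abs, c, 0.0, test_points) for c in (-1.0, -0.3, 0.0, 1.0)]

# A slope outside [-1, 1] violates the inequality for some sample point.
outside = is_subgradient(abs, 1.5, 0.0, test_points)

# At a smooth point x0 = 2, the ordinary derivative 1 passes, but 0.5 does not.
smooth_point = is_subgradient(abs, 1.0, 2.0, test_points)
wrong_slope_at_smooth_point = is_subgradient(abs, 0.5, 2.0, test_points)
```

At the smooth point only the ordinary derivative survives, which illustrates how the subgradient collapses to the usual gradient where the function is differentiable.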
Implementations in ODL¶
The subgradient is not explicitly implemented in ODL, but is implicitly used in the proximal operators. See Proximal Operators for more information.