


5.1.2 Reverse or adjoint sensitivity
Let us consider the special case of a scalar objective function $J(v)$ of the model output (e.g. the total meridional heat transport, the total uptake of CO$_2$ in the Southern Ocean over a time interval, or a measure of some model-to-data misfit)

\[
J \;:\; U \subset \mathbb{R}^m \;\longrightarrow\; \mathbb{R}, \qquad
u \;\longmapsto\; J(u) \,=\, J\bigl(M(u)\bigr)
\tag{5.4}
\]
The perturbation of $J$ around a fixed point $J_0$,

\[
J \,=\, J_0 + \delta J ,
\]

can be expressed in both bases of $u$ and $v$ w.r.t. their corresponding inner product $\langle \,,\, \rangle$:

\[
J \;=\; J\big|_{u^{(0)}} + \bigl\langle \nabla_u J^{\mathrm T} , \delta u \bigr\rangle + O\bigl(\delta u^2\bigr)
\;=\; J\big|_{v^{(0)}} + \bigl\langle \nabla_v J^{\mathrm T} , \delta v \bigr\rangle + O\bigl(\delta v^2\bigr)
\tag{5.5}
\]

(note that the gradient $\nabla f$ is a covector, therefore its transpose is required in the above inner product). Then, using the representation $\delta J = \langle \nabla_v J^{\mathrm T} , \delta v \rangle$, the definition of an adjoint operator $A^{\mathrm T}$ of a given operator $A$,

\[
\bigl\langle A^{\mathrm T} x \,,\, y \bigr\rangle \,=\, \bigl\langle x \,,\, A\, y \bigr\rangle ,
\]

which for finite-dimensional vector spaces is just the transpose of $A$, and from eq. (5.2), (5.5), we note that (omitting $|$'s):

\[
\delta J
\,=\, \bigl\langle \nabla_v J^{\mathrm T} , \delta v \bigr\rangle
\,=\, \bigl\langle \nabla_v J^{\mathrm T} , A\, \delta u \bigr\rangle
\,=\, \bigl\langle A^{\mathrm T}\, \nabla_v J^{\mathrm T} , \delta u \bigr\rangle .
\tag{5.6}
\]
With the identity (5.5), we then find that the gradient $\nabla_u J$ can be readily inferred by invoking the adjoint $A^{\mathrm T}$ of the tangent linear model $A$:

\[
\begin{aligned}
\nabla_u J^{\mathrm T}\big|_{u} \;&=\; A^{\mathrm T} \cdot \nabla_v J^{\mathrm T}\big|_{v} , \\
\delta^{\ast} u \;&=\; A^{\mathrm T}\, \delta^{\ast} v .
\end{aligned}
\tag{5.7}
\]

Eq. (5.7) is the adjoint model (ADM), in which $A^{\mathrm T}$ is the adjoint (here, the transpose) of the tangent linear operator $A$, $\delta^{\ast} v$ the adjoint variable of the model state $v$, and $\delta^{\ast} u$ the adjoint variable of the control variable $u$.
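The adjoint identity (5.6) and the gradient relation (5.7) can be verified numerically. The following is a minimal sketch in Python/NumPy (illustrative only, not part of the model code; the random matrix `A` stands in for a tangent linear operator and all names are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3                      # dimensions of control u and state v

A = rng.standard_normal((n, m))  # stand-in tangent linear operator (Jacobi matrix)
du = rng.standard_normal(m)      # perturbation of the controls, delta u
dsv = rng.standard_normal(n)     # adjoint state, delta* v = nabla_v J^T

# Adjoint identity (5.6): <nabla_v J^T, A du> = <A^T nabla_v J^T, du>
assert np.isclose(dsv @ (A @ du), (A.T @ dsv) @ du)

# Adjoint model (5.7): one application of A^T yields the full gradient
dsu = A.T @ dsv                  # delta* u = nabla_u J^T
print(dsu)
```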
The reverse nature of the adjoint calculation can be readily seen as follows. Consider a model integration which consists of $\Lambda$ consecutive operations

\[
M_\Lambda \bigl( M_{\Lambda-1} ( \ldots ( M_\lambda ( \ldots ( M_1 ( M_0 (u) ) ) ) ) ) \bigr) ,
\]

where the $M$'s could be the elementary steps, i.e. single lines in the code of the model, or successive time steps of the model integration, starting at step 0 and moving up to step $\Lambda$, with intermediate

\[
v^{(\lambda)} \,=\, M_\lambda \bigl( M_{\lambda-1} ( \ldots ( M_0 (u) ) ) \bigr)
\]

and final $v^{(\Lambda)} = v = M_\Lambda ( \ldots ( M_0(u) ) )$. Let $J$ be a cost function which explicitly depends on the final state $v$ only (this restriction is for clarity reasons only). $J$ may be decomposed according to:

\[
J \bigl( M(u) \bigr) \,=\, J \bigl( M_\Lambda ( M_{\Lambda-1} ( \ldots ( M_0 (u) ) ) ) \bigr) .
\tag{5.8}
\]
Then, according to the chain rule, the forward calculation reads, in terms of the Jacobi matrices (we have omitted the $|$'s which, nevertheless, are important to the aspect of tangent linearity; note also that by definition $\langle \nabla_v J^{\mathrm T} , \delta v \rangle = \nabla_v J \cdot \delta v$):

\[
\delta J \,=\, \nabla_u J \cdot \delta u
\,=\, \nabla_v J \cdot M_\Lambda' \cdot \ldots \cdot M_0' \cdot \delta u ,
\tag{5.9}
\]

whereas in reverse mode we have

\[
\nabla_u J^{\mathrm T} \,=\,
{M_0'}^{\mathrm T} \cdot \ldots \cdot {M_\Lambda'}^{\mathrm T} \cdot \nabla_v J^{\mathrm T} ,
\tag{5.10}
\]

clearly expressing the reverse nature of the calculation. Eq. (5.10) is at the heart of automatic adjoint compilers.
If the intermediate steps $v^{(\lambda)}$ in eqn. (5.8)--(5.10) represent the model state (forward or adjoint) at each intermediate time step as noted above, then correspondingly

\[
\delta v^{(\lambda+1)} \,=\, M_{\lambda+1}'\, \delta v^{(\lambda)} , \qquad
\delta^{\ast} v^{(\lambda)} \,=\, {M_{\lambda+1}'}^{\mathrm T}\, \delta^{\ast} v^{(\lambda+1)}
\]

for the adjoint variables. It thus becomes evident that the adjoint calculation also yields the adjoint of each model state component $v^{(\lambda)}$ at each intermediate step $\lambda$, namely

\[
\nabla_{v^{(\lambda)}} J^{\mathrm T} \,=\,
{M_{\lambda+1}'}^{\mathrm T} \cdot \ldots \cdot {M_\Lambda'}^{\mathrm T} \cdot \nabla_v J^{\mathrm T}
\;\equiv\; \delta^{\ast} v^{(\lambda)} ,
\tag{5.11}
\]

in close analogy to eq. (5.7). We note in passing that the $\delta^{\ast} v^{(\lambda)}$ are the Lagrange multipliers of the model equations which determine $v^{(\lambda)}$.
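As a sketch of eq. (5.10) and (5.11): a single reverse sweep through the linearized chain of steps produces not only $\nabla_u J$ but the adjoint of every intermediate state along the way. Again Python/NumPy, with random matrices standing in for the Jacobians $M_\lambda'$ and, for simplicity, $v^{(0)} = u$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, Lambda = 3, 4                     # state dimension, number of steps

# Stand-ins for the Jacobians M'_1 ... M'_Lambda of the elementary steps
M = [rng.standard_normal((n, n)) for _ in range(Lambda)]
dsv = rng.standard_normal(n)         # delta* v^(Lambda) = nabla_v J^T

# Reverse sweep: delta* v^(lambda) = M'^T_(lambda+1) delta* v^(lambda+1),
# collecting the adjoint of every intermediate state on the way (eq. 5.11)
adjoints = [dsv]
for Mlam in reversed(M):
    adjoints.append(Mlam.T @ adjoints[-1])

# The last entry is nabla_u J^T; identical to the full product of eq. (5.10)
full = np.linalg.multi_dot([Mlam.T for Mlam in M] + [dsv])
assert np.allclose(adjoints[-1], full)
```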
In components, eq. (5.7) reads as follows. Let

\[
\begin{aligned}
\delta u \,&=\, ( \delta u_1, \ldots, \delta u_m )^{\mathrm T} , \qquad &
\delta^{\ast} u \,&=\, \Bigl( \tfrac{\partial J}{\partial u_1}, \ldots, \tfrac{\partial J}{\partial u_m} \Bigr)^{\mathrm T} , \\
\delta v \,&=\, ( \delta v_1, \ldots, \delta v_n )^{\mathrm T} , \qquad &
\delta^{\ast} v \,&=\, \Bigl( \tfrac{\partial J}{\partial v_1}, \ldots, \tfrac{\partial J}{\partial v_n} \Bigr)^{\mathrm T}
\end{aligned}
\]

denote the perturbations in $u$ and $v$, respectively, and their adjoint variables; further $A$ is the Jacobi matrix of $M$ (an $n \times m$ matrix) such that $\delta v = A \cdot \delta u$, or

\[
\delta v_j \,=\, \sum_{i=1}^{m} A_{ji}\, \delta u_i
\,=\, \sum_{i=1}^{m} \frac{\partial v_j}{\partial u_i}\, \delta u_i .
\]

Then eq. (5.7) takes the form

\[
\delta^{\ast} u_i \,=\, \sum_{j=1}^{n} A_{ji}\, \delta^{\ast} v_j
\,=\, \sum_{j=1}^{n} \frac{\partial v_j}{\partial u_i}\, \delta^{\ast} v_j ,
\]

or

\[
\begin{pmatrix} \partial J / \partial u_1 \\ \vdots \\ \partial J / \partial u_m \end{pmatrix}
\,=\, A^{\mathrm T}
\begin{pmatrix} \partial J / \partial v_1 \\ \vdots \\ \partial J / \partial v_n \end{pmatrix} .
\]

Furthermore, the adjoint $\delta^{\ast} v^{(\lambda)}$ of any intermediate state $v^{(\lambda)}$ may be obtained, using the intermediate Jacobian (an $n_{\lambda+1} \times n_\lambda$ matrix)

\[
M_{\lambda+1}' \,=\, \Bigl( \frac{\partial ( M_{\lambda+1} )_j}{\partial v_i^{(\lambda)}} \Bigr)
\]

and the shorthand notation for the adjoint variables $\delta^{\ast} v_j^{(\lambda)} = \partial J / \partial v_j^{(\lambda)}$, $j = 1, \ldots, n_\lambda$, for intermediate components, yielding

\[
\delta^{\ast} v_i^{(\lambda)} \,=\,
\sum_{j=1}^{n_{\lambda+1}} \frac{\partial ( M_{\lambda+1} )_j}{\partial v_i^{(\lambda)}}\; \delta^{\ast} v_j^{(\lambda+1)} .
\]
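The index form above translates line by line into code. A small illustrative sketch (Python/NumPy, names invented for the example) spells out the component sums and checks them against the matrix form:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 3
A = rng.standard_normal((n, m))      # A[j, i] = dv_j / du_i
dsv = rng.standard_normal(n)         # adjoint state, delta* v_j = dJ/dv_j

# Eq. (5.7) in components: delta* u_i = sum_j (dv_j/du_i) delta* v_j
dsu = np.zeros(m)
for i in range(m):
    for j in range(n):
        dsu[i] += A[j, i] * dsv[j]

assert np.allclose(dsu, A.T @ dsv)   # identical to the matrix form A^T delta* v
```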
Eq. (5.9) and (5.10) are perhaps clearest in showing the advantage of the reverse over the forward mode if the gradient $\nabla_u J$, i.e. the sensitivity of the cost function $J$ with respect to all input variables $u$ (or the sensitivity of the cost function with respect to all intermediate states $v^{(\lambda)}$), is sought. In order to solve for each component of the gradient $\partial J / \partial u_i$ in (5.9), a forward calculation has to be performed for each component separately, i.e. $\delta u = \delta u_i\, e_i$ for the $i$-th forward calculation. Then, (5.9) represents the projection of $\nabla_u J$ onto the $i$-th component. The full gradient is retrieved from the $m$ forward calculations. In contrast, eq. (5.10) yields the full gradient $\nabla_u J$ (and all intermediate gradients $\nabla_{v^{(\lambda)}} J$) within a single reverse calculation.
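The operation count is easy to see in a sketch (Python/NumPy, with made-up Jacobians): the forward mode needs one sweep per control component, the reverse mode one sweep in total.

```python
import numpy as np

rng = np.random.default_rng(3)
m, Lambda = 4, 5
M = [rng.standard_normal((m, m)) for _ in range(Lambda)]  # Jacobians M'_lambda
dsv = rng.standard_normal(m)                              # nabla_v J^T

# Forward mode, eq. (5.9): m sweeps, one per unit vector e_i
gJ_fwd = np.empty(m)
for i in range(m):
    dv = np.eye(m)[:, i]                # delta u = e_i
    for Mlam in M:
        dv = Mlam @ dv                  # one tangent linear sweep
    gJ_fwd[i] = dsv @ dv                # projection onto component i

# Reverse mode, eq. (5.10): a single adjoint sweep gives all components
adj = dsv
for Mlam in reversed(M):
    adj = Mlam.T @ adj

assert np.allclose(gJ_fwd, adj)
```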
Note that if $J$ is a vector-valued function of dimension $l > 1$, eq. (5.10) has to be modified according to

\[
\nabla_u J^{\mathrm T} \cdot \delta J \,=\,
{M_0'}^{\mathrm T} \cdot \ldots \cdot {M_\Lambda'}^{\mathrm T} \cdot \nabla_v J^{\mathrm T} \cdot \delta J ,
\]

where now $\delta J$ is a vector of dimension $l$. In this case $l$ reverse simulations have to be performed, one for each component $\delta J_k$, $k = 1, \ldots, l$. Then, the reverse mode is more efficient as long as $l < m$; otherwise, the forward mode is preferable. Strictly, the reverse mode is called adjoint mode only for $l = 1$.
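For a vector-valued $J$ the comparison becomes: $l$ reverse sweeps (one per row of the Jacobian of $J$) against $m$ forward sweeps (one per column). A sketch with stand-in matrices (illustrative names only):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, l = 6, 5, 2
A = rng.standard_normal((n, m))    # stand-in tangent linear model
D = rng.standard_normal((l, n))    # Jacobian of the vector-valued J, dJ_k/dv_j

# Reverse mode: l sweeps, one per cost component -> rows of dJ/du
rows = np.stack([A.T @ D[k] for k in range(l)])

# Forward mode: m sweeps, one per control -> columns of dJ/du
cols = np.stack([D @ (A @ e) for e in np.eye(m)], axis=1)

assert np.allclose(rows, cols)     # both equal the l x m matrix D @ A
# l sweeps vs. m sweeps: reverse mode pays off whenever l < m
```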
A detailed analysis of the underlying numerical operations shows that the computation of $\nabla_u J$ in this way requires about 2 to 5 times the computation of the cost function. Alternatively, the gradient vector could be approximated by finite differences, requiring $m$ computations of the perturbed cost function.
To conclude we give two examples of commonly used types of cost functions:

Example 1: $J = v_j$. The cost function consists of the $j$-th component of the model state $v$ at time $\tau$. Then $\nabla_v J^{\mathrm T} = e_j$ is just the $j$-th unit vector. The gradient $\nabla_u J^{\mathrm T}$ is the projection of the adjoint operator onto the $j$-th component $e_j$:

\[
\nabla_u J^{\mathrm T} \,=\, A^{\mathrm T}\, e_j
\,=\, \Bigl( \frac{\partial v_j}{\partial u_1}, \ldots, \frac{\partial v_j}{\partial u_m} \Bigr)^{\mathrm T} .
\]
Example 2: $J = \frac{1}{2} \bigl\langle H(v) - d \,,\, H(v) - d \bigr\rangle$. The cost function represents the quadratic model vs. data misfit. Here, $d$ is the data vector and $H$ represents the operator which maps the model state space onto the data space. Then, $\nabla_v J$ takes the form

\[
\nabla_v J^{\mathrm T} \,=\, {H'}^{\mathrm T} \bigl( H(v) - d \bigr) ,
\]

where $H' = \partial H / \partial v$ is the Jacobi matrix of the data projection operator. Thus, the gradient $\nabla_u J$ is given by the adjoint operator, driven by the model vs. data misfit:

\[
\nabla_u J^{\mathrm T} \,=\, A^{\mathrm T}\, {H'}^{\mathrm T} \bigl( H(v) - d \bigr) .
\]
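For the misfit cost function this chain is short enough to write out and check against finite differences. A sketch in Python/NumPy, where for simplicity the model and the observation operator are taken linear (so their Jacobians are the matrices themselves) and all names are invented:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, p = 4, 5, 3
A = rng.standard_normal((n, m))   # model taken linear, so its Jacobian is A itself
H = rng.standard_normal((p, n))   # linear observation operator, so H' = H
d = rng.standard_normal(p)        # data vector
u = rng.standard_normal(m)        # control vector

def J(u):
    misfit = H @ (A @ u) - d      # model vs. data misfit H(v) - d
    return 0.5 * misfit @ misfit

# Gradient via the adjoint chain: nabla_u J^T = A^T H'^T (H(v) - d)
gJu = A.T @ (H.T @ (H @ (A @ u) - d))

# Finite-difference check: m perturbed cost evaluations (cf. the text above)
eps, J0 = 1e-6, J(u)
fd = np.array([(J(u + eps * e) - J0) / eps for e in np.eye(m)])
assert np.allclose(gJu, fd, atol=1e-4)
```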

