5.1.2 Reverse or adjoint sensitivity

Let us consider the special case of a scalar objective function $ {\cal J}(\vec{v})$ of the model output (e.g. the total meridional heat transport, the total uptake of $ CO_{2}$ in the Southern Ocean over a time interval, or a measure of some model-to-data misfit)
\begin{displaymath}
\begin{array}{cccc}
{\cal J} \, : & U & \longrightarrow & I\!\!R \\
~ & \vec{u} & \longmapsto & {\cal J}(\vec{u}) \, = \, {\cal J}({\cal M}(\vec{u}))
\end{array}
\end{displaymath}     (5.4)

The perturbation of $ {\cal J} $ around a fixed point $ {\cal J}_0 $ ,

$\displaystyle {\cal J} \, = \, {\cal J}_0 \, + \, \delta {\cal J}
$

can be expressed in both bases of $ \vec{u}$ and $ \vec{v}$ w.r.t. their corresponding inner product $ \left\langle \,\, , \,\, \right\rangle $

\begin{equation*}
\begin{aligned}
{\cal J} & = \, {\cal J} \vert _{\vec{u}^{(0)}} \, + \,
\left\langle \, \nabla _{u}{\cal J}^T \vert _{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle
\, + \, O(\delta \vec{u}^2) \\
{\cal J} & = \, {\cal J} \vert _{\vec{v}^{(0)}} \, + \,
\left\langle \, \nabla _{v}{\cal J}^T \vert _{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle
\, + \, O(\delta \vec{v}^2)
\end{aligned}
\end{equation*}     (5.5)

(note that the gradient $ \nabla f $ is a co-vector, therefore its transpose is required in the above inner product). Then, using the representation $ \delta {\cal J} =
\left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $ , the definition of an adjoint operator $ A^{\ast} $ of a given operator $ A$ ,

$\displaystyle \left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle =
\left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle
$

which for finite-dimensional vector spaces is just the transpose of $ A$ ,

$\displaystyle A^{\ast} \, = \, A^T
$

and from eqs. (5.2) and (5.5), we note that (omitting the $ \vert$ 's):

$\displaystyle \delta {\cal J} \, = \, \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle \, = \, \left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle \, = \, \left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{u} \, \right\rangle$ (5.6)

With the identity (5.5), we then find that the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $

\begin{equation*}
\begin{aligned}
\nabla _{u}{\cal J}^T \vert _{\vec{u}} & = \, M^T \vert _{\vec{u}} \cdot \nabla _{v}{\cal J}^T \vert _{\vec{v}} \\
~ & = \, M^T \cdot \delta \vec{v}^{\ast} \\
~ & = \, \delta \vec{u}^{\ast}
\end{aligned}
\end{equation*}     (5.7)

Eq. (5.7) is the adjoint model (ADM), in which $ M^T $ is the adjoint (here, the transpose) of the tangent linear operator $ M $ , $ \delta \vec{v}^{\ast} $ the adjoint variable of the model state $ \vec{v}$ , and $ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u}$ .
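
To make eq. (5.7) concrete in a finite-dimensional setting, the following minimal sketch (plain NumPy; the two-component model and quadratic cost are invented for the illustration and are not part of the MITgcm code) obtains the full gradient from a single product with the transposed Jacobi matrix:

    import numpy as np

    # Illustrative nonlinear model M: R^3 -> R^2 and scalar cost J(v) = <v, v>.
    def model(u):
        return np.array([u[0] * u[1], np.sin(u[2])])

    def cost(v):
        return np.dot(v, v)

    u0 = np.array([1.0, 2.0, 0.5])
    v0 = model(u0)

    # Tangent linear operator M = dM/du at u0 (the 2 x 3 Jacobi matrix).
    M = np.array([[u0[1], u0[0], 0.0],
                  [0.0,   0.0,   np.cos(u0[2])]])

    # Gradient of the cost w.r.t. the model output: nabla_v J^T = 2 v.
    dJdv = 2.0 * v0

    # Adjoint model, eq. (5.7): one product with M^T yields nabla_u J^T,
    # i.e. the sensitivity of J to all three control variables at once.
    dJdu = M.T @ dJdv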

The reverse nature of the adjoint calculation can be readily seen as follows. Consider a model integration which consists of the consecutive operations $ {\cal M}_{0}, \ldots, {\cal M}_{\Lambda} $ , applied as $ {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} ( \ldots ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )) \ldots )) $ , where the $ {\cal M}$ 's could be the elementary steps, i.e. single lines in the code of the model, or successive time steps of the model integration, starting at step 0 and moving up to step $ \Lambda $ , with intermediate results $ \vec{v}^{(\lambda+1)} = {\cal M}_{\lambda} (\vec{v}^{(\lambda)}) $ (taking $ \vec{v}^{(0)} = \vec{u} $ ) and final state $ \vec{v} = \vec{v}^{(\Lambda+1)} $ . Let $ {\cal J} $ be a cost function which explicitly depends on the final state $ \vec{v}$ only (this restriction is for clarity only). $ {\cal J}(\vec{u})$ may then be decomposed according to:

$\displaystyle {\cal J}({\cal M}(\vec{u})) \, = \, {\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} ( \ldots ( {\cal M}_{\lambda} ( \ldots ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))))))$ (5.8)

Then, according to the chain rule, the forward calculation reads, in terms of the Jacobi matrices (we have omitted the $ \vert$ 's, which are nevertheless important for the tangent linearity: each $ M_{\lambda} $ is evaluated at the corresponding intermediate state $ \vec{v}^{(\lambda)} $ ; note also that by definition $ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle
= \nabla_v {\cal J} \cdot \delta \vec{v} $ )

\begin{equation*}
\begin{aligned}
\nabla_v {\cal J} (M(\delta \vec{u})) & = \,
\nabla_v {\cal J} \cdot M_{\Lambda} \cdot \ldots \cdot M_{\lambda} \cdot \ldots \cdot M_{1} \cdot M_{0} \cdot \delta \vec{u} \\
~ & = \, \nabla_v {\cal J} \cdot M_{\Lambda} \cdot \ldots \cdot M_{\lambda} \cdot \delta \vec{v}^{(\lambda)} \\
~ & = \, \nabla_v {\cal J} \cdot M_{\Lambda} \cdot \delta \vec{v}^{(\Lambda)} \\
~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v}
\end{aligned}
\end{equation*}     (5.9)

whereas in reverse mode we have

\begin{equation*}
\boxed{
\begin{aligned}
M^T ( \nabla_v {\cal J}^T ) & = \,
M_{0}^T \cdot \ldots \cdot M_{\lambda}^T \cdot \ldots \cdot M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\
~ & = \, M_{0}^T \cdot \ldots \cdot M_{\lambda-1}^T \cdot \nabla _{v^{(\lambda)}}{\cal J}^T \\
~ & = \, \nabla_u {\cal J}^T
\end{aligned}
}
\end{equation*}     (5.10)

clearly expressing the reverse nature of the calculation. Eq. (5.10) is at the heart of automatic adjoint compilers. If the intermediate steps $ \lambda $ in eqs. (5.8)-(5.10) represent the model state (forward or adjoint) at each intermediate time step as noted above, then correspondingly, $ M_{\lambda-1}^T (\delta \vec{v}^{(\lambda) \, \ast}) =
\delta \vec{v}^{(\lambda-1) \, \ast} $ for the adjoint variables. It thus becomes evident that the adjoint calculation also yields the adjoint of each model state component $ \vec{v}^{(\lambda)} $ at each intermediate step $ \lambda $ , namely

\begin{equation*}
\boxed{
\begin{aligned}
\nabla _{v^{(\lambda)}}{\cal J}^T \vert _{\vec{v}^{(\lambda)}}
& = \, M_{\lambda}^T \cdot \ldots \cdot M_{\Lambda}^T \cdot \delta \vec{v}^{\ast} \\
~ & = \, \delta \vec{v}^{(\lambda) \, \ast}
\end{aligned}
}
\end{equation*}     (5.11)

in close analogy to eq. (5.7). We note in passing that the $ \delta \vec{v}^{(\lambda) \, \ast}$ are the Lagrange multipliers of the model equations which determine $ \vec{v}^{(\lambda)} $ .
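
The reverse sweep of eqs. (5.10) and (5.11) may be sketched for such a chain of steps as follows; in this schematic NumPy illustration, random square matrices stand in for the per-time-step tangent linear operators $ M_{\lambda} $ :

    import numpy as np

    rng = np.random.default_rng(0)

    # Chain of Lambda+1 linearized steps M_0, ..., M_Lambda; random square
    # matrices stand in for the per-time-step tangent linear operators.
    n, Lambda = 4, 5
    steps = [rng.standard_normal((n, n)) for _ in range(Lambda + 1)]

    dJdv = rng.standard_normal(n)        # nabla_v J^T at the final state

    # Reverse sweep, eq. (5.10): apply the transposed Jacobi matrices in
    # reverse order; each partial product is an intermediate adjoint state
    # delta v^(lambda)* of eq. (5.11).
    adj = dJdv
    adjoints = []
    for M_lam in reversed(steps):
        adj = M_lam.T @ adj
        adjoints.append(adj)
    dJdu = adj                           # nabla_u J^T after the full sweep

    # Forward consistency check, eq. (5.9): <nabla_v J, M du> = <nabla_u J, du>.
    du = rng.standard_normal(n)
    dv = du
    for M_lam in steps:
        dv = M_lam @ dv
    assert np.isclose(dJdv @ dv, dJdu @ du)

Note that the list adjoints accumulates exactly the intermediate adjoint states $ \delta \vec{v}^{(\lambda) \, \ast} $ of eq. (5.11) as a by-product of the single reverse sweep.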

In components, eq. (5.7) reads as follows. Let

\begin{displaymath}
\begin{array}{rclcrcl}
\delta \vec{u} & = & \left( \delta u_1, \ldots, \delta u_m \right)^T
& , \qquad &
\delta \vec{u}^{\ast} \, = \, \nabla_u {\cal J}^T & = &
\left( \frac{\partial {\cal J}}{\partial u_1}, \ldots,
\frac{\partial {\cal J}}{\partial u_m} \right)^T \\
\delta \vec{v} & = & \left( \delta v_1, \ldots, \delta v_n \right)^T
& , \qquad &
\delta \vec{v}^{\ast} \, = \, \nabla_v {\cal J}^T & = &
\left( \frac{\partial {\cal J}}{\partial v_1}, \ldots,
\frac{\partial {\cal J}}{\partial v_n} \right)^T \\
\end{array}
\end{displaymath}

denote the perturbations in $ \vec{u}$ and $ \vec{v}$ , respectively, and their adjoint variables; further

\begin{displaymath}
M \, = \, \left(
\begin{array}{ccc}
\frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
\frac{\partial {\cal M}_1}{\partial u_m} \\
\vdots & ~ & \vdots \\
\frac{\partial {\cal M}_n}{\partial u_1} & \ldots &
\frac{\partial {\cal M}_n}{\partial u_m} \\
\end{array}
\right)
\end{displaymath}

is the Jacobi matrix of $ {\cal M}$ (an $ n \times m $ matrix) such that $ \delta \vec{v} = M \cdot \delta \vec{u} $ , or

$\displaystyle \delta v_{j}
\, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i}
\, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
\delta u_{i}
$

Then eq. (5.7) takes the form

$\displaystyle \delta u_{i}^{\ast}
\, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast}
\, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}} \,
\delta v_{j}^{\ast}
$

or

\begin{displaymath}
\left(
\begin{array}{c}
\left. \frac{\partial}{\partial u_1} {\cal J} \right\vert _{\vec{u}} \\
\vdots \\
\left. \frac{\partial}{\partial u_m} {\cal J} \right\vert _{\vec{u}} \\
\end{array}
\right)
\, = \,
\left(
\begin{array}{ccc}
\frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
\frac{\partial {\cal M}_n}{\partial u_1} \\
\vdots & ~ & \vdots \\
\frac{\partial {\cal M}_1}{\partial u_m} & \ldots &
\frac{\partial {\cal M}_n}{\partial u_m} \\
\end{array}
\right)
\cdot
\left(
\begin{array}{c}
\left. \frac{\partial}{\partial v_1} {\cal J} \right\vert _{\vec{v}} \\
\vdots \\
\left. \frac{\partial}{\partial v_n} {\cal J} \right\vert _{\vec{v}} \\
\end{array}
\right)
\end{displaymath}
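
In code, the component sum and the matrix-vector form are one and the same; a brief check (the Jacobi matrix is arbitrary, chosen only for the illustration):

    import numpy as np

    rng = np.random.default_rng(5)
    n, m = 3, 4
    M = rng.standard_normal((n, m))   # Jacobi matrix, n x m
    dv_star = rng.standard_normal(n)  # adjoint state variables delta v_j*

    # Component form: delta u_i* = sum_j M_ji delta v_j* ...
    du_star = np.array([sum(M[j, i] * dv_star[j] for j in range(n))
                        for i in range(m)])

    # ... equals the matrix form M^T delta v*.
    assert np.allclose(du_star, M.T @ dv_star)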

Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $ of any intermediate state $ v^{(\lambda)} $ may be obtained using the intermediate Jacobian (an $ n_{\lambda+1} \times n_{\lambda} $ matrix)

\begin{displaymath}
M_{\lambda} \, = \,
\left(
\begin{array}{ccc}
\frac{\partial ({\cal M}_{\lambda})_{1}}{\partial v^{(\lambda)}_{1}} & \ldots &
\frac{\partial ({\cal M}_{\lambda})_{1}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
\vdots & ~ & \vdots \\
\frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{1}} & \ldots &
\frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
\end{array}
\right)
\end{displaymath}

and the shorthand notation for the adjoint variables $ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}}
{\cal J}^T $ , $ j = 1, \ldots , n_{\lambda} $ , for intermediate components, yielding

\begin{equation*}
\begin{aligned}
\left(
\begin{array}{c}
\delta v^{(\lambda) \, \ast}_{1} \\
\vdots \\
\delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\
\end{array}
\right)
\, = \,
M_{\lambda}^T \cdot M_{\lambda+1}^T \cdot \ldots \cdot M_{\Lambda}^T \cdot
\left(
\begin{array}{c}
\delta v^{\ast}_{1} \\
\vdots \\
\delta v^{\ast}_{n} \\
\end{array}
\right)
\end{aligned}
\end{equation*}

Eqs. (5.9) and (5.10) are perhaps clearest in showing the advantage of the reverse over the forward mode when the gradient $ \nabla _{u}{\cal J} $ , i.e. the sensitivity of the cost function $ {\cal J} $ with respect to all input variables $ u$ (or the sensitivity of the cost function with respect to all intermediate states $ \vec{v}^{(\lambda)} $ ), is sought. In order to solve for each component of the gradient $ \partial {\cal J} / \partial u_{i} $ in (5.9), a separate forward calculation has to be performed for each component, i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $ for the $ i$ -th forward calculation. Then, (5.9) represents the projection of $ \nabla_u {\cal J} $ onto the $ i$ -th component. The full gradient is retrieved from the $ m$ forward calculations. In contrast, eq. (5.10) yields the full gradient $ \nabla _{u}{\cal J} $ (and all intermediate gradients $ \nabla _{v^{(\lambda)}}{\cal J}$ ) within a single reverse calculation. A schematic comparison follows.
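
The following schematic sketch (again with an explicit, invented Jacobi matrix standing in for the model) contrasts the $ m$ forward runs of eq. (5.9) with the single reverse run of eq. (5.10):

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 3, 5                      # state and control space dimensions
    M = rng.standard_normal((n, m))  # tangent linear operator (Jacobi matrix)
    dJdv = rng.standard_normal(n)    # nabla_v J^T at the final state

    # Forward mode, eq. (5.9): one tangent linear run per control component,
    # each one projecting nabla_u J onto the unit vector e_i.
    grad_fwd = np.empty(m)
    for i in range(m):
        e_i = np.zeros(m)
        e_i[i] = 1.0
        dv = M @ e_i                 # i-th forward (tangent linear) run
        grad_fwd[i] = dJdv @ dv      # projection of nabla_u J onto e_i

    # Reverse mode, eq. (5.10): the full gradient from a single adjoint run.
    grad_rev = M.T @ dJdv

    assert np.allclose(grad_fwd, grad_rev)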

Note that if $ {\cal J} $ is a vector-valued function of dimension $ l > 1 $ , eq. (5.10) has to be modified according to

$\displaystyle M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right)
\, = \,
\nabla_u {\cal J}^T \cdot \delta \vec{J}
$

where now $ \delta \vec{J} \in I\!\!R^l $ is a vector of dimension $ l $ . In this case $ l $ reverse simulations have to be performed, one for each $ \delta J_{k}, \,\, k = 1, \ldots, l $ . Then, the reverse mode is more efficient as long as $ l < m $ (the dimension of the control space); otherwise the forward mode is preferable. Strictly, the reverse mode is called adjoint mode only for $ l = 1 $ .
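
As a sketch of the vector-valued case (dimensions and matrices invented for the illustration), each of the $ l $ reverse simulations recovers one row of the full sensitivity matrix:

    import numpy as np

    rng = np.random.default_rng(2)
    n, m, l = 4, 6, 2                   # state, control, objective dimensions
    M = rng.standard_normal((n, m))     # tangent linear operator
    dJdv = rng.standard_normal((l, n))  # nabla_v J, one row per component J_k

    # One reverse simulation per objective component delta J_k recovers
    # the full l x m sensitivity matrix nabla_u J row by row.
    dJdu = np.empty((l, m))
    for k in range(l):
        dJdu[k, :] = M.T @ dJdv[k, :]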

A detailed analysis of the underlying numerical operations shows that computing $ \nabla _{u}{\cal J} $ in this way requires only about 2 to 5 times the cost of evaluating the cost function itself. Alternatively, the gradient vector could be approximated by finite differences, requiring $ m$ evaluations of the perturbed cost function.
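
A finite-difference sketch, reusing the illustrative model and cost from the example following eq. (5.7), makes the $ m$ perturbed evaluations explicit:

    import numpy as np

    # Illustrative model and cost from the sketch after eq. (5.7).
    def model(u):
        return np.array([u[0] * u[1], np.sin(u[2])])

    def cost(v):
        return np.dot(v, v)

    u0 = np.array([1.0, 2.0, 0.5])
    eps = 1.0e-6

    # One-sided finite differences: m perturbed cost evaluations, one per
    # control variable, versus the single sweep of the adjoint model.
    J0 = cost(model(u0))
    grad_fd = np.empty(u0.size)
    for i in range(u0.size):
        u_pert = u0.copy()
        u_pert[i] += eps
        grad_fd[i] = (cost(model(u_pert)) - J0) / eps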

To conclude, we give two examples of commonly used types of cost functions:

5.1.2.0.1 Example 1: $ {\cal J} = v_{j} (T) $

 
The cost function consists of the $ j$ -th component of the model state $ \vec{v}$ at time $ T$ . Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $ j$ -th unit vector in state space, and $ \nabla_u {\cal J}^T $ is the projection of the adjoint operator onto this $ j$ -th component $ {\vec{f}_{j}}$ ,

$\displaystyle \nabla_u {\cal J}^T
\, = \, M^T \cdot \nabla_v {\cal J}^T
\, = \, M^T \cdot {\vec{f}_{j}}
\, = \, \sum_{i} M_{ji} \, {\vec{e}_{i}}
$
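
In matrix terms, applying $ M^T $ to the $ j$ -th unit vector extracts the $ j$ -th row of $ M $ ; a brief check with an arbitrary (invented) Jacobi matrix:

    import numpy as np

    rng = np.random.default_rng(3)
    n, m, j = 4, 6, 2
    M = rng.standard_normal((n, m))  # tangent linear operator (Jacobi matrix)

    f_j = np.zeros(n)
    f_j[j] = 1.0                     # nabla_v J^T for J = v_j(T)

    # The adjoint projects onto the j-th component: M^T f_j picks out the
    # j-th row of M, i.e. the sensitivity of v_j to every control u_i.
    dJdu = M.T @ f_j
    assert np.allclose(dJdu, M[j, :])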

5.1.2.0.2 Example 2: $ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, , \, {\cal H}(\vec{v}) - \vec{d} \, \rangle $

 
The cost function represents the quadratic model vs. data misfit. Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the operator which maps the model state space onto the data space. Then, $ \nabla_v {\cal J} $ takes the form

\begin{equation*}
\begin{aligned}
\nabla_v {\cal J}^T & = \, 2 \, H^T \cdot \left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\
~ & = \, 2 \sum_{j} \left\{ \sum_k
\frac{\partial {\cal H}_k}{\partial v_{j}}
\left( {\cal H}_k (\vec{v}) - d_k \right)
\right\} \, {\vec{f}_{j}}
\end{aligned}
\end{equation*}

where $ H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the Jacobi matrix of the data projection operator. Thus, the gradient $ \nabla_u {\cal J} $ is given by the adjoint operator, driven by the model vs. data misfit:

$\displaystyle \nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot
H^T \cdot \left( \, {\cal H}(\vec{v}) - \vec{d} \, \right)
$
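
A sketch of the misfit-driven adjoint (the linear observation operator and the data vector are invented for the illustration; for a linear $ {\cal H} $ the Jacobi matrix $ H $ coincides with the operator itself):

    import numpy as np

    rng = np.random.default_rng(4)
    n, m, p = 4, 6, 3                # state, control, and data dimensions
    M = rng.standard_normal((n, m))  # tangent linear operator
    H = rng.standard_normal((p, n))  # Jacobi matrix of the observation operator
    v = rng.standard_normal(n)       # model state
    d = rng.standard_normal(p)       # data vector

    misfit = H @ v - d               # H(v) - d (H taken linear for simplicity)
    dJdv = 2.0 * H.T @ misfit        # nabla_v J^T, misfit mapped to state space
    dJdu = M.T @ dJdv                # adjoint driven by the model vs. data misfit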

