5.1.3 Storing vs. recomputation in reverse mode

We note an important aspect of the forward vs. reverse mode calculation. Because of the local character of the derivative (a derivative is defined w.r.t. a point along the trajectory), the intermediate results of the model trajectory $\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(\vec{v}^{(\lambda)})$ may be required to evaluate the intermediate Jacobian product $M_{\lambda}\vert _{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)}$. This is the case, e.g., for nonlinear expressions (momentum advection, nonlinear equation of state) and for state-dependent conditional statements (parameterization schemes). In the forward mode, the intermediate results are required in the same order as they are computed by the full forward model ${\cal M}$, but in the reverse mode they are required in the reverse order. Thus, in the reverse mode the trajectory of the forward model integration ${\cal M}$ has to be stored to be available in the reverse calculation. Alternatively, the complete model state up to the point of evaluation has to be recomputed whenever its value is required.
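To make the two extremes concrete, here is a minimal Python sketch under stated assumptions: a hypothetical scalar toy model `step` (a logistic-type update with timestep `DT`, not MITgcm code) stands in for ${\cal M}_{\lambda}$, the cost function is taken to be the final state, and `step_adjoint` applies the transposed Jacobian evaluated at the forward state. The two hypothetical routines deliver the same gradient; one stores the whole trajectory, the other stores only the initial state and recomputes each $\vec{v}^{(\lambda)}$ from scratch when the reverse sweep needs it.

```python
DT = 0.01  # hypothetical timestep of the toy model


def step(x):
    """Toy nonlinear model step x_{k+1} = M_k(x_k) (an assumption, not MITgcm)."""
    return x + DT * x * (1.0 - x)


def step_adjoint(x, xbar):
    """Transposed Jacobian of step, linearized about the forward state x."""
    return xbar * (1.0 + DT * (1.0 - 2.0 * x))


def adjoint_store_all(x0, n):
    """Store the full trajectory; return (dJ/dx0, states stored, forward steps)."""
    traj = [x0]
    for _ in range(n):
        traj.append(step(traj[-1]))   # forward sweep; traj[-1] is x_n = J
    xbar = 1.0                        # seed dJ/dx_n for the cost J = x_n
    for xk in reversed(traj[:-1]):    # states consumed in reverse order
        xbar = step_adjoint(xk, xbar)
    return xbar, len(traj), n


def adjoint_recompute_all(x0, n):
    """Store only x0; recompute each x_k from scratch when the reverse sweep needs it."""
    xbar, fwd_steps = 1.0, 0
    for k in reversed(range(n)):
        xk = x0
        for _ in range(k):            # re-integrate from the initial state to step k
            xk = step(xk)
        fwd_steps += k
        xbar = step_adjoint(xk, xbar)
    return xbar, 1, fwd_steps


g1, mem1, fwd1 = adjoint_store_all(0.1, 72)
g2, mem2, fwd2 = adjoint_recompute_all(0.1, 72)
print(mem1, fwd1)  # 73 72
print(mem2, fwd2)  # 1 2556
```

For 72 timesteps the first variant keeps 73 states but needs only 72 forward steps, while the second keeps a single state at the price of 72·71/2 = 2556 forward steps; checkpointing sits between these two extremes.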

A method to balance the amount of recomputation vs. storage requirements is called checkpointing (e.g. Griewank [1992], Restrepo et al. [1998]). It is depicted in Figure 5.1 for 3-level checkpointing [as an example, we give explicit numbers for a 3-day integration with a 1-hourly timestep in square brackets].

: In a first step, the model trajectory is subdivided into ${n}^{lev3}$ subsections [${n}^{lev3}$ = 3 intervals of 1 day each], with the label $lev3$ for this outermost loop. The model is then integrated along the full trajectory, and the model state is stored to disk only at the timesteps $k_{i}^{lev3}$ [i.e. 3 times, corresponding to $k_{i}^{lev3} = 0, 24, 48$]. In addition, the cost function is computed, if needed.
: In a second step, each subsection is itself divided into ${n}^{lev2}$ subsections [${n}^{lev2}$ = 4 intervals of 6 hours each per subsection]. The model picks up at the last outermost dumped state $v_{k_{n}^{lev3}}$ and is integrated forward in time along the last outermost subsection, with the label $lev2$ for this intermediate loop. The model state is now stored to disk at the timesteps $k_{i}^{lev2}$ [i.e. 4 times, corresponding to $k_{i}^{lev2} = 48, 54, 60, 66$].
: Finally, the model picks up at the last intermediate dumped state $v_{k_{n}^{lev2}}$ and is integrated forward in time along the last intermediate subsection, with the label $lev1$ for this innermost loop. Within this sub-subsection only, parts of the model state are stored to memory at every timestep [i.e. every hour, corresponding to $k_{i}^{lev1} = 66, 67, \ldots, 71$]. The final state $v_n = v_{k_{n}^{lev1}}$ is reached, and the model state at all preceding timesteps along the last innermost subsection is available, enabling integration backwards in time along this subsection. The adjoint can thus be computed along this last subsection $k_{n}^{lev2}$.
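The storage points in the bracketed example follow directly from the partition. The short sketch below, with the hypothetical helper name `checkpoint_schedule`, reproduces the timesteps at which the state is stored during the first pass through each level (the last outermost and the last intermediate subsection), assuming timesteps are counted from 0.

```python
def checkpoint_schedule(n3, n2, n1):
    """Timesteps stored on the first pass of each level (hypothetical helper)."""
    len3 = n2 * n1                                 # timesteps per outermost subsection
    lev3 = [i * len3 for i in range(n3)]           # disk dumps along the full trajectory
    lev2 = [lev3[-1] + j * n1 for j in range(n2)]  # disk dumps in the last outermost subsection
    lev1 = [lev2[-1] + m for m in range(n1)]       # in-memory states, last innermost subsection
    return lev3, lev2, lev1


lev3, lev2, lev1 = checkpoint_schedule(3, 4, 6)
print(lev3)  # [0, 24, 48]
print(lev2)  # [48, 54, 60, 66]
print(lev1)  # [66, 67, 68, 69, 70, 71]
```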

This procedure is repeated consecutively for each previous intermediate subsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2}$, carrying the adjoint computation to the initial time of the outermost subsection $k_{n}^{lev3}$. Then the procedure is repeated for each previous outermost subsection $k_{n-1}^{lev3}, \ldots, k_{1}^{lev3}$, carrying the adjoint computation back to the initial time $k_{1}^{lev3}$.
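The complete procedure can be sketched as three nested loops. This is a minimal Python illustration, not the MITgcm/TAF implementation; it assumes a hypothetical scalar toy model `step` with timestep `DT`, a cost function equal to the final state, and illustrative names `adjoint_checkpointed` and `advance`. The outer loop restarts from the level-3 disk dumps, the middle loop from the level-2 dumps, and the innermost loop keeps every state in memory before running the adjoint backwards; counters record recomputed timesteps and peak storage.

```python
DT = 0.01  # hypothetical timestep of the toy model


def step(x):
    """Toy nonlinear model step (an assumption, not MITgcm code)."""
    return x + DT * x * (1.0 - x)


def step_adjoint(x, xbar):
    """Transposed Jacobian of step, linearized about the forward state x."""
    return xbar * (1.0 + DT * (1.0 - 2.0 * x))


def adjoint_checkpointed(x0, n3, n2, n1):
    """dJ/dx0 for J = x_N, N = n3*n2*n1, using 3-level checkpointing."""
    stats = {"fwd_steps": 0, "disk_peak": 0, "mem_peak": 0}

    def advance(x, nsteps):
        # Plain forward integration; counts every (re)computed timestep.
        for _ in range(nsteps):
            x = step(x)
        stats["fwd_steps"] += nsteps
        return x

    # Level 3: one sweep over the full trajectory, dumping the state at the
    # start of each outermost subsection ("disk"); x ends as x_N, i.e. J.
    disk3, x = [], x0
    for _ in range(n3):
        disk3.append(x)
        x = advance(x, n2 * n1)
    xbar = 1.0  # seed dJ/dx_N for the cost J = x_N

    for i in reversed(range(n3)):
        # Level 2: restart from outer dump i, dumping every n1 timesteps.
        disk2, x = [], disk3[i]
        for _ in range(n2):
            disk2.append(x)
            x = advance(x, n1)
        stats["disk_peak"] = max(stats["disk_peak"], len(disk3) + len(disk2))

        for j in reversed(range(n2)):
            # Level 1: restart from intermediate dump j, keep every state in memory.
            tape, x = [], disk2[j]
            for _ in range(n1):
                tape.append(x)
                x = advance(x, 1)
            stats["mem_peak"] = max(stats["mem_peak"], len(tape))
            # Reverse sweep through this innermost subsection.
            for xk in reversed(tape):
                xbar = step_adjoint(xk, xbar)
    return xbar, stats


grad, stats = adjoint_checkpointed(0.1, 3, 4, 6)
print(stats)  # {'fwd_steps': 216, 'disk_peak': 7, 'mem_peak': 6}
```

For the 72-step example this gives three forward sweeps of 72 steps each, $n^{lev3}+n^{lev2}$ disk slots and $n^{lev1}$ in-memory states. Because every restart applies the same `step` to the same stored states, the gradient should agree exactly with a store-everything adjoint. For simplicity the sketch keeps all level-3 dumps until the end; a real implementation could release each dump once its subsection has been processed.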

For the full model trajectory of $n^{lev3} \cdot n^{lev2} \cdot n^{lev1}$ timesteps, the required storage of the model state is significantly reduced to $n^{lev2} + n^{lev3}$ states on disk and roughly $n^{lev1}$ states in memory [i.e. for the 3-day integration with a total of 72 timesteps, the model state is stored 7 times to disk and roughly 6 times to memory]. This saving in memory comes at the cost of three full forward integrations of the model (one for each checkpointing level). The optimal balance of storage vs. recomputation depends on the available computing resources and can be tuned by changing the partitioning among $n^{lev3}$, $n^{lev2}$ and $n^{lev1}$.
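The effect of the partitioning can be explored with a short sketch. It enumerates all factorizations $n^{lev3} \cdot n^{lev2} \cdot n^{lev1} = N$ and scores each with a weighted storage cost $w_{disk}\,(n^{lev3}+n^{lev2}) + w_{mem}\,n^{lev1}$. The weights and the helper names `partitions` and `best_partitions` are assumptions for illustration, and the recomputation cost is taken as fixed at three forward sweeps, as stated above.

```python
def partitions(nsteps):
    """All (n3, n2, n1) with n3 * n2 * n1 == nsteps."""
    out = []
    for n3 in range(1, nsteps + 1):
        if nsteps % n3:
            continue
        rest = nsteps // n3
        for n2 in range(1, rest + 1):
            if rest % n2 == 0:
                out.append((n3, n2, rest // n2))
    return out


def best_partitions(nsteps, w_disk=1.0, w_mem=1.0):
    """Partitions minimizing w_disk*(n3 + n2) + w_mem*n1 (weights are assumptions)."""
    def cost(p):
        n3, n2, n1 = p
        return w_disk * (n3 + n2) + w_mem * n1

    cands = partitions(nsteps)
    best = min(cost(p) for p in cands)
    return best, sorted(p for p in cands if cost(p) == best)


print(best_partitions(72))
# (13.0, [(3, 4, 6), (3, 6, 4), (4, 3, 6), (4, 6, 3), (6, 3, 4), (6, 4, 3)])
print(best_partitions(72, w_mem=4.0))
# (20.0, [(6, 6, 2)])
```

With equal weights the 3 × 4 × 6 split of the example (and its permutations) minimizes total storage for 72 steps; making an in-memory state four times as expensive as a disk slot shifts the optimum toward a short innermost level.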

**Figure 5.1:** Schematic view of intermediate dump and restart for 3-level checkpointing.


mitgcm-support@mitgcm.org

Last update 2011-01-09