
5.5.1 Introduction

Most high performance computing (HPC) centres require code to be run as batch jobs. Limits on the maximum available CPU time and memory may prevent an adjoint run from fitting into any of the available queues, which is a serious constraint for large-scale, long-duration adjoint ocean and climate model integrations. The MITgcm itself allows the total model integration to be split into sub-intervals through the standard dump and restart of the full model state. For a similar procedure to work in reverse mode, the adjoint model requires, in addition to the model state, the adjoint model state, i.e. all variables carrying derivative information that are needed for an adjoint restart. This adjoint dump & restart is also termed 'divided adjoint' (DIVA).

For this to work in conjunction with automatic differentiation, an AD tool needs to perform the following tasks:

  1. identify an adjoint state, i.e. those sensitivities whose accumulation is interrupted by a dump/restart and which influence the outcome of the gradient. Ideally, this state consists of
    • the adjoint of the model state,
    • the adjoint of other intermediate results (such as control variables, cost function contributions, etc.)
    • bookkeeping indices (such as loop indices, etc.)
  2. generate code for storing and reading adjoint state variables
  3. generate code for bookkeeping, i.e. maintaining a file with index information
  4. generate a suitable adjoint loop to propagate adjoint values for dump/restart with a minimum overhead of adjoint intermediate values.
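
To make the notion of an adjoint state concrete, here is a minimal Python sketch. It is not MITgcm or TAF code. It assumes a hypothetical scalar model x[k+1] = a * x[k] with cost J = 0.5 * x[N]**2, and the names forward, adjoint_leg, adx and step are illustrative only. The adjoint state is the adjoint variable together with one bookkeeping index. Because the toy model is linear, no forward tapes are needed; a real adjoint would also have to recover the forward state for each leg.

```python
import json

def forward(x0, a, nsteps):
    """Run the toy model x[k+1] = a * x[k] and return the final state."""
    x = x0
    for _ in range(nsteps):
        x = a * x
    return x

def adjoint_leg(adj_state, a, nsteps_leg):
    """Propagate the adjoint backwards over one leg of nsteps_leg steps."""
    adx = adj_state["adx"]
    step = adj_state["step"]
    for _ in range(nsteps_leg):
        adx = a * adx      # adjoint of the statement x[k+1] = a * x[k]
        step -= 1          # bookkeeping index, part of the adjoint state
    return {"adx": adx, "step": step}

x0, a, nsteps = 2.0, 0.9, 12
xN = forward(x0, a, nsteps)   # J = 0.5 * xN**2, hence dJ/dxN = xN

# Reference: one uninterrupted adjoint sweep over all steps.
full = adjoint_leg({"adx": xN, "step": nsteps}, a, nsteps)

# Divided adjoint: legs of 4 steps with a dump/restart after each leg.
state = {"adx": xN, "step": nsteps}
while state["step"] > 0:
    state = adjoint_leg(state, a, 4)
    dumped = json.dumps(state)    # stands in for writing the adjoint dump
    state = json.loads(dumped)    # stands in for the adjoint restart
```

After the loop, state["adx"] holds the gradient dJ/dx0 and agrees with full["adx"], which illustrates that interrupting the accumulation is harmless as long as every item of the adjoint state is dumped and restored.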

TAF (but not TAMC!) generates adjoint code that performs the tasks specified above. The divided adjoint is closely tied to the adjoint multi-level checkpointing: the adjoint state is dumped (and restarted) at each step of the outermost checkpointing level, and each adjoint integration is performed over one outermost checkpointing interval. Prior to the adjoint computations, a full forward sweep is performed to generate the outermost (forward state) tapes and to calculate the cost function. In the current implementation, the forward sweep is immediately followed by the first adjoint leg. Thus, in theory, the following steps are performed (automatically):

  • 1st model call:
    This is the case if file costfinal does not exist. S/R mdthe_main_loop is called.
    1. calculate forward trajectory and dump model state after each outermost checkpointing interval to files tapelev3
    2. calculate cost function fc and write it to file costfinal
  • 2nd and all remaining model calls:
    This is the case if file costfinal does exist. S/R adthe_main_loop is called.
    1. (the forward run and cost function call are skipped since all values are known)
      • if 1st adjoint leg:
        create index file divided.ctrl which contains info on the current checkpointing index $ilev3$
      • if $i$-th adjoint leg ($i > 1$):
        adjoint picks up at $ilev3 = nlev3-i+1$ and runs to $nlev3-i$
    2. perform adjoint leg from $nlev3-i+1$ to $nlev3-i$
    3. dump adjoint state to file snapshot
    4. dump index file divided.ctrl for next adjoint leg
    5. in the last adjoint leg, the gradient is written.
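
The control flow of the steps listed above can be sketched in Python as follows. This is only an illustration of the file-based bookkeeping, not the TAF-generated code. The file names costfinal, tapelev3, divided.ctrl and snapshot are taken from the text, while their contents, the function model_call, the file name gradient and the returned labels are assumptions made for this example. The numerical work of each leg is replaced by placeholders.

```python
import os
import tempfile

def model_call(workdir, nlev3):
    """Emulate one batch job and return a label describing what it did."""
    def path(name):
        return os.path.join(workdir, name)

    if not os.path.exists(path("costfinal")):
        # 1st model call: forward sweep, outermost tapes, cost function.
        with open(path("tapelev3"), "w") as tape:
            for ilev3 in range(1, nlev3 + 1):
                tape.write(f"model state after interval {ilev3}\n")
        with open(path("costfinal"), "w") as f:
            f.write("0.0\n")                 # placeholder cost function fc
        return "forward"

    # Later calls: the forward run is skipped, costfinal already exists.
    if os.path.exists(path("divided.ctrl")):
        with open(path("divided.ctrl")) as f:
            ilev3 = int(f.read())            # pick up where the last leg ended
    else:
        ilev3 = nlev3                        # 1st adjoint leg
    if ilev3 == 0:
        return "done"                        # gradient was already written

    next_ilev3 = ilev3 - 1                   # one outermost interval per leg
    with open(path("snapshot"), "w") as f:   # placeholder adjoint state dump
        f.write(f"adjoint state at ilev3 = {next_ilev3}\n")
    with open(path("divided.ctrl"), "w") as f:
        f.write(f"{next_ilev3}\n")           # index for the next adjoint leg
    if next_ilev3 == 0:
        with open(path("gradient"), "w") as f:
            f.write("placeholder gradient\n")
        return "gradient"
    return f"adjoint leg {ilev3} -> {next_ilev3}"

# Emulate nlev3 + 1 successive batch jobs sharing one run directory.
with tempfile.TemporaryDirectory() as workdir:
    log = [model_call(workdir, nlev3=3) for _ in range(4)]
```

With nlev3 = 3, the sketch needs four calls: one forward call followed by three adjoint legs, the last of which writes the gradient. Each call decides what to do purely from the files left behind by the previous one, which is what allows every leg to be submitted as a separate batch job.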

A few modifications were made to the forward code: obvious ones, such as adding the corresponding TAF directive at the appropriate place, and less obvious ones, such as avoiding some re-initializations when in an intermediate adjoint integration interval.

[For TAF-1.4.20 a number of hand-modifications were necessary to compensate for TAF bugs. Since we refer to TAF-1.4.26 onwards, these modifications are not documented here.]


mitgcm-support@dev.mitgcm.org
Copyright © 2002 Massachusetts Institute of Technology