4.2.6 Shared memory communication
Under shared memory communication, independent CPUs operate on exactly
the same global address space at the application level. This means
that CPU 1 can directly write into global data structures that CPU 2
``owns'' using a simple assignment at the application level. This
model of memory access is supported at the basic system design
level in ``shared-memory'' systems such as PVP systems, SMP systems,
and distributed shared memory systems (e.g. SGI Origin, SGI
Altix, and some AMD Opteron systems). On such systems the WRAPPER
will generally use simple read and write statements to directly
access application data structures when communicating between CPUs.
In a system where assignment statements, like the one in figure
4.5, map directly to hardware instructions that
transport data between CPU and memory banks, this can be a very
efficient mechanism for communication. In this case two CPUs, CPU1
and CPU2, can communicate simply by reading and writing to an agreed
location and following a few basic rules. The latency of this sort of
communication is generally not much higher than the hardware
latency of other memory accesses on the system. The bandwidth
available between CPUs communicating in this way can be close to the
bandwidth of the system's main-memory interconnect. This can make this
method of communication very efficient provided it is used
appropriately.
4.2.6.1 Memory consistency
When using shared memory communication between multiple processors, the
WRAPPER level shields user applications from certain counter-intuitive
system behaviors. In particular, one issue the WRAPPER layer must
deal with is a system's memory model. In general, the order of reads
and writes expressed by the textual order of an application code may
not be the order of instructions executed by the processor
performing the application. The processor performing the application
instructions will always operate so that, for the application
instructions it is executing, any reordering is not
apparent. However, machines are often designed so that
reordering of instructions is not hidden from other processors.
This means that, in general, even on a shared memory system two
processors can observe inconsistent memory values.
The issue of memory consistency between multiple processors is
discussed at length in the computer science literature. From a practical
point of view, shared memory machines all provide some mechanism to
enforce memory consistency when it is needed; the exact mechanism
employed varies between systems.
For communication using shared memory, the WRAPPER provides a place to
invoke the appropriate mechanism to ensure memory consistency for a
particular platform.
4.2.6.2 Cache effects and false sharing
Shared-memory machines often have memory caches local to each processor
which contain mirrored copies of main memory. Automatic cache-coherence
protocols are used to maintain consistency between caches on different
processors. These cache-coherence protocols typically enforce consistency
between regions of memory with large granularity (typically 128 or 256 byte
chunks). The coherency protocols employed can be expensive relative to other
memory accesses and so care is taken in the WRAPPER (by padding synchronization
structures appropriately) to avoid unnecessary coherence traffic.
Applications running under multiple threads within a single process
can use shared memory communication. In this case all the
memory locations in an application are potentially visible to all the
compute threads. Multiple threads operating within a single process is
the standard mechanism for supporting shared memory that the WRAPPER
utilizes. Configuring and launching code to run in multi-threaded mode
on specific platforms is discussed in section
4.3.2.1. However, on many systems,
potentially very efficient mechanisms for using shared memory
communication between multiple processes (in contrast to multiple
threads within a single process) also exist. In most cases this works
by making a limited region of memory shared between processes. The
MMAP and IPC facilities in UNIX
systems provide this capability, as do vendor-specific tools like LAPI
and IMC. Extensions exist for the
WRAPPER that allow these mechanisms to be used for shared memory
communication. However, these mechanisms are not distributed with the
default WRAPPER sources, because of their proprietary nature.
mitgcm-support@mitgcm.org
Copyright © 2006 Massachusetts Institute of Technology
Last update 2018-01-23