4.2.6 Shared memory communication
Under shared memory communication, independent CPUs operate on the same global address space at the application level.
This means that CPU 1 can directly write into global
data structures that CPU 2 ``owns'' using a simple
assignment at the application level.
This model of memory access is supported at the basic system
design level in ``shared-memory'' systems such as PVP systems, SMP systems,
and distributed shared memory systems (for example, the SGI Origin).
On such systems the WRAPPER will generally use simple read and write statements
to directly access application data structures when communicating between CPUs.
In a system where assignment statements, like the one in figure
4.5, map directly to
hardware instructions that transport data between CPU and memory banks, this
can be a very efficient mechanism for communication. In this case two CPUs,
CPU1 and CPU2, can communicate simply by reading and writing to an
agreed location and following a few basic rules. The latency of this sort
of communication is generally not much higher than the hardware
latency of other memory accesses on the system, and the bandwidth available
between CPUs communicating in this way can be close to the bandwidth of
the system's main-memory interconnect. This makes this method of
communication very efficient, provided it is used appropriately.
4.2.6.1 Memory consistency
When using shared memory communication between
multiple processors, the WRAPPER level shields user applications from
certain counter-intuitive system behaviors. In particular, one issue the
WRAPPER layer must deal with is the system's memory model. In general, the order
of reads and writes expressed by the textual order of an application code may
not be the order of instructions executed by the processor performing the
application. The processor executing the application instructions will always
operate so that, for the instructions it is itself executing,
any reordering is not apparent. However, machines are often
designed so that reordering of instructions is not hidden from other
processors. This means that, in general, even on a shared memory system, two
processors can observe inconsistent memory values.
The issue of memory consistency between multiple processors is discussed at
length in the computer science literature. From a practical point of
view, however, shared memory machines all provide
some mechanism to enforce memory consistency when it is needed. The exact
mechanism employed varies between systems. For communication using shared
memory, the WRAPPER provides a place to invoke the appropriate mechanism to
ensure memory consistency on a particular platform.
4.2.6.2 Cache effects and false sharing
Shared-memory machines often have per-processor memory caches
that contain mirrored copies of main memory. Automatic cache-coherence
protocols are used to maintain consistency between caches on different
processors. These cache-coherence protocols typically enforce consistency
over regions of memory with large granularity (typically 128 or 256 byte
chunks). Because the coherence protocols employed can be expensive relative to other
memory accesses, care is taken in the WRAPPER (by padding synchronization
structures appropriately) to avoid unnecessary coherence traffic.
Applications running as multiple threads within a single process can
use shared memory communication. In this case all the memory locations
in an application are potentially visible to all the compute threads. Multiple
threads operating within a single process is the standard mechanism for
supporting shared memory that the WRAPPER utilizes. Configuring and launching
code to run in multi-threaded mode on specific platforms is discussed in
section . However, on many systems there are also potentially
very efficient mechanisms for shared memory communication between
multiple processes (in contrast to multiple threads within a single
process). In most cases these work by making a limited region of
memory shared between processes. The MMAP and
IPC facilities in UNIX systems provide this capability, as do
vendor specific tools like LAPI and IMC.
Extensions exist for the WRAPPER that allow these mechanisms
to be used for shared memory communication. However, these mechanisms are not
distributed with the default WRAPPER sources because of their proprietary
nature.
mitgcm-support@dev.mitgcm.org
Copyright © 2002
Massachusetts Institute of Technology