4.2.6 Shared memory communication
Under shared memory communication, independent CPUs operate on the
exact same global address space at the application level. This means
that CPU 1 can directly write into global data structures that CPU 2
"owns" using a simple assignment at the application level. This
model of memory access is supported at the basic system design
level in "shared-memory" systems such as PVP systems, SMP systems,
and distributed shared memory systems (e.g. SGI Origin, SGI
Altix, and some AMD Opteron systems). On such systems the WRAPPER
will generally use simple read and write statements to directly access
application data structures when communicating between CPUs.
In a system where assignment statements, like the one in figure
4.5, map directly to hardware instructions that
transport data between CPU and memory banks, this can be a very
efficient mechanism for communication. In this case two CPUs, CPU 1
and CPU 2, can communicate simply by reading and writing to an agreed
location and following a few basic rules. The latency of this sort of
communication is generally not much higher than the hardware
latency of other memory accesses on the system. The bandwidth
available between CPUs communicating in this way can be close to the
bandwidth of the system's main-memory interconnect. Provided it is
used appropriately, this makes the method very efficient.
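As a rough illustration of communicating through an agreed location, the following Python sketch has each thread assign directly into a slot "owned" by its neighbor and then read its own slot. It is not WRAPPER code: the function name `exchange_edges`, the slot layout, and the use of a barrier as the "basic rule" are all assumptions made for this example.

```python
import threading


def exchange_edges(n_threads=2):
    """Each thread writes into its neighbor's slot, then reads its own slot."""
    # Hypothetical layout: slot i of the shared buffer is "owned" by thread i.
    shared = [None] * n_threads
    received = [None] * n_threads
    barrier = threading.Barrier(n_threads)

    def worker(me):
        neighbor = (me + 1) % n_threads
        # Communication is a plain assignment into the neighbor's slot.
        shared[neighbor] = f"edge data from thread {me}"
        # Basic rule assumed here: nobody reads until every write is done.
        barrier.wait()
        received[me] = shared[me]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

With two threads, thread 0 ends up holding the data written by thread 1 and vice versa. The barrier is only one possible choice of rule; any scheme that separates the writes from the reads would serve the same purpose.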
4.2.6.1 Memory consistency
When using shared memory communication between multiple processors, the
WRAPPER level shields user applications from certain counter-intuitive
system behaviors. In particular, one issue the WRAPPER layer must
deal with is a system's memory model. In general, the order of reads
and writes expressed by the textual order of an application code may
not be the order in which the processor running the application
actually executes them. The processor running the application
will always operate so that, for the instructions it is executing
itself, any reordering is not apparent. However, machines are often
designed so that this reordering is not hidden from other processors.
This means that, even on a shared memory system, two
processors can observe inconsistent memory values.
The issue of memory consistency between multiple processors is
discussed at length in many computer science papers. From a practical
point of view, in order to deal with this issue, shared memory
machines all provide some mechanism to enforce memory consistency when
it is needed. The exact mechanism employed will vary between systems.
For communication using shared memory, the WRAPPER provides a place to
invoke the appropriate mechanism to ensure memory consistency for a
particular platform.
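The idea of a single place where the consistency mechanism is invoked can be sketched in Python as a publish-then-signal handshake. This is an analogy only: the Python interpreter does not expose hardware reordering, and `threading.Event` merely stands in for whatever platform-specific mechanism a real system would use. The function name `publish_then_read` and the `box` dictionary are hypothetical.

```python
import threading


def publish_then_read(payload):
    """Producer writes data, then signals; consumer reads only after the signal."""
    box = {"data": None}          # the agreed shared location (hypothetical)
    ready = threading.Event()     # stand-in for the consistency mechanism
    seen = []

    def producer():
        box["data"] = payload     # 1. write the data
        ready.set()               # 2. make the write visible, then signal

    def consumer():
        ready.wait()              # 3. wait for the signal before reading
        seen.append(box["data"])  # 4. the read now observes the producer's write

    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen[0]
```

The point of the sketch is the structure: the write, the synchronization call, and the read happen in a fixed order, and only the synchronization call would need to change from one platform to another.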
4.2.6.2 Cache effects and false sharing
Shared-memory machines often have processor-local memory caches
which contain mirrored copies of main memory. Automatic cache-coherence
protocols are used to maintain consistency between caches on different
processors. These cache-coherence protocols typically enforce consistency
between regions of memory with large granularity (typically 128- or 256-byte
chunks). The coherency protocols employed can be expensive relative to other
memory accesses, and so care is taken in the WRAPPER (by padding synchronization
structures appropriately) to avoid unnecessary coherence traffic, such as the
false sharing that occurs when unrelated data used by different processors
falls within the same chunk.
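The padding idea can be illustrated with a small `ctypes` layout in Python. This is a sketch of the general technique, not the WRAPPER's actual data structures: the 128-byte chunk size is taken from the lower figure quoted above, and the names `CACHE_LINE_BYTES`, `PaddedFlag`, and `flag_offsets` are invented for the example.

```python
import ctypes

# Assumed coherence granularity, taken from the 128-byte figure in the text.
CACHE_LINE_BYTES = 128


class PaddedFlag(ctypes.Structure):
    """A per-thread synchronization flag padded out to one full chunk."""
    _fields_ = [
        ("value", ctypes.c_int32),
        # Filler so that consecutive flags never share a coherence chunk.
        ("_pad", ctypes.c_char * (CACHE_LINE_BYTES - ctypes.sizeof(ctypes.c_int32))),
    ]


def flag_offsets(n_threads):
    """Return the byte offset of each thread's flag within a contiguous array."""
    flags = (PaddedFlag * n_threads)()
    base = ctypes.addressof(flags)
    return [ctypes.addressof(flags[i]) - base for i in range(n_threads)]
```

Each flag occupies exactly `CACHE_LINE_BYTES` bytes, so flags belonging to different threads are one full chunk apart. Note that the sketch does not align the start of the array to a chunk boundary, which a real implementation would also need to arrange.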
Applications running under multiple threads within a single process
can use shared memory communication. In this case all the
memory locations in an application are potentially visible to all the
compute threads. Running multiple threads within a single process is
the standard mechanism for supporting shared memory that the WRAPPER
utilizes. Configuring and launching code to run in multi-threaded mode
on specific platforms is discussed in section 4.3.2.1.
On many systems, however, potentially very efficient mechanisms also
exist for using shared memory communication between multiple processes
(in contrast to multiple threads within a single process). In most cases
this works by making a limited region of memory shared between processes. The
MMAP and IPC facilities in UNIX
systems provide this capability, as do vendor-specific tools like LAPI
and IMC. Extensions exist for the
WRAPPER that allow these mechanisms to be used for shared memory
communication. However, these mechanisms are not distributed with the
default WRAPPER sources, because of their proprietary nature.
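To give a feel for sharing a limited region of memory between processes, here is a minimal UNIX-only Python sketch using an anonymous `mmap` region and `os.fork`. It does not reproduce the WRAPPER extensions mentioned above; the function name `exchange_via_mmap` and the choice of a single 8-byte integer as the payload are assumptions for illustration.

```python
import mmap
import os
import struct


def exchange_via_mmap(value):
    """Child process writes an integer into a shared region; parent reads it."""
    # Anonymous mapping; on UNIX this is shared with child processes by default.
    region = mmap.mmap(-1, mmap.PAGESIZE)
    pid = os.fork()
    if pid == 0:
        # Child: write into the shared region, then exit without cleanup.
        region[:8] = struct.pack("q", value)
        os._exit(0)
    # Parent: wait for the child to finish before reading (the "basic rule").
    os.waitpid(pid, 0)
    (result,) = struct.unpack("q", region[:8])
    region.close()
    return result
```

Only the mapped region is shared; every other variable in the two processes remains private, which is the main contrast with the multi-threaded case where all memory locations are potentially visible.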
mitgcm-support@mitgcm.org
Copyright © 2006 Massachusetts Institute of Technology
Last update 2011-01-09