Cluster-enabled Omni OpenMP on the software distributed shared
memory system SCASH.
Current status, restrictions, known problems, and bugs
Current Status
As of Omni 1.3, this implementation is out of beta.
The home nodes of large array objects are allocated in "block"
distribution. We have extended the directives to specify data mapping
and loop scheduling. For these extensions, see
"Omni extensions for Cluster-enabled OpenMP/SCASH".
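For illustration, assuming 4 nodes and a hypothetical array "a", the
"block" distribution homes equal contiguous blocks of the array on
each node:

/* double a[1000] homed in "block" distribution over 4 nodes:
 *   node 0: a[0..249]     node 1: a[250..499]
 *   node 2: a[500..749]   node 3: a[750..999]
 */
double a[1000];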
The SCASH system in SCore 3.3.2 supports SMP clusters, which consist
of shared-memory multiprocessor nodes.
Multiple processors in an SMP node can share data through the shared
memory hardware.
For example, suppose the host group "smpc" has 4 dual-processor
nodes:
% scout -g smpc
% scrun -nodes=4x2 a.out
"-nodes=4x2" specifies using 2 processors on each of the 4 nodes.
Omni/SCASH can correctly execute OpenMP programs that use barrier
synchronization, as is typical of data-parallel programs.
The following problems still remain:
- On SMP nodes, a flush directive in a thread running independently
may fail to preserve the last value written by that processor,
because other processors in the same SMP node may update the page
with values fetched from different nodes.
This problem is due to the fact that Linux provides no way to
write-protect pages against other processors in the same SMP node.
This may cause serious problems for OpenMP programs running in the
SPMD model.
We are currently considering a new synchronization primitive specific
to Omni/SCASH to solve the above problem and to provide fast
synchronization in the SCASH environment.
Restrictions and Known Problems in the SCASH Implementation
Although Omni OpenMP for SCASH is almost compatible with OpenMP for
SMPs, there are a few restrictions and incompatibilities.
- Variables declared in external libraries must be declared as
"threadprivate". For example in C, "stderr" in the C standard I/O
library must be declared as "threadprivate".
Global variables without "threadprivate" are re-allocated in the
shared address space in SCASH.
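For example, a minimal sketch in C (the print statement is only for
illustration):

#include <stdio.h>
#include <omp.h>

/* Keep the C library's stderr handle node-local instead of letting
 * SCASH re-allocate it in the shared address space. */
#pragma omp threadprivate(stderr)

main()
{
#pragma omp parallel
    fprintf(stderr, "hello from thread %d\n", omp_get_thread_num());
}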
- System calls (such as read/write) that directly access the SCASH
shared address space may cause segmentation faults. To prevent
this problem, touch the uninitialized shared memory space before
calling the system call (for example, use "bzero" to initialize it).
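A minimal sketch of this workaround (the buffer name and its size are
hypothetical; the buffer is assumed to lie in the SCASH shared
address space):

#include <unistd.h>     /* read */
#include <strings.h>    /* bzero */

char buf[65536];        /* global variable, allocated in shared space */

ssize_t read_into_shared(int fd)
{
    /* Touch (initialize) every page of the buffer before the system
     * call, so the pages are mapped when the kernel writes into them. */
    bzero(buf, sizeof(buf));
    return read(fd, buf, sizeof(buf));
}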
- In SCASH, modified data is updated explicitly at a memory
synchronization point such as a barrier.
For example, the following code may cause deadlock.
#include <omp.h>

int counter = 0;

main()
{
#pragma omp parallel
    {
        int node_id = omp_get_thread_num();
        if (node_id != 0) {
            while (counter == 0) {
#pragma omp flush
            }
        } else {
            counter = 1;
        }
        /* "counter" is not updated until the barrier below.
         * The threads waiting for the update of "counter" wait
         * forever, resulting in deadlock.
         */
#pragma omp barrier
    }
}
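Since modified data becomes consistent only at the barrier, a working
version lets every thread pass a barrier before reading the flag. A
minimal sketch:

int counter = 0;

main()
{
#pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            counter = 1;
#pragma omp barrier   /* modified data is made consistent here */
        /* every thread now observes counter == 1 */
    }
}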
- Only local variables declared in the function in which a
parallel region is defined can be shared.
For example in C, the following code works correctly:
foo(){
    int x;   /* local variable, which can be shared by the parallel region */
    ...
#pragma omp parallel
    {
        .. = x;   /* OK */
    }
}
But the following code may cause an error:
foo(){
    int x;   /* local variable, passed to another function by address */
    ...
    goo(&x);
}

goo(int *p){
#pragma omp parallel
    {
        .. = *p;   /* NG: the parallel region is not in the function
                    * where 'x' is declared */
    }
}
The variable 'x' may be declared as "static" to be shared.
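A sketch of this workaround applied to the example above:

foo(){
    static int x;   /* statically allocated, so it is placed in the
                     * shared address space */
    ...
    goo(&x);        /* *p in goo() now refers to shared memory: OK */
}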
- If a local variable is shared, it is copied into the
shared memory space at the beginning of the parallel region.
When the variable is large, this copy may cause a large overhead.
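For example, in the following sketch (the array size is
hypothetical), the whole array is copied at every entry to the
parallel region:

foo(){
    double big[100000];   /* copied into the shared memory space at
                           * the beginning of the parallel region */
#pragma omp parallel
    {
        .. = big[0];
    }
}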
- Heap memory dynamically allocated by "malloc" is not
shared. Use "ompsm_galloc" to allocate memory in the shared memory
space. For the detailed interface, see
Omni/SCASH shared memory allocator "ompsm_galloc".
- Large local variables may cause an error because of the
limited stack size in SCASH.
- In a cluster environment, I/O occurs independently on each node.
File pointers are not shared, unlike in an SMP environment.
- The number of processors is specified via "scrun", not by
"OMP_NUM_THREADS".
For further details, refer also to
"Implementation Notes".