(j3.2006) (SC22WG5.3962) [Fwd: Fortran concurrency memory model vs. its C++ counterpart]

Aleksandar Donev donev1 at llnl.gov
Fri Mar 20 18:32:05 EDT 2009


Some answers:

On Friday 20 March 2009 14:57, Van Snyder wrote:
> 1) I couldn't find a clear description of the semantics of a data
> race. 
We do have one; the words "definition" and "race" are just not used. 
Rather, it is expressed as a bunch of restrictions.

> If I access a coarray element while someone else is writing it, 
> what are the possible outcomes?
It is not allowed to do that. We do not prescribe what happens with 
non-conforming programs---the compiler/RTL can do whatever.

>         c) The program behavior becomes undefined.  This is the
> approach taken by Posix and C++0x.  Mostly allows existing
> optimizations.
Yes, this is our choice too, though I do not know what Posix does 
exactly.

> 2) Presumably the intent is to prohibit the compiler from introducing
> new "speculative" stores to co-arrays that may add data races?
Correct. Such "harmless" "optimizations" (harmless in a serial context) 
can cause problems with shared data, and compilers must disable them. We 
don't say such things explicitly in Fortran---the compilers need to do 
the right thing to produce the correct answer for any conforming 
program. If they do the above, they are in error.
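
To make the transformation in question concrete, here is a minimal 
sketch in C++ (used only because the question compares against the C++ 
model; it is not coarray Fortran, and the names store_if and 
store_if_speculative are made up for illustration). The second function 
is roughly what a compiler might produce from the first if it assumed 
nobody else can see x. Serially the two give the same result, but the 
rewritten form writes to x even when cond is false, so it can clobber a 
value stored concurrently by another thread (or image).

```cpp
// Source as written: x is touched only when cond is true.
void store_if(int& x, bool cond) {
    if (cond) {
        x = 1;
    }
}

// Hypothetical "optimized" form with a speculative store: x is always
// written, then restored if the store turns out not to have been wanted.
void store_if_speculative(int& x, bool cond) {
    int saved = x;   // remember the old value
    x = 1;           // unconditional (speculative) store
    if (!cond) {
        x = saved;   // undo; this write can overwrite a concurrent update to x
    }
}
```

With cond false, store_if never writes x, whereas store_if_speculative 
writes it twice; that extra pair of writes is the introduced race.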

> C++ and C are expected to outlaw this, with exceptions
> for sequences of contiguous bit-fields (which I assume don't exist in
> Fortran?)  
They don't, yet :-)

> 3) Fortran relies at least superficially on explicit memory fences
> ("sync_memory") to provide memory ordering guarantees.  C++0x and
> Java instead provide a "sequential consistency for data-race-free
> programs" guarantee by default, with some esoteric constructs to
> defeat that for performance. 
By contrast, Fortran's default is performance and we have "some esoteric 
constructs" to ensure some form of sequential consistency.
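
As a rough C++ analogy for the two defaults (an illustration only: 
treating std::atomic_thread_fence as a stand-in for SYNC MEMORY is a 
loose assumption made here, not something either standard states, and 
the function names are made up), the first pair below gets its ordering 
only from explicit fences, while the second pair relies on the 
sequentially consistent default of C++ atomics.

```cpp
#include <atomic>

int payload = 0;                 // ordinary (non-atomic) data
std::atomic<bool> ready{false};  // flag used to hand the data over

// "Performance by default" style: relaxed atomic accesses, with ordering
// supplied only by explicit fences.
void publish(int value) {
    payload = value;
    std::atomic_thread_fence(std::memory_order_release);  // explicit fence
    ready.store(true, std::memory_order_relaxed);
}

int consume() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the flag is seen
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // explicit fence
    return payload;  // sees the value written before the release fence
}

// C++ default style: plain atomic operations are sequentially consistent,
// so no explicit fences are written.
void publish_sc(int value) {
    payload = value;
    ready.store(true);   // memory_order_seq_cst by default
}

int consume_sc() {
    while (!ready.load()) {
        // spin; the load is memory_order_seq_cst by default
    }
    return payload;
}
```

Run publish and consume (or the _sc pair) on two different threads to 
see the hand-off; dropping either fence from the first pair makes the 
read of payload a data race.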

> (On X86, with shared memory, I'd expect a factor
> of 10 or so difference between bracketing every atomic access with
> sync_memory vs. the C++ approach. 
Yes, but what is the cost of the implementation ensuring 
sequential-consistency on, say, a cluster with 1000 nodes???

We hope not to see coarray codes with loads of sync memories. Use 
another language for that---coarrays are meant to cover coarse-grained 
parallelism for the most part. We don't even provide real atomic 
features, just a simple load and store!

>  (For example, you get into subtle issues as to whether data
> dependencies are required to enforce memory ordering.  Programmers
> tend to automatically assume yes.  Implementers tend to automatically
> assume no.)
These are indeed tricky, and we have argued over them at length. 
However, again, the hope is that most (Fortran---recall that these tend 
to be different from people such as yourselves!) programmers will not 
delve into such depths.

> I am actually hoping that hardware implementations gradually favor
> the C++/Java approach even more
Perhaps for shared-memory one-machine type hardware. Certainly it won't 
happen for large clusters anytime soon, will it?

> 4) I'm not sure what, if anything, volatile means with respect to
> concurrent access by multiple images. 
Neither do we, and I do not care to know :-) I am not trying to be 
annoying, it is just a question I have had to discuss on this list 
waaaay too many times for no good reason or outcome.

Probably, implementations will follow the C model, whatever that happens 
to be. The standard itself will leave this as a processor-dependent 
issue... I hope.

Best,
Aleks


