(j3.2006) (SC22WG5.3962) [Fwd: Fortran concurrency memory model vs. its C++ counterpart]
Aleksandar Donev
donev1 at llnl.gov
Fri Mar 20 18:32:05 EDT 2009
Some answers:
On Friday 20 March 2009 14:57, Van Snyder wrote:
> 1) I couldn't find a clear description of the semantics of a data
> race.
We do have one, only the words "definition" and "race" are not used;
rather, it is expressed as a bunch of restrictions.
> If I access a coarray element while someone else is writing it,
> what are the possible outcomes?
It is not allowed to do that. We do not prescribe what happens with
non-conforming programs---the compiler/RTL can do whatever.
> c) The program behavior becomes undefined. This is the
> approach taken by Posix and C++0x. Mostly allows existing
> optimizations.
Yes, this is our choice too, though I do not know what Posix does
exactly.
> 2) Presumably the intent is to prohibit the compiler from introducing
> new "speculative" stores to co-arrays that may add data races?
Correct, such "optimizations" (harmless in a serial context) can cause
problems with shared data, and compilers must disable them. We
don't say such things explicitly in Fortran---the compilers need to do
the right thing to produce the correct answer for any conforming
program. If they do the above, they are in error.
> C++ and C are expected to outlaw this, with exceptions
> for sequences of contiguous bit-fields (which I assume don't exist in
> Fortran?)
They don't, yet :-)
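A minimal sketch of the kind of speculative store discussed in point 2,
written in C++ because the question compares against the C++ model. The
function names are hypothetical and the int reference merely stands in
for a coarray element; nothing here is taken from either standard.

```cpp
#include <cassert>

// What the source says: store to x only when the flag is set.
void store_as_written(int& x, bool flag) {
    if (flag) {
        x = 1;
    }
}

// A rewrite that is valid in a serial context: store unconditionally,
// then undo the store if it was not wanted.
void store_speculative(int& x, bool flag) {
    int saved = x;
    x = 1;            // a store the source never asked for when flag is false
    if (!flag) {
        x = saved;    // the restore can overwrite a concurrent update to x
    }
}

// Single-threaded check that the two versions leave x in the same state.
void check_serial_equivalence() {
    int a = 7, b = 7;
    store_as_written(a, false);
    store_speculative(b, false);
    assert(a == 7 && b == 7);
    store_as_written(a, true);
    store_speculative(b, true);
    assert(a == 1 && b == 1);
}
```

In one thread the two functions are indistinguishable, which is why the
rewrite looks harmless. If another thread (or image) accesses x while
flag is false, only the second version touches memory, so it can create
a race that the source program did not contain.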
> 3) Fortran relies at least superficially on explicit memory fences
> ("sync_memory") to provide memory ordering guarantees. C++0x and
> Java instead provide a "sequential consistency for data-race-free
> programs" guarantee by default, with some esoteric constructs to
> defeat that for performance.
By contrast, Fortran's default is performance and we have "some esoteric
constructs" to ensure some form of sequential consistency.
> (On X86, with shared memory, I'd expect a factor
> of 10 or so difference between bracketing every atomic access with
> sync_memory vs. the C++ approach.
Yes, but what is the cost of the implementation ensuring
sequential-consistency on, say, a cluster with 1000 nodes???
We hope not to see coarray codes with loads of sync memories. Use
another language for that---coarrays are meant to cover coarse-grained
parallelism for the most part. We don't even provide real atomic
features, just a simple load and store!
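The two styles contrasted in point 3 can be sketched in C++ as follows.
This is only a loose analogy: std::atomic_thread_fence stands in for
sync memory, two threads on one machine stand in for images, and the
Mailbox type and all function names are hypothetical. The sketch does
not measure the cost difference mentioned in the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Mailbox {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag used to publish the payload
};

// Style A: relaxed flag accesses bracketed by explicit full fences.
void produce_fenced(Mailbox& m) {
    m.payload = 42;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    m.ready.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

int consume_fenced(Mailbox& m) {
    bool seen = false;
    while (!seen) {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        seen = m.ready.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        if (!seen) {
            std::this_thread::yield();
        }
    }
    return m.payload;  // ordered after the producer's write by the fences
}

// Style B: default (sequentially consistent) atomic operations, no fences.
void produce_default(Mailbox& m) {
    m.payload = 42;
    m.ready.store(true);
}

int consume_default(Mailbox& m) {
    while (!m.ready.load()) {
        std::this_thread::yield();
    }
    return m.payload;
}

// Runs one producer and one consumer thread; returns what the consumer read.
int run(void (*produce)(Mailbox&), int (*consume)(Mailbox&)) {
    Mailbox m;
    int result = 0;
    std::thread consumer([&] { result = consume(m); });
    std::thread producer([&] { produce(m); });
    producer.join();
    consumer.join();
    return result;
}
```

Both run(produce_fenced, consume_fenced) and run(produce_default,
consume_default) return 42. The difference is where the ordering comes
from: explicit fences placed by the programmer in style A, the default
ordering of the atomic operations themselves in style B.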
> (For example, you get into subtle issues as to whether data
> dependencies are required to enforce memory ordering. Programmers
> tend to automatically assume yes. Implementers tend to automatically
> assume no.)
These are indeed tricky, and we have argued over them at length.
However, again, the hope is that most (Fortran---recall that these tend
to be different from people such as yourselves!) programmers will not
delve into such depths.
> I am actually hoping that hardware implementations gradually favor
> the C++/Java approach even more
Perhaps for shared-memory one-machine type hardware. Certainly it won't
happen for large clusters anytime soon, will it?
> 4) I'm not sure what, if anything, volatile means with respect to
> concurrent access by multiple images.
Neither do we, and I do not care to know :-) I am not trying to be
annoying, it is just a question I have had to discuss on this list
waaaay too many times for no good reason or outcome.
Probably, implementations will follow the C model, whatever that happens
to be. The standard itself will defer this as a processor-dependent
issue....I hope.
Best,
Aleks