Pointer.buffer should have ByteOrder.LITTLE_ENDIAN

Small fix, but the ByteBuffer created for pointers allocated with cudaHostAlloc etc. should always have the native byte order, so that methods like Pointer.getByteBuffer().asLongBuffer() work correctly. The current workaround is to set the byte order manually.
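
For illustration, a minimal sketch of the problem, assuming the jcuda.runtime bindings (the value is only chosen to make the byte swap visible):

import java.nio.ByteOrder;
import jcuda.Pointer;
import jcuda.runtime.JCuda;

public class PointerByteOrderDemo
{
    public static void main(String[] args)
    {
        // Allocate 8 bytes of page-locked host memory
        Pointer pointer = new Pointer();
        JCuda.cudaHostAlloc(pointer, 8, JCuda.cudaHostAllocDefault);

        // Write a long through a view that uses the native byte order
        pointer.getByteBuffer(0, 8)
            .order(ByteOrder.nativeOrder())
            .asLongBuffer()
            .put(0, 0x0102030405060708L);

        // Reading through a fresh view yields the byte-swapped value
        // 0x0807060504030201 on a little-endian machine, because the
        // returned ByteBuffer has the JNI default order, BIG_ENDIAN
        long swapped = pointer.getByteBuffer(0, 8).asLongBuffer().get(0);

        // Workaround: set the byte order manually on every view
        long correct = pointer.getByteBuffer(0, 8)
            .order(ByteOrder.nativeOrder())
            .asLongBuffer()
            .get(0);

        System.out.printf("swapped %016x, correct %016x%n", swapped, correct);
        JCuda.cudaFreeHost(pointer);
    }
}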

Also, I think getByteBuffer shouldn’t take any parameters, and should just return a view of the original buffer with a capacity equal to the allocated capacity.

First, the parameters: It sounds like a reasonable proposal to offer a convenience method that takes no parameters and returns a slice of the internal byte buffer, based on its actual size.

Setting the byte order to LITTLE_ENDIAN sounds reasonable, too. But there’s a caveat: It’s an incompatible change that may break client implementations. Although one can assume that nearly everybody who uses this byte buffer already does something like

LongBuffer b = pointer.getByteBuffer()
    .order(ByteOrder.nativeOrder())      // <--- this
    .asLongBuffer();

one cannot be sure. Obviously, someone could already do something like this:

long bigEndianValue = pointer.getByteBuffer().asLongBuffer().get(0);
long littleEndianValue = changeEndianness(bigEndianValue);
computeSomethingWith(littleEndianValue);

And changing the endianness here would cause the subsequent computations to silently produce wrong results…

One could now argue that everybody who assumed that the returned byte buffer would have a particular order (e.g. big endian in the above example) relied on unspecified behavior until now. So it might be worthwhile to now set this to the native byte order, and specify it once and forever. In any case, I’ll have to prominently point this out in the release notes.

(I have opened https://github.com/jcuda/jcuda/issues/11 to prevent this from getting lost. The upcoming update to CUDA 9 might be a good chance to do these changes. The parameterless method should be fine in any case, but I’ll have to think about the byte order once more…)

This is probably one of those API warts that shouldn’t ever be changed.

But you could add a parameterless getByteBuffer() that returns native byte order, and deprecate getByteBuffer(…) without changing it, since it’s superfluous.

getByteBuffer() could be thread-safe if it sliced its buffer before setting the position.

Casting with (int) is a big mistake. Use Math.toIntExact, which asserts that the value actually fits into an int.

// Slicing first makes the position/limit changes local to this view,
// so the shared internal buffer is never mutated:
ByteBuffer byteBuffer = ((ByteBuffer)buffer).slice();
byteBuffer.limit(Math.toIntExact(byteOffset + byteSize));
byteBuffer.position(Math.toIntExact(byteOffset));
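
Putting these suggestions together, a hypothetical sketch of how the two methods could look inside Pointer (the field name and method bodies are assumptions, not the actual implementation):

// Hypothetical sketch inside Pointer, assuming an internal Buffer field
public ByteBuffer getByteBuffer()
{
    if (!(buffer instanceof ByteBuffer))
    {
        return null;
    }
    // slice() gives a local view, so later position/limit changes cannot
    // affect other threads; note that slice() resets the byte order to
    // BIG_ENDIAN, so the native order must be applied afterwards
    return ((ByteBuffer)buffer).slice().order(ByteOrder.nativeOrder());
}

/**
 * @deprecated Use {@link #getByteBuffer()} and set position/limit
 * on the returned view instead
 */
@Deprecated
public ByteBuffer getByteBuffer(long byteOffset, long byteSize)
{
    ByteBuffer byteBuffer = getByteBuffer();
    if (byteBuffer == null)
    {
        return null;
    }
    // Math.toIntExact throws ArithmeticException on overflow, whereas
    // (int)(byteOffset + byteSize) would silently truncate (e.g. a
    // 5 GiB size would wrap around to 1 GiB)
    byteBuffer.limit(Math.toIntExact(byteOffset + byteSize));
    byteBuffer.position(Math.toIntExact(byteOffset));
    return byteBuffer;
}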

I don’t consider the current method “superfluous”. Sure, technically

p.getByteBuffer(offset, length)

and

p.getByteBuffer().position(offset).limit(offset+length);

are equivalent, but the current form was supposed to be a trade-off: being generic and convenient at the same time.
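
Spelled out, with the caveat that before Java 9, position and limit are declared in Buffer and return Buffer, so the chained form can’t be used directly as a ByteBuffer expression; keep the reference instead:

ByteBuffer a = p.getByteBuffer(offset, length);

ByteBuffer b = p.getByteBuffer();
b.position(offset);
b.limit(offset + length);
// a and b now cover the same bytes: position == offset,
// limit == offset + length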

Nevertheless, the points that you mentioned are valid and important, and I have added this information to the issue description.

The update for this is in https://github.com/jcuda/jcuda/commit/90edfafe90318081ea5f21bb6888eb7f068f901f and will be part of the next release.