Small fix, but the ByteBuffer created for pointers allocated with cudaHostAlloc etc should always have native byte order, so that methods like Pointer.getByteBuffer().asLongBuffer() work correctly. Current workaround is to set byte order manually.
Also I think getByteBuffer shouldn’t take any parameters and just return a view of the original buffer of capacity equal to the allocated capacity.
First, the parameters: It sounds like a reasonable proposal to offer a convenience method that just does not take parameters, and returns a slice of the internal byte buffer based on its actual size.
Setting the byte order to LITTLE_ENDIAN sounds reasonable, too. But there’s a caveat: It’s an incompatible change that may break client implementations. Although one can assume that nearly everybody who uses this byte buffer already does something like
LongBuffer b = pointer.getByteBuffer()
.order(ByteOrder.nativeOrder()) // <--- this
.asLongBuffer();
one cannot be sure. Obviously, someone could already do something like this:
long bigEndianValue = pointer.getByteBuffer().asLongBuffer().get(0);
long littleEndianValue = changeEndianness(bigEndianValue);
computeSomethingWith(littleEndianValue);
And changing the endianness here would result in wrong results in the subsequent computations…
One could now argue that everybody who assumed that the returned byte buffer would have a particular order (e.g. big endian in the above example) relied on unspecified behavior until now. So it might be worthwhile to now set this to the native byte order, and specify it once and forever. In any case, I’ll have to prominently point this out in the release notes.
(I have opened https://github.com/jcuda/jcuda/issues/11 to prevent this from getting lost. The upcoming update to CUDA 9 might be a good chance to do these changes. The parameterless method should be fine in any case, but I’ll have to think about the byte order once more…)