* OpenCL C (and SPIR-V) require that the failure memory order is not
stronger than the success memory order.
Also see Khronos internal memory model issue #181
CC @bashbaug
Related to #2142, according to the work plan extended `CBasicTestStore`
with support for `atomic_half`.
Optimization remark: in tests related to `CBasicTestStore` kernel source
code is mostly composed with arguments following similar pattern:
`__kernel void test_atomic_kernel(uint threadCount, uint numDestItems,
__global int *finalDest, __global int *oldValues, volatile __local
atomic_int *destMemory)`
`oldValues` buffer is initialized with a host pointer, after kernel
execution it is read back to the host pointer but it is unused in
neither of the kernels I verified.
Initialize the `_memoryOrder` and `_memoryScope` members to avoid
`CBasicTestMemOrderScope::MaxHostThreads()` accessing uninitialized
data.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Names that begin with an underscore followed by an uppercase letter
are reserved for the C++ implementation.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* Fix local memory out of bounds in atomic_fence
In the error condition, the atomic_fence kernel can illegally access local memory addresses.
In this snippet, localValues is in the local address space and provided as a kernel argument. Its size is effectively get_local_size(0) * sizeof(int). The stores to localValues lead to OoB accesses.
size_t myId = get_local_id(0);
...
if(hisAtomicValue != hisValue)
{ // fail
atomic_store(&destMemory[myId], myValue-1);
hisId = (hisId+get_local_size(0)-1)%get_local_size(0);
if(myValue+1 < 1)
localValues[myId*1+myValue+1] = hisId;
if(myValue+2 < 1)
localValues[myId*1+myValue+2] = hisAtomicValue;
if(myValue+3 < 1)
localValues[myId*1+myValue+3] = hisValue;
}
* Fix formatting
* Fix formatting again
* Formatting
* Change memory order and scope for atomics that gate final results being stored.
memory_order_acq_rel with memory_scope_device is now used to guarantee that the correct memory consistency is observed before final results are stored.
Previously it was possible for kernels to be generated that all used relaxed memory ordering, which could lead to false-positive failures.
Fixes#1370
* Disable atomics tests with global, in-program atomics.
If the device does not support `memory_order_relaxed` or `memory_scope_device`, disable atomics tests that declare their atomics in-program with global memory.
There is now an implicit requirement to support `memory_order_relaxed` and `memory_scope_device` for these tests.
* Fix misplaced parentheses.
* Change memory scope for atomic fetch and load calls in kernel
Change the memory scope from memory_scope_work_group to
memory_scope_device so the ordering applies across all work items
Co-authored-by: Sreelakshmi Haridas <sharidas@quicinc.com>
* Add memory_scope_all_devices testing
This duplicats memory_scope_all_svm_devices testing, but it seems pretty quick
so I don't think it hurts.
Fixes#990
* Address clang-format failures
* Address a further clang-format failure
The generator function for atomic_fence uses the current workgroup
size to determine the number of non atomic variables per thread
in the kernel. The kernel should hence be regenerated when the
workgroup size changes.
However regenerating the kernel can itself change the workgroup
size. This change introduces an iterative loop that reduces the
workgroup sizes by 2 each time re-generating the kernel until
we find one that works (or exit at groupsize == 1)
Change-Id: Ic32fe967e32de6643e01c6775f4bddbcad0a299a
* Change setup code in `KernelCode()` to use `_explicit` builtin
variants that are common to both OpenCL-2.X and OpenCL-3.0.
* Only test optional supported builtin variants (`_explicit` signature
memory_order/scope) for OpenCL-3.0.
* Disable program scope global variable and generic address space tests
for a OpenCL-3.0 driver which does not optionally support these
features.
* Test minimum memory consistency capabilities for a device reporting >= 3.0.
Skip tests where unsupported memory consistency capabilities are being requested.
* Pass nullptr as program build option.
Allows the CTS framework to select an appropriate CL C version.
* Removed redundant braces.