* OpenCL C (and SPIR-V) require that the failure memory order is not
stronger than the success memory order.
Also see Khronos internal memory model issue #181
CC @bashbaug
Related to https://github.com/KhronosGroup/OpenCL-CTS/issues/2142,
according to the work plan, extending CBasicTestFetchAdd with support
for atomic_half.
I wasn't able to test that PR completely due to missing
`CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT`/`CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT`
capabilities for atomic_half. I appreciate reviewers' attention, thanks.
Related to #2142, according to the work plan, extending
CBasicTestFetchSub with support for atomic_half.
I wasn't able to test that PR entirely due to missing
CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT/CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT
capabilities for atomic_half. I appreciate reviewers' attention, thanks.
Related to #2142, according to the work plan extended `CBasicTestStore`
with support for `atomic_half`.
Optimization remark: in tests related to `CBasicTestStore` kernel source
code is mostly composed with arguments following similar pattern:
`__kernel void test_atomic_kernel(uint threadCount, uint numDestItems,
__global int *finalDest, __global int *oldValues, volatile __local
atomic_int *destMemory)`
`oldValues` buffer is initialized with a host pointer, after kernel
execution it is read back to the host pointer but it is unused in
neither of the kernels I verified.
Move the global `-Wno-format` compiler option to the individual tests
that still trigger Wformat warnings. The majority of the tests now
compile cleanly with `-Wformat` enabled.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Some `HOST_` types were declared with signedness different from the
`HOST_ATOMIC_` counterparts, leading to sign-compare warnings when
comparing between types. Fix by aligning the signedness.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Initialize the `_memoryOrder` and `_memoryScope` members to avoid
`CBasicTestMemOrderScope::MaxHostThreads()` accessing uninitialized
data.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Only disable `-Wsign-compare` for tests that do not compile cleanly
with this warning enabled. Re-enable the warning for the other tests,
so that it can catch any new occurrences.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Names that begin with an underscore followed by an uppercase letter
are reserved for the C++ implementation.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Fix a few instances where an incorrect number of arguments was
supplied when calling (v)log_error.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* Fix local memory out of bounds in atomic_fence
In the error condition, the atomic_fence kernel can illegally access local memory addresses.
In this snippet, localValues is in the local address space and provided as a kernel argument. Its size is effectively get_local_size(0) * sizeof(int). The stores to localValues lead to OoB accesses.
size_t myId = get_local_id(0);
...
if(hisAtomicValue != hisValue)
{ // fail
atomic_store(&destMemory[myId], myValue-1);
hisId = (hisId+get_local_size(0)-1)%get_local_size(0);
if(myValue+1 < 1)
localValues[myId*1+myValue+1] = hisId;
if(myValue+2 < 1)
localValues[myId*1+myValue+2] = hisAtomicValue;
if(myValue+3 < 1)
localValues[myId*1+myValue+3] = hisValue;
}
* Fix formatting
* Fix formatting again
* Formatting
* Change memory order and scope for atomics that gate final results being stored.
memory_order_acq_rel with memory_scope_device is now used to guarantee that the correct memory consistency is observed before final results are stored.
Previously it was possible for kernels to be generated that all used relaxed memory ordering, which could lead to false-positive failures.
Fixes#1370
* Disable atomics tests with global, in-program atomics.
If the device does not support `memory_order_relaxed` or `memory_scope_device`, disable atomics tests that declare their atomics in-program with global memory.
There is now an implicit requirement to support `memory_order_relaxed` and `memory_scope_device` for these tests.
* Fix misplaced parentheses.
* Change memory scope for atomic fetch and load calls in kernel
Change the memory scope from memory_scope_work_group to
memory_scope_device so the ordering applies across all work items
Co-authored-by: Sreelakshmi Haridas <sharidas@quicinc.com>
* Fix memory model issue in atomic_flag.
In atomic_flag sub-tests that modify local memory, compilers may re-order memory accesses between the local and global address spaces which can lead to incorrect test failures.
This commit ensures that both local and global memory operations are fenced to prevent this re-ordering from occurring.
Fixes#134.
* Clang format changes.
* Added missing global acquire which is necessary for the corresponding global release.
Thanks to @jlewis-austin for spotting.
* Clang format changes.
* Match the condition for applying acquire/release fences.
* Add memory_scope_all_devices testing
This duplicats memory_scope_all_svm_devices testing, but it seems pretty quick
so I don't think it hurts.
Fixes#990
* Address clang-format failures
* Address a further clang-format failure
* The `atomic_flag` test assumes support for the `atomic_scope_device`
scope in the global scope test case. Since `atomic_scope_device` is
optional on an OpenCL-3.0 driver, this test should check for support and
skip otherwise.
The generator function for atomic_fence uses the current workgroup
size to determine the number of non atomic variables per thread
in the kernel. The kernel should hence be regenerated when the
workgroup size changes.
However regenerating the kernel can itself change the workgroup
size. This change introduces an iterative loop that reduces the
workgroup sizes by 2 each time re-generating the kernel until
we find one that works (or exit at groupsize == 1)
Change-Id: Ic32fe967e32de6643e01c6775f4bddbcad0a299a
* Change setup code in `KernelCode()` to use `_explicit` builtin
variants that are common to both OpenCL-2.X and OpenCL-3.0.
* Only test optional supported builtin variants (`_explicit` signature
memory_order/scope) for OpenCL-3.0.
* Disable program scope global variable and generic address space tests
for a OpenCL-3.0 driver which does not optionally support these
features.
* Test minimum memory consistency capabilities for a device reporting >= 3.0.
Skip tests where unsupported memory consistency capabilities are being requested.
* Pass nullptr as program build option.
Allows the CTS framework to select an appropriate CL C version.
* Removed redundant braces.
The spec states that the minimum amount of local memory for embedded devices is 1KB. This change clamps work group sizes to 1024 for embedded devices, and sets the number of local variables per thread to 1.
Fixes#690.