Reflects the changes to the specification:
https://github.com/KhronosGroup/OpenCL-Docs/pull/1318
Relaxed embedded exp and exp2 ulps will be 4 + floor(fabs(2 * x)).
Reciprocal and divide are unchanged because the code already handles the
embedded profile case, see unary_float.c and binary_operator_float.c.
---------
Co-authored-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
A recent improvement to the SPIR-V validator added checks to ensure the
**Float16** capability is declared when directly operating on fp16
values, which identified issues in one of our SPIR-V test files. This PR
fixes the SPIR-V files to add the missing capability.
Adds a basic test for the SPIR-V 1.6 UniformDecoration decorations.
Specifically:
* Tests both the Uniform and UniformId decorations.
* Tests the decorations on constants, function parameters, and
variables.
Both `REGISTER_TEST` and `REQUIRE_EXTENSION` expect cl_device_id
variable but the variable name is inconsistent which makes both macros
unusable together.
This change renames `deviceID` in `REQUIRE_EXTENSION` to `device` to be
consistent with `REGISTER_TEST`.
Signed-off-by: Michael Rizkalla <michael.rizkalla@arm.com>
The `extinst_printf_operands_scalar_int64` test could fail on 32-bit
platforms with `CL_INVALID_ARG_SIZE`, because the helper function was
not guaranteed to be instantiated using a 64-bit integer template type.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Treat reciprocal as a unary function, instead of handling it through the
binary function testing mechanism and special-casing it there.
This addresses two shortcomings of the previous implementation:
- Testing took significantly longer as the entire input domain was
tested many times (e.g. fp16 reciprocal has only 2^16 possible input
values, but binary function testing iterates over 2^16 * 2^16 input
values).
- The reciprocal test kernel was identical to the divide kernel. Thus
the device compiler would see a regular divide operation instead of a
reciprocal operation and would be unlikely to emit a specialized
reciprocal sequence.
This reverts all of the changes in binary_operator*.cpp made by
bcfa1f7c2 ("Added corrections to re-enable reciprocal test in
math_brute_force suite for relaxed math mode (#2221)", 2025-02-04).
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
`semaphores_ooo_ops_cross_queue` uses two OOO command queues to run the
test. In one queue, a semaphore is signalled and the semahpore is waited
on in the other queue.
The CL specification requires the application to synchronize the queues
if objects are shared.
Signed-off-by: Michael Rizkalla <michael.rizkalla@arm.com>
This adds support for allocating DMA buffers on systems that support it,
i.e. Linux and Android.
On mainline Linux, starting version 5.6 (equivalent to Android 12),
there is a new kernel module framework available called [DMA-BUF
Heaps](https://github.com/torvalds/linux/blob/master/drivers/dma-buf/dma-heap.c).
The goal of this framework is to provide a standardised way for user
applications to allocate and share memory buffers between different
devices, subsystems, etc. The main feature of interest is that the
framework provides device-agnostic allocation; it abstracts away the
underlying hardware, and provides a single IOCTL,
`DMA_HEAP_IOCTL_ALLOC`. Mainline implementation provides two heaps that
act as character devices that can allocate DMA buffers; system, which
uses the buddy allocator, and cma, which uses the
[CMA](https://developer.toradex.com/software/linux-resources/linux-features/contiguous-memory-allocator-cma-linux/)
(Contiguous Memory Allocator). Both of these are [kernel configuration
options](https://github.com/torvalds/linux/blob/master/drivers/dma-buf/heaps/Kconfig)
that need to be enabled when building the Linux kernel. Generally, any
kernel module implementing this framework is made available under
/dev/dma_heaps/<heap_name>, e.g. /dev/dma_heaps/system.
The implementation currently only supports one type of DMA heaps;
`system`, the default device path for which is `/dev/dma_heap/system`.
The path can be overridden at runtime using an environment variable,
`OCL_CTS_DMA_HEAP_PATH_SYSTEM`, if needed. Extending this in the future
should be trivial (subject to platform support), by adding an entry to
the enum `dma_buf_heap_type`, and an appropriate default path and
overriding environment variable name.
The proposed implementation will conditionally compile if the conditions
are met (i.e. building for Linux or Android, using kernel headers >=
5.6.0), and will provide a compile-time warning otherwise, and return
`-1` as the DMA handle in runtime if not.
To demonstrate the functionality, a new test is added for the
`cl_khr_external_memory_dma_buf` extension. If the extension is
supported by the device, a DMA buffer will be allocated and used to
create a CL buffer, that is then used by a simple kernel.
This should provide a way forward for adding more tests that depend on
DMA buffers.
---------
Signed-off-by: Gorazd Sumkovski <gorazd.sumkovski@arm.com>
Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>
Co-authored-by: Gorazd Sumkovski <gorazd.sumkovski@arm.com>
This PR fixes the validation logic for cases where the data type is not
half. Because the variable nan_test is always false, types like float
never trigger a validation failure.
1. Remove duplicate `create_image` code that is in both clFillImage and
clCopyImage test directories.
2. Unify how pitch buffer's memory is deallocated; The buffer can be
allocated with either `malloc` or `align_malloc` and the free function
is pre-set in `pitch_buffe_data`'s member variable `free_fn` and used
when the buffer is deallocated. With this, the change removes
`is_aligned` conditional variable that was used to select the
appropriate free function.
Signed-off-by: Michael Rizkalla <michael.rizkalla@arm.com>
- fix clGetDeviceInfo(CL_DEVICE_MAX_WORK_ITEM_SIZES) by using the proper
size
- clamp localThreads[2] as for localThreads[0] and localThreads[2]
- clamp all localThreads elements in regard of CL_MAX_WORK_GROUP_SIZE
- fix the size using to create/read the output buffer
Fix#2238
%a or %A with printf on MSVC platforms have a default precision of 13,
which is in contrast to OpenCL C specification for printf which only
allows for exact digits required if precision is not specified.
Add installation rules for all the binary targets.
Targets are installed under `<CMAKE_INSTALL_PREFIX>/bin/<CONFIG>` where
`<CONFIG>` is `CMAKE_BUILD_TYPE` for single-config generators, e.g. Unix
Makefiles and Ninja, or the build configuration for multi-config
generators, e.g. Ninja Multi-Config and Visual Studio.
This creates the target `install` on Unix and `INSTALL` on Windows.
Print the build log when building the program in `get_program_with_il`
fails, to make it easier to investigate spirv_new test failures.
Factor out a global helper function `OutputBuildLog` for printing the
build log for a single device.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Fixes several compile issues I am seeing for my version of Visual Studio
related to an ambiguous call to `fpclassify`, which is called by `isnan`
and other similar functions, specifically for the `cl_half` type:
```
19>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt\corecrt_math.h(401,1): error C2668: 'fpclassify': ambiguous call to overloaded function
19>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt\corecrt_math.h(298,31): message : could be 'int fpclassify(long double) throw()'
19>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt\corecrt_math.h(293,31): message : or 'int fpclassify(double) throw()'
19>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt\corecrt_math.h(288,31): message : or 'int fpclassify(float) throw()'
19>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt\corecrt_math.h(401,1): message : while trying to match the argument list '(_Ty)'
```
Some of these issues seem like differences in compiler behavior, but at
least one appears to have identified a legitimate bug.
Specifically, this change:
* Removes the special-case checks for finite half numbers for commonfns,
since this is already handled by `UlpFn`. (test with: `test_commonfns
degrees radians`)
* Assigns to temporary variables to eliminate the ambiguous function
call for relationals. (test with: `test_relationals relational*`)
* Properly converts from half to float when checking for NaNs for
select. This is the one that seems like a legitimate bug. (test with:
`test_select select_half_ushort select_half_short`)
* Uses `std::enable_if` to disambiguate a function call for spirv_new.
(test with: `test_spirv_new decorate_saturated*`)
If it's helpful, my specific Visual Studio version is:
```
Microsoft Visual Studio Professional 2019
Version 16.11.20
VisualStudio.16.Release/16.11.20+32929.386
```
I also have the Windows Software Development Kit 10.0.19041.685
installed.
Images with a `CL_UNSIGNED_INT_RAW10_EXT` and
`CL_UNSIGNED_INT_RAW12_EXT` data type are unnormalised, so the
normalised tests with theses images are invalid and will be skipped.
Signed-off-by: Gorazd Sumkovski <gorazd.sumkovski@arm.com>
Signed-off-by: Xin Jin <xin.jin@arm.com>
Co-authored-by: Gorazd Sumkovski <gorazd.sumkovski@arm.com>
fixes#2247
* For the `negative_create_command_buffer_not_supported_properties`
test, the only property we can check for is simultaneous use. All other
properties are part of other extensions and hence will generate
`CL_INVALID_VALUE`, not `CL_INVALID_PROPERTY`.
* Checks whether the `cl_khr_command_buffer_multi_device` extension is
supported when using `CL_COMMAND_BUFFER_DEVICE_SIDE_SYNC_KHR`, instead
of `device_side_enqueue_support`.
* If the `cl_khr_command_buffer_multi_device` extension is NOT supported
and the `CL_COMMAND_BUFFER_DEVICE_SIDE_SYNC_KHR` command buffer creation
flag is used, the expected error code is `CL_INVALID_VALUE`, not
`CL_INVALID_PROPERTY`.
Update cl_khr_command_buffer tests to reflect changes from
https://github.com/KhronosGroup/OpenCL-Docs/pull/1292
* Moves negative test for
`CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR` from
command-buffer creation to enqueue.
* Moves negative test for
`CL_DEVICE_COMMAND_BUFFER_REQUIRED_QUEUE_PROPERTIES_KHR` from
command-buffer creation to enqueue.
* Introduces a negative test for `CL_INVALID_DEVICE` on command-buffer
enqueue for new error condition in spec. Although it requires a context
to be contain more than 1 device, which I'm not sure if possible in
current test framework.
* Introduces a new test that created a command-buffer using a queue
without the profiling property set, then enqueues the command-buffer to
a queue with the profiling property set.
* Introduces a new test that creates a command-buffer with an in-order
queue, enqueued on an out-of-order queue.
* Introduces a new test that creates a command-buffer with an
out-of-order queue, enqueued on an in-order queue.