mirror of
https://github.com/KhronosGroup/OpenCL-CTS.git
synced 2026-03-19 06:09:01 +00:00
619 lines
22 KiB
C++
//
// Copyright (c) 2017 The Khronos Group Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

#include "common.h"
#include "function_list.h"
#include "test_functions.h"
#include "utility.h"

#include <cstring>

namespace {

cl_int BuildKernelFn(cl_uint job_id, cl_uint thread_id UNUSED, void *p)
{
    BuildKernelInfo &info = *(BuildKernelInfo *)p;
    auto generator = [](const std::string &kernel_name, const char *builtin,
                        cl_uint vector_size_index) {
        return GetUnaryKernel(kernel_name, builtin, ParameterType::Float,
                              ParameterType::Float, vector_size_index);
    };
    return BuildKernels(info, job_id, generator);
}

// Thread specific data for a worker thread
struct ThreadInfo
{
    // Input and output buffers for the thread
    clMemWrapper inBuf;
    Buffers outBuf;

    float maxError; // max error value. Init to 0.
    double maxErrorValue; // position of the max error value. Init to 0.

    // Per thread command queue to improve performance
    clCommandQueueWrapper tQueue;
};

struct TestInfo
{
    size_t subBufferSize; // Size of the sub-buffer in elements
    const Func *f; // A pointer to the function info

    // Programs for various vector sizes.
    Programs programs;

    // Thread-specific kernels for each vector size:
    // k[vector_size][thread_id]
    KernelMatrix k;

    // Array of thread specific information
    std::vector<ThreadInfo> tinfo;

    cl_uint threadCount; // Number of worker threads
    cl_uint jobCount; // Number of jobs
    cl_uint step; // step between each chunk and the next.
    cl_uint scale; // stride between individual test values
    float ulps; // max_allowed ulps
    int ftz; // non-zero if running in flush to zero mode

    int isRangeLimited; // 1 if the function is only to be evaluated over a
                        // range
    float half_sin_cos_tan_limit;
    bool relaxedMode; // True if test is running in relaxed mode, false
                      // otherwise.
};

cl_int Test(cl_uint job_id, cl_uint thread_id, void *data)
{
    TestInfo *job = (TestInfo *)data;
    size_t buffer_elements = job->subBufferSize;
    size_t buffer_size = buffer_elements * sizeof(cl_float);
    cl_uint scale = job->scale;
    cl_uint base = job_id * (cl_uint)job->step;
    ThreadInfo *tinfo = &(job->tinfo[thread_id]);
    fptr func = job->f->func;
    const char *fname = job->f->name;
    bool relaxedMode = job->relaxedMode;
    float ulps = getAllowedUlpError(job->f, relaxedMode);
    if (relaxedMode)
    {
        func = job->f->rfunc;
    }

    cl_int error;

    int isRangeLimited = job->isRangeLimited;
    float half_sin_cos_tan_limit = job->half_sin_cos_tan_limit;
    int ftz = job->ftz;

    cl_event e[VECTOR_SIZE_COUNT];
    cl_uint *out[VECTOR_SIZE_COUNT];
    if (gHostFill)
    {
        // start the map of the output arrays
        for (auto j = gMinVectorSizeIndex; j < gMaxVectorSizeIndex; j++)
        {
            out[j] = (cl_uint *)clEnqueueMapBuffer(
                tinfo->tQueue, tinfo->outBuf[j], CL_FALSE, CL_MAP_WRITE, 0,
                buffer_size, 0, NULL, e + j, &error);
            if (error || NULL == out[j])
            {
                vlog_error("Error: clEnqueueMapBuffer %d failed! err: %d\n", j,
                           error);
                return error;
            }
        }

        // Get that moving
        if ((error = clFlush(tinfo->tQueue))) vlog("clFlush failed\n");
    }

    // Write the new values to the input array
    cl_uint *p = (cl_uint *)gIn + thread_id * buffer_elements;
    for (size_t j = 0; j < buffer_elements; j++)
    {
        p[j] = base + j * scale;
        if (relaxedMode)
        {
            float p_j = *(float *)&p[j];
            if (strcmp(fname, "sin") == 0
                || strcmp(fname, "cos")
                    == 0) // the domain of the function is [-pi,pi]
            {
                if (fabs(p_j) > M_PI) ((float *)p)[j] = NAN;
            }

            if (strcmp(fname, "reciprocal") == 0)
            {
                const float l_limit = HEX_FLT(+, 1, 0, -, 126);
                const float u_limit = HEX_FLT(+, 1, 0, +, 126);

                if (fabs(p_j) < l_limit
                    || fabs(p_j) > u_limit) // the domain of the function is
                                            // [2^-126,2^126]
                    ((float *)p)[j] = NAN;
            }
        }
    }

    if ((error = clEnqueueWriteBuffer(tinfo->tQueue, tinfo->inBuf, CL_FALSE, 0,
                                      buffer_size, p, 0, NULL, NULL)))
    {
        vlog_error("Error: clEnqueueWriteBuffer failed! err: %d\n", error);
        return error;
    }

    for (auto j = gMinVectorSizeIndex; j < gMaxVectorSizeIndex; j++)
    {
        if (gHostFill)
        {
            // Wait for the map to finish
            if ((error = clWaitForEvents(1, e + j)))
            {
                vlog_error("Error: clWaitForEvents failed! err: %d\n", error);
                return error;
            }
            if ((error = clReleaseEvent(e[j])))
            {
                vlog_error("Error: clReleaseEvent failed! err: %d\n", error);
                return error;
            }
        }

        // Fill the result buffer with garbage, so that old results don't carry
        // over
        uint32_t pattern = 0xffffdead;
        if (gHostFill)
        {
            memset_pattern4(out[j], &pattern, buffer_size);
            if ((error = clEnqueueUnmapMemObject(
                     tinfo->tQueue, tinfo->outBuf[j], out[j], 0, NULL, NULL)))
            {
                vlog_error("Error: clEnqueueUnmapMemObject failed! err: %d\n",
                           error);
                return error;
            }
        }
        else
        {
            if ((error = clEnqueueFillBuffer(tinfo->tQueue, tinfo->outBuf[j],
                                             &pattern, sizeof(pattern), 0,
                                             buffer_size, 0, NULL, NULL)))
            {
                vlog_error("Error: clEnqueueFillBuffer failed! err: %d\n",
                           error);
                return error;
            }
        }

        // Run the kernel
        size_t vectorCount =
            (buffer_elements + sizeValues[j] - 1) / sizeValues[j];
        cl_kernel kernel = job->k[j][thread_id]; // each worker thread has its
                                                 // own copy of the cl_kernel
        cl_program program = job->programs[j];

        if ((error = clSetKernelArg(kernel, 0, sizeof(tinfo->outBuf[j]),
                                    &tinfo->outBuf[j])))
        {
            LogBuildError(program);
            return error;
        }
        if ((error = clSetKernelArg(kernel, 1, sizeof(tinfo->inBuf),
                                    &tinfo->inBuf)))
        {
            LogBuildError(program);
            return error;
        }

        if ((error = clEnqueueNDRangeKernel(tinfo->tQueue, kernel, 1, NULL,
                                            &vectorCount, NULL, 0, NULL, NULL)))
        {
            vlog_error("FAILED -- could not execute kernel\n");
            return error;
        }
    }

    // Get that moving
    if ((error = clFlush(tinfo->tQueue))) vlog("clFlush 2 failed\n");

    if (gSkipCorrectnessTesting) return CL_SUCCESS;

    // Calculate the correctly rounded reference result
    float *r = (float *)gOut_Ref + thread_id * buffer_elements;
    float *s = (float *)p;
    for (size_t j = 0; j < buffer_elements; j++) r[j] = (float)func.f_f(s[j]);

    // Read the data back -- no need to wait for the first N-1 buffers but wait
    // for the last buffer. This is an in order queue.
    for (auto j = gMinVectorSizeIndex; j < gMaxVectorSizeIndex; j++)
    {
        cl_bool blocking = (j + 1 < gMaxVectorSizeIndex) ? CL_FALSE : CL_TRUE;
        out[j] = (cl_uint *)clEnqueueMapBuffer(
            tinfo->tQueue, tinfo->outBuf[j], blocking, CL_MAP_READ, 0,
            buffer_size, 0, NULL, NULL, &error);
        if (error || NULL == out[j])
        {
            vlog_error("Error: clEnqueueMapBuffer %d failed! err: %d\n", j,
                       error);
            return error;
        }
    }

    // Verify data
    uint32_t *t = (uint32_t *)r;
    for (size_t j = 0; j < buffer_elements; j++)
    {
        for (auto k = gMinVectorSizeIndex; k < gMaxVectorSizeIndex; k++)
        {
            uint32_t *q = out[k];

            // If we aren't getting the correctly rounded result
            if (t[j] != q[j])
            {
                float test = ((float *)q)[j];
                double correct = func.f_f(s[j]);
                float err = Ulp_Error(test, correct);
                float abs_error = Abs_Error(test, correct);
                int fail = 0;
                int use_abs_error = 0;

                // it is possible for the output to not match the reference
                // result but for Ulp_Error to be zero, for example -1.#QNAN
                // vs. 1.#QNAN. In such cases there is no failure
                if (err == 0.0f)
                {
                    fail = 0;
                }
                else if (relaxedMode)
                {
                    if (strcmp(fname, "sin") == 0 || strcmp(fname, "cos") == 0)
                    {
                        fail = !(fabsf(abs_error) <= ulps);
                        use_abs_error = 1;
                    }
                    if (strcmp(fname, "sinpi") == 0
                        || strcmp(fname, "cospi") == 0)
                    {
                        if (s[j] >= -1.0 && s[j] <= 1.0)
                        {
                            fail = !(fabsf(abs_error) <= ulps);
                            use_abs_error = 1;
                        }
                    }

                    if (strcmp(fname, "reciprocal") == 0)
                    {
                        fail = !(fabsf(err) <= ulps);
                    }

                    if (strcmp(fname, "exp") == 0 || strcmp(fname, "exp2") == 0)
                    {
                        // For full profile, ULP depends on input value.
                        // For embedded profile, ULP comes from functionList.
                        if (!gIsEmbedded)
                        {
                            ulps = 3.0f + floor(fabs(2 * s[j]));
                        }

                        fail = !(fabsf(err) <= ulps);
                    }
                    if (strcmp(fname, "tan") == 0)
                    {
                        if (!gFastRelaxedDerived)
                        {
                            fail = !(fabsf(err) <= ulps);
                        }
                        // Else fast math derived implementation does not
                        // require ULP verification
                    }
                    if (strcmp(fname, "exp10") == 0)
                    {
                        if (!gFastRelaxedDerived)
                        {
                            fail = !(fabsf(err) <= ulps);
                        }
                        // Else fast math derived implementation does not
                        // require ULP verification
                    }
                    if (strcmp(fname, "log") == 0 || strcmp(fname, "log2") == 0
                        || strcmp(fname, "log10") == 0)
                    {
                        if (s[j] >= 0.5 && s[j] <= 2)
                        {
                            fail = !(fabsf(abs_error) <= ulps);
                        }
                        else
                        {
                            ulps = gIsEmbedded ? job->f->float_embedded_ulps
                                               : job->f->float_ulps;
                            fail = !(fabsf(err) <= ulps);
                        }
                    }

                    // fast-relaxed implies finite-only
                    if (IsFloatInfinity(correct) || IsFloatNaN(correct)
                        || IsFloatInfinity(s[j]) || IsFloatNaN(s[j]))
                    {
                        fail = 0;
                        err = 0;
                    }
                }
                else
                {
                    fail = !(fabsf(err) <= ulps);
                }

                // half_sin/cos/tan are only valid between +-2**16, Inf, NaN
                if (isRangeLimited
                    && fabsf(s[j]) > MAKE_HEX_FLOAT(0x1.0p16f, 0x1L, 16)
                    && fabsf(s[j]) < INFINITY)
                {
                    if (fabsf(test) <= half_sin_cos_tan_limit)
                    {
                        err = 0;
                        fail = 0;
                    }
                }

                if (fail)
                {
                    if (ftz || relaxedMode)
                    {
                        typedef int (*CheckForSubnormal)(
                            double, float); // If we are in fast relaxed math,
                                            // we have a different calculation
                                            // for the subnormal threshold.
                        CheckForSubnormal isFloatResultSubnormalPtr;

                        if (relaxedMode)
                        {
                            isFloatResultSubnormalPtr =
                                &IsFloatResultSubnormalAbsError;
                        }
                        else
                        {
                            isFloatResultSubnormalPtr = &IsFloatResultSubnormal;
                        }
                        // retry per section 6.5.3.2
                        if ((*isFloatResultSubnormalPtr)(correct, ulps))
                        {
                            fail = fail && (test != 0.0f);
                            if (!fail) err = 0.0f;
                        }

                        // retry per section 6.5.3.3
                        if (IsFloatSubnormal(s[j]))
                        {
                            double correct2 = func.f_f(0.0);
                            double correct3 = func.f_f(-0.0);
                            float err2;
                            float err3;
                            if (use_abs_error)
                            {
                                err2 = Abs_Error(test, correct2);
                                err3 = Abs_Error(test, correct3);
                            }
                            else
                            {
                                err2 = Ulp_Error(test, correct2);
                                err3 = Ulp_Error(test, correct3);
                            }
                            fail = fail
                                && ((!(fabsf(err2) <= ulps))
                                    && (!(fabsf(err3) <= ulps)));
                            if (fabsf(err2) < fabsf(err)) err = err2;
                            if (fabsf(err3) < fabsf(err)) err = err3;

                            // retry per section 6.5.3.4
                            if ((*isFloatResultSubnormalPtr)(correct2, ulps)
                                || (*isFloatResultSubnormalPtr)(correct3, ulps))
                            {
                                fail = fail && (test != 0.0f);
                                if (!fail) err = 0.0f;
                            }
                        }
                    }
                }
                if (fabsf(err) > tinfo->maxError)
                {
                    tinfo->maxError = fabsf(err);
                    tinfo->maxErrorValue = s[j];
                }
                if (fail)
                {
                    vlog_error("\nERROR: %s%s: %f ulp error at %a (0x%8.8x): "
                               "*%a vs. %a\n",
                               job->f->name, sizeNames[k], err, ((float *)s)[j],
                               ((uint32_t *)s)[j], ((float *)t)[j], test);
                    return -1;
                }
            }
        }
    }

    for (auto j = gMinVectorSizeIndex; j < gMaxVectorSizeIndex; j++)
    {
        if ((error = clEnqueueUnmapMemObject(tinfo->tQueue, tinfo->outBuf[j],
                                             out[j], 0, NULL, NULL)))
        {
            vlog_error("Error: clEnqueueUnmapMemObject %d failed 2! err: %d\n",
                       j, error);
            return error;
        }
    }

    if ((error = clFlush(tinfo->tQueue))) vlog("clFlush 3 failed\n");

    if (0 == (base & 0x0fffffff))
    {
        if (gVerboseBruteForce)
        {
            vlog("base:%14u step:%10u scale:%10u buf_elements:%10zd ulps:%5.3f "
                 "ThreadCount:%2u\n",
                 base, job->step, job->scale, buffer_elements, job->ulps,
                 job->threadCount);
        }
        else
        {
            vlog(".");
        }
        fflush(stdout);
    }

    return CL_SUCCESS;
}

} // anonymous namespace

int TestFunc_Float_Float(const Func *f, MTdata d, bool relaxedMode)
{
    TestInfo test_info{};
    cl_int error;
    float maxError = 0.0f;
    double maxErrorVal = 0.0;
    int skipTestingRelaxed = (relaxedMode && strcmp(f->name, "tan") == 0);

    logFunctionInfo(f->name, sizeof(cl_float), relaxedMode);

    // Init test_info
    test_info.threadCount = GetThreadCount();
    test_info.subBufferSize = BUFFER_SIZE
        / (sizeof(cl_float) * RoundUpToNextPowerOfTwo(test_info.threadCount));
    test_info.scale = getTestScale(sizeof(cl_float));

    test_info.step = (cl_uint)test_info.subBufferSize * test_info.scale;
    if (test_info.step / test_info.subBufferSize != test_info.scale)
    {
        // there was overflow
        test_info.jobCount = 1;
    }
    else
    {
        test_info.jobCount = (cl_uint)((1ULL << 32) / test_info.step);
    }

    test_info.f = f;
    test_info.ulps = gIsEmbedded ? f->float_embedded_ulps : f->float_ulps;
    test_info.ftz =
        f->ftz || gForceFTZ || 0 == (CL_FP_DENORM & gFloatCapabilities);
    test_info.relaxedMode = relaxedMode;
    test_info.tinfo.resize(test_info.threadCount);
    for (cl_uint i = 0; i < test_info.threadCount; i++)
    {
        cl_buffer_region region = {
            i * test_info.subBufferSize * sizeof(cl_float),
            test_info.subBufferSize * sizeof(cl_float)
        };
        test_info.tinfo[i].inBuf =
            clCreateSubBuffer(gInBuffer, CL_MEM_READ_ONLY,
                              CL_BUFFER_CREATE_TYPE_REGION, &region, &error);
        if (error || NULL == test_info.tinfo[i].inBuf)
        {
            vlog_error("Error: Unable to create sub-buffer of gInBuffer for "
                       "region {%zd, %zd}\n",
                       region.origin, region.size);
            return error;
        }

        for (auto j = gMinVectorSizeIndex; j < gMaxVectorSizeIndex; j++)
        {
            test_info.tinfo[i].outBuf[j] = clCreateSubBuffer(
                gOutBuffer[j], CL_MEM_WRITE_ONLY, CL_BUFFER_CREATE_TYPE_REGION,
                &region, &error);
            if (error || NULL == test_info.tinfo[i].outBuf[j])
            {
                vlog_error("Error: Unable to create sub-buffer of "
                           "gOutBuffer[%d] for region {%zd, %zd}\n",
                           (int)j, region.origin, region.size);
                return error;
            }
        }
        test_info.tinfo[i].tQueue =
            clCreateCommandQueue(gContext, gDevice, 0, &error);
        if (NULL == test_info.tinfo[i].tQueue || error)
        {
            vlog_error("clCreateCommandQueue failed. (%d)\n", error);
            return error;
        }
    }

    // Check for special cases for unary float
    test_info.isRangeLimited = 0;
    test_info.half_sin_cos_tan_limit = 0;
    if (0 == strcmp(f->name, "half_sin") || 0 == strcmp(f->name, "half_cos"))
    {
        test_info.isRangeLimited = 1;
        test_info.half_sin_cos_tan_limit = 1.0f
            + test_info.ulps
                * (FLT_EPSILON / 2.0f); // out of range results from finite
                                        // inputs must be in [-1,1]
    }
    else if (0 == strcmp(f->name, "half_tan"))
    {
        test_info.isRangeLimited = 1;
        test_info.half_sin_cos_tan_limit =
            INFINITY; // out of range result from finite inputs must be numeric
    }

    // Init the kernels
    BuildKernelInfo build_info{ test_info.threadCount, test_info.k,
                                test_info.programs, f->nameInCode,
                                relaxedMode };
    if ((error = ThreadPool_Do(BuildKernelFn,
                               gMaxVectorSizeIndex - gMinVectorSizeIndex,
                               &build_info)))
        return error;

    // Run the kernels
    if (!gSkipCorrectnessTesting || skipTestingRelaxed)
    {
        error = ThreadPool_Do(Test, test_info.jobCount, &test_info);
        if (error) return error;

        // Accumulate the arithmetic errors
        for (cl_uint i = 0; i < test_info.threadCount; i++)
        {
            if (test_info.tinfo[i].maxError > maxError)
            {
                maxError = test_info.tinfo[i].maxError;
                maxErrorVal = test_info.tinfo[i].maxErrorValue;
            }
        }

        if (gWimpyMode)
            vlog("Wimp pass");
        else
            vlog("passed");

        if (skipTestingRelaxed)
        {
            vlog(" (rlx skip correctness testing)\n");
            return error;
        }

        vlog("\t%8.2f @ %a", maxError, maxErrorVal);
    }

    vlog("\n");

    return CL_SUCCESS;
}