OpenCL-CTS/test_conformance/conversions/conversions_data_info.h
Ben Ashbaugh 620c689919 update fp16 staging branch from main (#1903)
* allocations: Move results array from stack to heap (#1857)

* allocations: Fix stack overflow

* check format fixes

* Fix windows stack overflow. (#1839)

* thread_dimensions: Avoid combinations of very small LWS and very large GWS (#1856)

Modify the existing condition to include extremely small LWS like
1x1 on large GWS values

* c11_atomics: Reduce the loopcounter for sequential consistency tests (#1853)

Reduce the loop count from 1000000 to 500000, since the former value
makes the test run too long and causes system issues on certain
platforms

* Limit individual allocation size using the global memory size (#1835)

Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>

* geometrics: fix Wsign-compare warnings (#1855)

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* integer_ops: fix -Wformat warnings (#1860)

The main sources of warnings were:

 * Printing of a `size_t` which requires the `%zu` specifier.

 * Printing of `cl_long`/`cl_ulong` which is now done using the
   `PRI*64` macros to ensure portability across 32 and 64-bit builds.
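A minimal sketch of both fixes, assuming a C++11 (or later) toolchain. `int64_t`/`uint64_t` stand in for `cl_long`/`cl_ulong` so the snippet compiles without the OpenCL headers, and `format_values` is a hypothetical helper, not a CTS function:

```cpp
#include <cassert>
#include <cinttypes> // PRId64 / PRIu64
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size_t and two 64-bit integers portably.
// %zu matches size_t on both 32- and 64-bit builds, and the PRI*64 macros
// expand to the correct length modifier for int64_t/uint64_t (stand-ins
// here for cl_long/cl_ulong).
std::string format_values(size_t count, int64_t s, uint64_t u)
{
    char buf[128];
    std::snprintf(buf, sizeof(buf), "count=%zu s=%" PRId64 " u=%" PRIu64,
                  count, s, u);
    return std::string(buf);
}
```

Hard-coding `%lu` or `%lld` instead is only correct for some data models (for example, `long` is 32 bits on 64-bit Windows), which is what `-Wformat` flags.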

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* Replace OBSOLETE_FORAMT with OBSOLETE_FORMAT (#1776)

* Replace OBSOLETE_FORAMT with OBSOLETE_FORMAT

In imageHelpers.cpp and a few other places in the image tests, OBSOLETE_FORMAT is misspelled as OBSOLETE_FORAMT.
Fix the misspelling by replacing it with OBSOLETE_FORMAT.

Fixes #1769

* Remove code guarded by OBSOLETE_FORMAT

Remove code guarded by OBSOLETE_FORMAT
as suggested by review comments

Fixes #1769

* Fix formatting issues for OBSOLETE_FORMAT changes

Fix formatting issues observed in files while removing
code guarded by OBSOLETE_FORMAT

Fixes #1769

* Some more formatting fixes

Some more formatting fixes to get CI clean

Fixes #1769

* Final formatting fixes

Final formatting fixes for #1769

* Enhancement: Thread dimensions user parameters (#1384)

* Fix format in the test scope

* Add user params to limit testing

Add parameters to reduce amount of testing.
Helpful for debugging or for machines with lower performance.

* Restore default value

* Print info only if testing params bigger than 0.

* [NFC] conversions: reenable Wunused-but-set-variable (#1845)

Remove an assigned-to but unused variable.

Reenable the Wunused-but-set-variable warning for the conversions
suite, as it now compiles cleanly with this warning enabled.

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* Fix bug of conversion from long to double (#1847)

* Fix bug of conversion from long to double

If the input is of type long, it should be loaded as long, not as ulong.
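The difference shows up for any value whose top bit is set. This is a self-contained illustration, not the CTS code: `load_as_signed` and `load_as_unsigned` are hypothetical helpers, and `int64_t`/`uint64_t` stand in for `cl_long`/`cl_ulong`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads 8 raw bytes as a signed 64-bit integer (the correct load for a
// cl_long input) and converts the value to double.
double load_as_signed(const void *buf)
{
    int64_t v;
    std::memcpy(&v, buf, sizeof(v));
    return static_cast<double>(v);
}

// Reads the same 8 bytes as an unsigned 64-bit integer (the buggy load
// for a cl_long input): negative values become huge positive ones.
double load_as_unsigned(const void *buf)
{
    uint64_t v;
    std::memcpy(&v, buf, sizeof(v));
    return static_cast<double>(v);
}
```

Loaded as unsigned, the bit pattern of `-1` becomes `2^64 - 1`, which rounds to `2^64` as a double, so the reference value is wrong for every negative input.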

* update long2float

* math_brute_force: fix exp/exp2 rlx ULP calculation (#1848)

Fix the ULP error calculation for the `exp` and `exp2` builtins in
relaxed math mode for the full profile.

Previously, the `ulps` value kept being added to while verifying the
result buffer in a loop.  `ulps` could even become a `NaN` when the
input argument being tested was a `NaN`.
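The shape of the fix can be sketched as follows. This is an illustration only: `within_tolerance`, `kBaseUlps` and the magnitude-dependent term are hypothetical and do not reproduce the exact relaxed-mode formula used by `math_brute_force`. The point is that the tolerance is recomputed for each element instead of being accumulated across the loop.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-element check: errors[i] is the measured ULP error for
// inputs[i]. The tolerance is a fresh local in every iteration, so one
// element (including a NaN input) cannot change the bound for the next.
bool within_tolerance(const std::vector<double> &inputs,
                      const std::vector<double> &errors)
{
    const double kBaseUlps = 3.0; // hypothetical base tolerance
    for (size_t i = 0; i < inputs.size(); i++)
    {
        double ulps = kBaseUlps; // reset for each element (the fix)
        if (!std::isnan(inputs[i]))
            ulps += std::floor(std::fabs(inputs[i])); // hypothetical term
        if (std::fabs(errors[i]) > ulps) return false;
    }
    return true;
}
```

If `ulps` were instead declared outside the loop and added to on each iteration, the second element of `{100.0, 0.0}` would be checked against 106 ULP rather than 3 ULP, and an error of 4 ULP would wrongly pass.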

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* Enable LARGEADDRESSAWARE for 32 bit compilation (#1858)

* Enable LARGEADDRESSAWARE for 32 bit compilation

32-bit executables built with the MSVC linker have only 2 GB of virtual
address space by default, which might not be sufficient for some tests.

Enable LARGEADDRESSAWARE linker flag for 32-bit targets to allow tests
to handle addresses larger than 2 gigabytes.

https://learn.microsoft.com/en-us/cpp/build/reference/largeaddressaware-handle-large-addresses?view=msvc-170

Signed-off-by: Guo, Yilong <yilong.guo@intel.com>

* Apply suggestion

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

---------

Signed-off-by: Guo, Yilong <yilong.guo@intel.com>
Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

* fix return code when readwrite image is not supported (#1873)

This function (do_test) starts by testing write and read individually.
Both of them can report errors.

When read-write images are not supported, the function returns
TEST_SKIPPED_ITSELF, potentially masking those errors and leading the
test to return EXIT_SUCCESS even when errors occurred along the way.
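A minimal sketch of the intended control flow, assuming the fix checks errors from the write-only and read-only passes before deciding to skip. `run_image_passes` and the `Result` values are hypothetical stand-ins for `do_test` and harness result codes such as `TEST_SKIPPED_ITSELF`.

```cpp
#include <cassert>

// Hypothetical result codes standing in for the harness values.
enum Result { kPass, kFail, kSkipped };

// Errors from the passes that did run are checked before reporting a skip,
// so an unsupported read-write mode cannot hide earlier failures.
Result run_image_passes(int write_errors, int read_errors,
                        bool read_write_supported)
{
    int errors = write_errors + read_errors;
    if (!read_write_supported) return errors ? kFail : kSkipped;
    // ... the read-write pass would run here and add to `errors` ...
    return errors ? kFail : kPass;
}
```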

* fix macos builds by avoiding double compilation of function_list.cpp for test_spir (#1866)

* modernize CMakeLists for test_spir

* add the operating system release to the sccache key

* include the math brute force function list vs. building it twice

* fix the license header on the spirv-new tests (#1865)

The source files for the spirv-new tests were using the older Khronos
license instead of the proper Apache license.  Fixed the license in
all source files.

* compiler: fix grammar in error message (#1877)

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* Updated semaphore tests to use clSemaphoreReImportSyncFdKHR. (#1854)

* Updated semaphore tests to use clSemaphoreReImportSyncFdKHR.

Additionally updated common semaphore code to handle spec updates
that restrict simultaneous importing/exporting of handles.

* Fix build issues on CI

* gcc build issues

* Make clReImportSemaphoreSyncFdKHR a required API
call if cl_khr_external_semaphore_sync_fd is present.

* Implement signal and wait for all semaphore types.

* subgroups: fix for testing too large WG sizes (#1620)

It seemed to be a typo: the comment says that it
tries to fetch the local size for a subgroup count
above the max WG size, but it just used the previous
subgroup count.

The test deliberately sets the SG count to a number
larger than the max number of work-items in the work
group. Since the minimum SG size is 1 WI, a work group
can contain at most as many SGs as its maximum
work-group size (each SG being 1 WI in size). Thus, if
we request a number of SGs that exceeds the local size,
the query should fail as expected.
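The bound behind this reasoning fits in a one-line check. The sketch is illustrative only; `subgroup_count_fits` is a hypothetical helper, not part of the subgroups test.

```cpp
#include <cassert>
#include <cstddef>

// With the smallest possible subgroup (one work-item), a work-group of at
// most max_wg_size work-items holds at most max_wg_size subgroups, so any
// larger requested count has no valid local size.
bool subgroup_count_fits(size_t requested_sg_count, size_t max_wg_size)
{
    const size_t kMinSubgroupSize = 1; // one work-item per subgroup
    return requested_sg_count * kMinSubgroupSize <= max_wg_size;
}
```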

* add SPIR-V version testing (#1861)

* basic SPIR-V 1.3 testing support

* updated script to compile for more SPIR-V versions

* switch to general SPIR-V versions test

* update copyright text and fix license

* improve output while test is running

* check for higher SPIR-V versions first

* fix formatting

* fix the reported platform information for math brute force (#1884)

When the math brute force test printed the platform version, it always
printed information for the first platform in the system, which could
differ from the platform of the passed-in device. Fixed by
querying the platform from the passed-in device instead.

* api tests fix: Use MTdataHolder in test_get_image_info (#1871)

* Minor fixes in mutable dispatch tests. (#1829)

* Minor fixes in mutable dispatch tests.

* Fix size of newWrapper in MutableDispatchSVMArguments.
* Fix erroneous clCommandNDRangeKernelKHR call.

Signed-off-by: John Kesapides <john.kesapides@arm.com>

* Set the row_pitch for imageInfo in MutableDispatchImage1DArguments
and MutableDispatchImage2DArguments. The row_pitch is
used by get_image_size() to calculate the size of
the host pointers by generate_random_image_data.

Signed-off-by: John Kesapides <john.kesapides@arm.com>

---------

Signed-off-by: John Kesapides <john.kesapides@arm.com>

* add test for cl_khr_spirv_linkonce_odr (#1226)

* initial version of the test with placeholders for linkonce_odr linkage

* add OpExtension SPV_KHR_linkonce_odr extension

* add check for extension

* switch to actual LinkOnceODR linkage

* fix formatting

* add a test case to ensure a function with linkonce_odr is exported

* add back the extension check

* fix formatting

* undo compiler optimization and actually add the call to function a

* [NFC] subgroups: remove unnecessary extern keywords (#1892)

In C and C++ all functions have external linkage by default.
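For example, the two declarations below are equivalent, which is why the `extern` keywords could be dropped without changing behavior; `add_one` is a hypothetical function used only for illustration.

```cpp
#include <cassert>

// Both declarations give add_one external linkage; `extern` is redundant
// on a function declaration.
extern int add_one(int x);
int add_one(int x);

int add_one(int x) { return x + 1; }
```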

Also remove the unused `gMTdata` and `test_pipe_functions`
declarations.

Fixes https://github.com/KhronosGroup/OpenCL-CTS/issues/1137

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* Added cl_khr_fp16 extension support for test_decorate from spirv_new (#1770)

* Added cl_khr_fp16 extension support for test_decorate from spirv_new, work in progress

* Complemented test_decorate saturation test to support cl_khr_fp16 extension (issue #142)

* Fixed clang format

* scope of modifications:

-changed naming convention of saturation .spvasm files related to
test_decorate of spirv_new
-restored float to char/uchar saturation tests
-few minor corrections

* fix ranges for half testing

* fix formatting

* one more formatting fix

* remove unused function

* use isnan instead of std::isnan

isnan is currently implemented as a macro, not as a function, so
we can't use std::isnan.

* fix Clang warning about inexact conversion

---------

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

* add support for custom devices (#1891)

enable the CTS to run on custom devices

---------

Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Signed-off-by: Guo, Yilong <yilong.guo@intel.com>
Signed-off-by: John Kesapides <john.kesapides@arm.com>
Co-authored-by: Sreelakshmi Haridas Maruthur <sharidas@quicinc.com>
Co-authored-by: Haonan Yang <haonan.yang@intel.com>
Co-authored-by: Ahmed Hesham <117350656+ahesham-arm@users.noreply.github.com>
Co-authored-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Co-authored-by: niranjanjoshi121 <43807392+niranjanjoshi121@users.noreply.github.com>
Co-authored-by: Grzegorz Wawiorko <grzegorz.wawiorko@intel.com>
Co-authored-by: Wenwan Xing <wenwan.xing@intel.com>
Co-authored-by: Yilong Guo <yilong.guo@intel.com>
Co-authored-by: Romaric Jodin <89833130+rjodinchr@users.noreply.github.com>
Co-authored-by: joshqti <127994991+joshqti@users.noreply.github.com>
Co-authored-by: Pekka Jääskeläinen <pekka.jaaskelainen@tuni.fi>
Co-authored-by: imilenkovic00 <155085410+imilenkovic00@users.noreply.github.com>
Co-authored-by: John Kesapides <46718829+JohnKesapidesARM@users.noreply.github.com>
Co-authored-by: Marcin Hajder <marcin.hajder@gmail.com>
Co-authored-by: Aharon Abramson <aharon.abramson@mobileye.com>
2024-03-02 16:48:45 -08:00


//
// Copyright (c) 2023 The Khronos Group Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
#ifndef CONVERSIONS_DATA_INFO_H
#define CONVERSIONS_DATA_INFO_H
#if defined(__APPLE__)
#include <OpenCL/opencl.h>
#else
#include <CL/opencl.h>
#endif
#if (defined(__arm__) || defined(__aarch64__)) && defined(__GNUC__)
#include "fplib.h"
extern bool qcom_sat;
extern roundingMode qcom_rm;
#endif
#include <CL/cl_half.h>
#include "harness/mt19937.h"
#include "harness/rounding_mode.h"
#include "harness/typeWrappers.h"
#include <cfloat>
#include <climits>
#include <cstring>
#include <type_traits>
#include <utility>
#include <vector>
#if defined(__linux__)
#include <sys/param.h>
#include <libgen.h>
#endif
extern size_t gTypeSizes[kTypeCount];
extern void *gIn;
typedef enum
{
kUnsaturated = 0,
kSaturated,
kSaturationModeCount
} SaturationMode;
struct DataInitInfo
{
cl_ulong start;
cl_uint size;
Type outType;
Type inType;
SaturationMode sat;
RoundingMode round;
cl_uint threads;
static cl_half_rounding_mode halfRoundingMode;
static std::vector<uint32_t> specialValuesUInt;
static std::vector<float> specialValuesFloat;
static std::vector<double> specialValuesDouble;
static std::vector<cl_half> specialValuesHalf;
};
#define HFF(num) cl_half_from_float(num, DataInitInfo::halfRoundingMode)
#define HTF(num) cl_half_to_float(num)
#define HFD(num) cl_half_from_double(num, DataInitInfo::halfRoundingMode)
struct DataInitBase : public DataInitInfo
{
virtual ~DataInitBase() = default;
explicit DataInitBase(const DataInitInfo &agg): DataInitInfo(agg) {}
virtual void conv_array(void *out, void *in, size_t n) {}
virtual void conv_array_sat(void *out, void *in, size_t n) {}
virtual void init(const cl_uint &, const cl_uint &) {}
};
template <typename InType, typename OutType, bool InFP, bool OutFP>
struct DataInfoSpec : public DataInitBase
{
explicit DataInfoSpec(const DataInitInfo &agg);
// helpers
float round_to_int(float f);
long long round_to_int_and_clamp(double d);
OutType absolute(const OutType &x);
// actual conversion of reference values
void conv(OutType *out, InType *in);
void conv_sat(OutType *out, InType *in);
// min/max ranges for output type of data
std::pair<OutType, OutType> ranges;
// matrix of clamping ranges for each rounding type
std::vector<std::pair<InType, InType>> clamp_ranges;
std::vector<MTdataHolder> mdv;
constexpr bool is_in_half() const
{
return (std::is_same<InType, cl_half>::value && InFP);
}
constexpr bool is_out_half() const
{
return (std::is_same<OutType, cl_half>::value && OutFP);
}
void conv_array(void *out, void *in, size_t n) override
{
for (size_t i = 0; i < n; i++)
conv(&((OutType *)out)[i], &((InType *)in)[i]);
}
void conv_array_sat(void *out, void *in, size_t n) override
{
for (size_t i = 0; i < n; i++)
conv_sat(&((OutType *)out)[i], &((InType *)in)[i]);
}
void init(const cl_uint &, const cl_uint &) override;
InType clamp(const InType &);
inline float fclamp(float lo, float v, float hi)
{
v = v < lo ? lo : v;
return v < hi ? v : hi;
}
inline double dclamp(double lo, double v, double hi)
{
v = v < lo ? lo : v;
return v < hi ? v : hi;
}
};
template <typename InType, typename OutType, bool InFP, bool OutFP>
DataInfoSpec<InType, OutType, InFP, OutFP>::DataInfoSpec(
const DataInitInfo &agg)
: DataInitBase(agg), mdv(0)
{
if (std::is_same<cl_float, OutType>::value)
ranges = std::make_pair(CL_FLT_MIN, CL_FLT_MAX);
else if (std::is_same<cl_double, OutType>::value)
ranges = std::make_pair(CL_DBL_MIN, CL_DBL_MAX);
else if (std::is_same<cl_half, OutType>::value && OutFP)
ranges = std::make_pair(HFF(CL_HALF_MIN), HFF(CL_HALF_MAX));
else if (std::is_same<cl_uchar, OutType>::value)
ranges = std::make_pair(0, CL_UCHAR_MAX);
else if (std::is_same<cl_char, OutType>::value)
ranges = std::make_pair(CL_CHAR_MIN, CL_CHAR_MAX);
else if (std::is_same<cl_ushort, OutType>::value && !OutFP)
ranges = std::make_pair(0, CL_USHRT_MAX);
else if (std::is_same<cl_short, OutType>::value)
ranges = std::make_pair(CL_SHRT_MIN, CL_SHRT_MAX);
else if (std::is_same<cl_uint, OutType>::value)
ranges = std::make_pair(0, CL_UINT_MAX);
else if (std::is_same<cl_int, OutType>::value)
ranges = std::make_pair(CL_INT_MIN, CL_INT_MAX);
else if (std::is_same<cl_ulong, OutType>::value)
ranges = std::make_pair(0, CL_ULONG_MAX);
else if (std::is_same<cl_long, OutType>::value)
ranges = std::make_pair(CL_LONG_MIN, CL_LONG_MAX);
// clang-format off
// for readability sake keep this section unformatted
if (std::is_floating_point<InType>::value)
{ // from float/double
InType outMin = static_cast<InType>(ranges.first);
InType outMax = static_cast<InType>(ranges.second);
InType eps = std::is_same<InType, cl_float>::value ? (InType) FLT_EPSILON : (InType) DBL_EPSILON;
if (std::is_integral<OutType>::value && !OutFP)
{ // to char/uchar/short/ushort/int/uint/long/ulong
if (sizeof(OutType)<=sizeof(cl_short))
{ // to char/uchar/short/ushort
clamp_ranges=
{{outMin-0.5f, outMax + 0.5f - outMax * 0.5f * eps},
{outMin-0.5f, outMax + 0.5f - outMax * 0.5f * eps},
{outMin-1.0f+(std::is_signed<OutType>::value?outMax:0.5f)*eps, outMax-1.f},
{outMin-0.0f, outMax - outMax * 0.5f * eps },
{outMin-1.0f+(std::is_signed<OutType>::value?outMax:0.5f)*eps, outMax - outMax * 0.5f * eps}};
}
else if (std::is_same<InType, cl_float>::value)
{ // from float
if (std::is_same<OutType, cl_uint>::value)
{ // to uint
clamp_ranges=
{ {outMin-0.5f, MAKE_HEX_FLOAT(0x1.fffffep31f, 0x1fffffeL, 7)},
{outMin-0.5f, MAKE_HEX_FLOAT(0x1.fffffep31f, 0x1fffffeL, 7)},
{outMin-1.0f+0.5f*eps, MAKE_HEX_FLOAT(0x1.fffffep31f, 0x1fffffeL, 7)},
{outMin-0.0f, MAKE_HEX_FLOAT(0x1.fffffep31f, 0x1fffffeL, 7) },
{outMin-1.0f+0.5f*eps, MAKE_HEX_FLOAT(0x1.fffffep31f, 0x1fffffeL, 7)}};
}
else if (std::is_same<OutType, cl_int>::value)
{ // to int
clamp_ranges=
{ {outMin, MAKE_HEX_FLOAT(0x1.fffffep30f, 0x1fffffeL, 6)},
{outMin, MAKE_HEX_FLOAT(0x1.fffffep30f, 0x1fffffeL, 6)},
{outMin, MAKE_HEX_FLOAT(0x1.fffffep30f, 0x1fffffeL, 6)},
{outMin, MAKE_HEX_FLOAT(0x1.fffffep30f, 0x1fffffeL, 6) },
{outMin, MAKE_HEX_FLOAT(0x1.fffffep30f, 0x1fffffeL, 6)}};
}
else if (std::is_same<OutType, cl_ulong>::value)
{ // to ulong
clamp_ranges=
{{outMin-0.5f, MAKE_HEX_FLOAT(0x1.fffffep63f, 0x1fffffeL, 39)},
{outMin-0.5f, MAKE_HEX_FLOAT(0x1.fffffep63f, 0x1fffffeL, 39)},
{outMin-1.0f+(std::is_signed<OutType>::value?outMax:0.5f)*eps, MAKE_HEX_FLOAT(0x1.fffffep63f, 0x1fffffeL, 39)},
{outMin-0.0f, MAKE_HEX_FLOAT(0x1.fffffep63f, 0x1fffffeL, 39) },
{outMin-1.0f+(std::is_signed<OutType>::value?outMax:0.5f)*eps, MAKE_HEX_FLOAT(0x1.fffffep63f, 0x1fffffeL, 39)}};
}
else if (std::is_same<OutType, cl_long>::value)
{ // to long
clamp_ranges=
{ {MAKE_HEX_FLOAT(-0x1.0p63f, -0x1L, 63), MAKE_HEX_FLOAT(0x1.fffffep62f, 0x1fffffeL, 38)},
{MAKE_HEX_FLOAT(-0x1.0p63f, -0x1L, 63), MAKE_HEX_FLOAT(0x1.fffffep62f, 0x1fffffeL, 38)},
{MAKE_HEX_FLOAT(-0x1.0p63f, -0x1L, 63), MAKE_HEX_FLOAT(0x1.fffffep62f, 0x1fffffeL, 38)},
{MAKE_HEX_FLOAT(-0x1.0p63f, -0x1L, 63), MAKE_HEX_FLOAT(0x1.fffffep62f, 0x1fffffeL, 38)},
{MAKE_HEX_FLOAT(-0x1.0p63f, -0x1L, 63), MAKE_HEX_FLOAT(0x1.fffffep62f, 0x1fffffeL, 38)}};
}
}
else
{ // from double
if (std::is_same<OutType, cl_uint>::value)
{ // to uint
clamp_ranges=
{ {outMin-0.5f, outMax + 0.5 - MAKE_HEX_DOUBLE(0x1.0p31, 0x1LL, 31) * eps},
{outMin-0.5f, outMax + 0.5 - MAKE_HEX_DOUBLE(0x1.0p31, 0x1LL, 31) * eps},
{outMin-1.0f+0.5f*eps, outMax},
{outMin-0.0f, MAKE_HEX_DOUBLE(0x1.fffffffffffffp31, 0x1fffffffffffffLL, -21) },
{outMin-1.0f+0.5f*eps, MAKE_HEX_DOUBLE(0x1.fffffffffffffp31, 0x1fffffffffffffLL, -21)}};
}
else if (std::is_same<OutType, cl_int>::value)
{ // to int
clamp_ranges=
{ {outMin-0.5f, outMax + 0.5 - MAKE_HEX_DOUBLE(0x1.0p30, 0x1LL, 30) * eps},
{outMin-0.5f, outMax + 0.5 - MAKE_HEX_DOUBLE(0x1.0p30, 0x1LL, 30) * eps},
{outMin-1.0f+outMax*eps, outMax},
{outMin-0.0f, outMax + 1.0 - MAKE_HEX_DOUBLE(0x1.0p30, 0x1LL, 30) * eps },
{outMin-1.0f+outMax*eps, outMax + 1.0 - MAKE_HEX_DOUBLE(0x1.0p30, 0x1LL, 30) * eps}};
}
else if (std::is_same<OutType, cl_ulong>::value)
{ // to ulong
clamp_ranges=
{{outMin-0.5f, MAKE_HEX_DOUBLE(0x1.fffffffffffffp63, 0x1fffffffffffffLL, 11)},
{outMin-0.5f, MAKE_HEX_DOUBLE(0x1.fffffffffffffp63, 0x1fffffffffffffLL, 11)},
{outMin-1.0f+(std::is_signed<OutType>::value?outMax:0.5f)*eps, MAKE_HEX_DOUBLE(0x1.fffffffffffffp63, 0x1fffffffffffffLL, 11)},
{outMin-0.0f, MAKE_HEX_DOUBLE(0x1.fffffffffffffp63, 0x1fffffffffffffLL, 11) },
{outMin-1.0f+(std::is_signed<OutType>::value?outMax:0.5f)*eps, MAKE_HEX_DOUBLE(0x1.fffffffffffffp63, 0x1fffffffffffffLL, 11)}};
}
else if (std::is_same<OutType, cl_long>::value)
{ // to long
clamp_ranges=
{ {MAKE_HEX_DOUBLE(-0x1.0p63, -0x1LL, 63), MAKE_HEX_DOUBLE(0x1.fffffffffffffp62, 0x1fffffffffffffLL, 10)},
{MAKE_HEX_DOUBLE(-0x1.0p63, -0x1LL, 63), MAKE_HEX_DOUBLE(0x1.fffffffffffffp62, 0x1fffffffffffffLL, 10)},
{MAKE_HEX_DOUBLE(-0x1.0p63, -0x1LL, 63), MAKE_HEX_DOUBLE(0x1.fffffffffffffp62, 0x1fffffffffffffLL, 10)},
{MAKE_HEX_DOUBLE(-0x1.0p63, -0x1LL, 63), MAKE_HEX_DOUBLE(0x1.fffffffffffffp62, 0x1fffffffffffffLL, 10)},
{MAKE_HEX_DOUBLE(-0x1.0p63, -0x1LL, 63), MAKE_HEX_DOUBLE(0x1.fffffffffffffp62, 0x1fffffffffffffLL, 10)}};
}
}
}
}
else if (is_in_half())
{
float outMin = static_cast<float>(ranges.first);
float outMax = static_cast<float>(ranges.second);
float eps = CL_HALF_EPSILON;
cl_half_rounding_mode prev_half_round = DataInitInfo::halfRoundingMode;
DataInitInfo::halfRoundingMode = CL_HALF_RTZ;
if (std::is_integral<OutType>::value)
{ // to char/uchar/short/ushort/int/uint/long/ulong
if (sizeof(OutType)<=sizeof(cl_char) || std::is_same<OutType, cl_short>::value)
{ // to char/uchar/short
clamp_ranges=
{{HFF(outMin-0.5f), HFF(outMax + 0.5f - outMax * 0.5f * eps)},
{HFF(outMin-0.5f), HFF(outMax + 0.5f - outMax * 0.5f * eps)},
{HFF(outMin-1.0f+(std::is_signed<OutType>::value?outMax:0.5f)*eps), HFF(outMax-1.f)},
{HFF(outMin-0.0f), HFF(outMax - outMax * 0.5f * eps) },
{HFF(outMin-1.0f+(std::is_signed<OutType>::value?outMax:0.5f)*eps), HFF(outMax - outMax * 0.5f * eps)}};
}
else
{ // to ushort/int/uint/long/ulong
if (std::is_signed<OutType>::value)
{
clamp_ranges=
{ {HFF(-CL_HALF_MAX), HFF(CL_HALF_MAX)},
{HFF(-CL_HALF_MAX), HFF(CL_HALF_MAX)},
{HFF(-CL_HALF_MAX), HFF(CL_HALF_MAX)},
{HFF(-CL_HALF_MAX), HFF(CL_HALF_MAX)},
{HFF(-CL_HALF_MAX), HFF(CL_HALF_MAX)}};
}
else
{
clamp_ranges=
{ {HFF(outMin), HFF(CL_HALF_MAX)},
{HFF(outMin), HFF(CL_HALF_MAX)},
{HFF(outMin), HFF(CL_HALF_MAX)},
{HFF(outMin), HFF(CL_HALF_MAX)},
{HFF(outMin), HFF(CL_HALF_MAX)}};
}
}
}
DataInitInfo::halfRoundingMode = prev_half_round;
}
// clang-format on
}
template <typename InType, typename OutType, bool InFP, bool OutFP>
float DataInfoSpec<InType, OutType, InFP, OutFP>::round_to_int(float f)
{
static const float magic[2] = { MAKE_HEX_FLOAT(0x1.0p23f, 0x1, 23),
-MAKE_HEX_FLOAT(0x1.0p23f, 0x1, 23) };
// Round fractional values to integer in round towards nearest mode
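// Worked example (illustrative): for |f| < 2^23 the exact sum f + 2^23
// lies in [2^23, 2^24), where representable floats are exactly 1.0 apart,
// so storing the sum as a float rounds away the fractional part in the
// current FP rounding mode. Under the default round-to-nearest-even mode,
// 2.5f + 0x1.0p23f = 8388610.5f rounds to 8388610.0f, and subtracting
// 0x1.0p23f leaves 2.0f (the tie goes to even); 3.5f likewise becomes 4.0f.
// Negative inputs use -2^23 so the same argument applies symmetrically.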
if (fabsf(f) < MAKE_HEX_FLOAT(0x1.0p23f, 0x1, 23))
{
volatile float x = f;
float magicVal = magic[f < 0];
#if defined(__SSE__)
// Defeat x87-based arithmetic, which can't do FTZ and would round this
// incorrectly
__m128 v = _mm_set_ss(x);
__m128 m = _mm_set_ss(magicVal);
v = _mm_add_ss(v, m);
v = _mm_sub_ss(v, m);
_mm_store_ss((float *)&x, v);
#else
x += magicVal;
x -= magicVal;
#endif
f = x;
}
return f;
}
template <typename InType, typename OutType, bool InFP, bool OutFP>
long long
DataInfoSpec<InType, OutType, InFP, OutFP>::round_to_int_and_clamp(double f)
{
static const double magic[2] = { MAKE_HEX_DOUBLE(0x1.0p52, 0x1LL, 52),
MAKE_HEX_DOUBLE(-0x1.0p52, -0x1LL, 52) };
if (f >= -(double)LLONG_MIN) return LLONG_MAX;
if (f <= (double)LLONG_MIN) return LLONG_MIN;
// Round fractional values to integer in round towards nearest mode
if (fabs(f) < MAKE_HEX_DOUBLE(0x1.0p52, 0x1LL, 52))
{
volatile double x = f;
double magicVal = magic[f < 0];
#if defined(__SSE2__) || defined(_MSC_VER)
// Defeat x87-based arithmetic, which can't do FTZ and would round this
// incorrectly
__m128d v = _mm_set_sd(x);
__m128d m = _mm_set_sd(magicVal);
v = _mm_add_sd(v, m);
v = _mm_sub_sd(v, m);
_mm_store_sd((double *)&x, v);
#else
x += magicVal;
x -= magicVal;
#endif
f = x;
}
return (long long)f;
}
template <typename InType, typename OutType, bool InFP, bool OutFP>
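OutType DataInfoSpec<InType, OutType, InFP, OutFP>::absolute(const OutType &x)
{
// Clear the sign bit through an integer of the same width as the
// floating-point type; a 32-bit integer only covers half of a double.
if (std::is_same<OutType, float>::value)
{
union {
cl_uint u;
cl_float f;
} u;
u.f = static_cast<cl_float>(x);
u.u &= 0x7fffffffU;
return static_cast<OutType>(u.f);
}
else if (std::is_same<OutType, double>::value)
{
union {
cl_ulong u;
cl_double f;
} u;
u.f = static_cast<cl_double>(x);
u.u &= 0x7fffffffffffffffULL;
return static_cast<OutType>(u.f);
}
log_error("Unexpected argument type of DataInfoSpec::absolute");
return x;
}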
template <typename T, bool fp> constexpr bool is_half()
{
return (std::is_same<cl_half, T>::value && fp);
}
template <typename InType, typename OutType, bool InFP, bool OutFP>
void DataInfoSpec<InType, OutType, InFP, OutFP>::conv(OutType *out, InType *in)
{
if (std::is_same<cl_float, InType>::value || is_in_half())
{
cl_float inVal = *in;
if (std::is_same<cl_half, InType>::value)
{
inVal = HTF(*in);
}
if (std::is_floating_point<OutType>::value)
{
*out = (OutType)inVal;
}
else if (is_out_half())
{
*out = HFF(inVal);
}
else if (std::is_same<cl_ulong, OutType>::value)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
// VS2005 (at least) on x86 uses fistp to store the float as a
// 64-bit int. However, fistp stores it as a signed int, and some of
// the test values won't fit into a signed int. (These test values
// are >= 2^63.) The result on VS2005 is that these end up silently
// (at least by default settings) clamped to the max lowest ulong.
cl_float x = round_to_int(inVal);
if (x >= 9223372036854775808.0f)
{
x -= 9223372036854775808.0f;
((cl_ulong *)out)[0] = x;
((cl_ulong *)out)[0] += 9223372036854775808ULL;
}
else
{
((cl_ulong *)out)[0] = x;
}
#else
*out = round_to_int(inVal);
#endif
}
else if (std::is_same<cl_long, OutType>::value)
{
*out = round_to_int_and_clamp(inVal);
}
else
*out = round_to_int(inVal);
}
else if (std::is_same<cl_double, InType>::value)
{
if (std::is_same<cl_float, OutType>::value)
*out = (OutType)*in;
else if (is_out_half())
*out = static_cast<OutType>(HFD(*in));
else
*out = rint(*in);
}
else if (std::is_same<cl_ulong, InType>::value
|| std::is_same<cl_long, InType>::value)
{
if (std::is_same<cl_double, OutType>::value)
{
#if defined(_MSC_VER)
double result;
if (std::is_same<cl_ulong, InType>::value)
{
cl_ulong l = ((cl_ulong *)in)[0];
cl_long sl = ((cl_long)l < 0) ? (cl_long)((l >> 1) | (l & 1))
: (cl_long)l;
#if defined(_M_X64)
_mm_store_sd(&result, _mm_cvtsi64_sd(_mm_setzero_pd(), sl));
#else
result = sl;
#endif
((double *)out)[0] =
(l == 0 ? 0.0 : (((cl_long)l < 0) ? result * 2.0 : result));
}
else
{
cl_long l = ((cl_long *)in)[0];
#if defined(_M_X64)
_mm_store_sd(&result, _mm_cvtsi64_sd(_mm_setzero_pd(), l));
#else
result = l;
#endif
((double *)out)[0] =
(l == 0 ? 0.0 : result); // Per IEEE-754-2008 5.4.1, 0's
// always convert to +0.0
}
#else
// Use volatile to prevent optimization by Clang compiler
volatile InType vi = *in;
*out = (vi == 0 ? 0.0 : static_cast<OutType>(vi));
#endif
}
else if (std::is_same<cl_float, OutType>::value || is_out_half())
{
cl_float outVal = 0.f;
#if defined(_MSC_VER) && defined(_M_X64)
float result;
if (std::is_same<cl_ulong, InType>::value)
{
cl_ulong l = ((cl_ulong *)in)[0];
cl_long sl = ((cl_long)l < 0) ? (cl_long)((l >> 1) | (l & 1))
: (cl_long)l;
_mm_store_ss(&result, _mm_cvtsi64_ss(_mm_setzero_ps(), sl));
outVal = (l == 0 ? 0.0f
: (((cl_long)l < 0) ? result * 2.0f : result));
}
else
{
cl_long l = ((cl_long *)in)[0];
_mm_store_ss(&result, _mm_cvtsi64_ss(_mm_setzero_ps(), l));
outVal = (l == 0 ? 0.0f : result); // Per IEEE-754-2008 5.4.1,
// 0's always convert to +0.0
}
#else
InType l = ((InType *)in)[0];
#if (defined(__arm__) || defined(__aarch64__)) && defined(__GNUC__)
/* ARM VFP doesn't have a hardware instruction for converting from
* 64-bit integer to float types, so GCC for ARM uses the
* floating-point emulation code regardless of the -mfloat-abi
* setting. However, the emulation code in libgcc.a supports only one
* rounding mode (round to nearest even) and ignores the rounding
* mode set in hardware. As a result, setting the rounding mode in
* hardware won't give correctly rounded results for 64-bit integer
* to float conversions compiled with GCC for ARM, so to test
* different rounding modes we need an alternative reference
* function. ARM64 does have such an instruction, but we cannot
* guarantee that the compiler will use it, so on all ARM
* architectures we use emulation to calculate the reference. */
if (std::is_same<cl_ulong, InType>::value)
outVal = qcom_u64_2_f32(l, qcom_sat, qcom_rm);
else
outVal = (l == 0 ? 0.0f : qcom_s64_2_f32(l, qcom_sat, qcom_rm));
#else
outVal = (l == 0 ? 0.0f : (float)l); // Per IEEE-754-2008 5.4.1, 0's
// always convert to +0.0
#endif
#endif
*out = std::is_same<cl_half, OutType>::value
? static_cast<OutType>(HFF(outVal))
: outVal;
}
else
{
*out = (OutType)*in;
}
}
else
{
if (std::is_same<cl_float, OutType>::value)
{
// Use volatile to prevent optimization by Clang compiler
volatile InType vi = *in;
// Per IEEE-754-2008 5.4.1, 0 always converts to +0.0
*out = (vi == 0 ? 0.0f : vi);
}
else if (std::is_same<cl_double, OutType>::value)
{
// Per IEEE-754-2008 5.4.1, 0 always converts to +0.0
*out = (*in == 0 ? 0.0 : *in);
}
else if (is_out_half())
*out = static_cast<OutType>(HFF(*in == 0 ? 0.f : *in));
else
{
*out = (OutType)*in;
}
}
}
#define CLAMP(_lo, _x, _hi) \
((_x) < (_lo) ? (_lo) : ((_x) > (_hi) ? (_hi) : (_x)))
template <typename InType, typename OutType, bool InFP, bool OutFP>
void DataInfoSpec<InType, OutType, InFP, OutFP>::conv_sat(OutType *out,
InType *in)
{
if (std::is_floating_point<InType>::value || is_in_half())
{
cl_float inVal = *in;
if (is_in_half()) inVal = HTF(*in);
if (std::is_floating_point<OutType>::value || is_out_half())
{ // in half/float/double, out half/float/double
if (is_out_half())
*out = static_cast<OutType>(HFF(inVal));
else
*out = (OutType)(is_in_half() ? inVal : *in);
}
else if ((std::is_same<InType, cl_float>::value || is_in_half())
&& std::is_same<cl_ulong, OutType>::value)
{
cl_float x = round_to_int(is_in_half() ? HTF(*in) : *in);
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
// VS2005 (at least) on x86 uses fistp to store the float as a
// 64-bit int. However, fistp stores it as a signed int, and some of
// the test values won't fit into a signed int. (These test values
// are >= 2^63.) The result on VS2005 is that these end up silently
// (at least by default settings) clamped to the max lowest ulong.
if (x >= 18446744073709551616.0f)
{ // 2^64
*out = 0xFFFFFFFFFFFFFFFFULL;
}
else if (x < 0)
{
*out = 0;
}
else if (x >= 9223372036854775808.0f)
{ // 2^63
x -= 9223372036854775808.0f;
*out = x;
*out += 9223372036854775808ULL;
}
else
{
*out = x;
}
#else
*out = x >= MAKE_HEX_DOUBLE(0x1.0p64, 0x1LL, 64)
? (OutType)0xFFFFFFFFFFFFFFFFULL
: x < 0 ? 0
: (OutType)x;
#endif
}
else if ((std::is_same<InType, cl_float>::value || is_in_half())
&& std::is_same<cl_long, OutType>::value)
{
cl_float f = round_to_int(is_in_half() ? HTF(*in) : *in);
*out = f >= MAKE_HEX_DOUBLE(0x1.0p63, 0x1LL, 63)
? (OutType)0x7FFFFFFFFFFFFFFFULL
: f < MAKE_HEX_DOUBLE(-0x1.0p63, -0x1LL, 63)
? (OutType)0x8000000000000000LL
: (OutType)f;
}
else if (std::is_same<InType, cl_double>::value
&& std::is_same<cl_ulong, OutType>::value)
{
InType f = rint(*in);
*out = f >= MAKE_HEX_DOUBLE(0x1.0p64, 0x1LL, 64)
? (OutType)0xFFFFFFFFFFFFFFFFULL
: f < 0 ? 0
: (OutType)f;
}
else if (std::is_same<InType, cl_double>::value
&& std::is_same<cl_long, OutType>::value)
{
InType f = rint(*in);
*out = f >= MAKE_HEX_DOUBLE(0x1.0p63, 0x1LL, 63)
? (OutType)0x7FFFFFFFFFFFFFFFULL
: f < MAKE_HEX_DOUBLE(-0x1.0p63, -0x1LL, 63)
? (OutType)0x8000000000000000LL
: (OutType)f;
}
else
{ // in half/float/double, out char/uchar/short/ushort/int/uint
*out = CLAMP(ranges.first,
round_to_int_and_clamp(is_in_half() ? inVal : *in),
ranges.second);
}
}
else if (std::is_integral<InType>::value
&& std::is_integral<OutType>::value)
{
if (is_out_half())
{
*out = std::is_signed<InType>::value
? static_cast<OutType>(HFF((cl_float)*in))
: absolute(static_cast<OutType>(HFF((cl_float)*in)));
}
else
{
if ((std::is_signed<InType>::value
&& std::is_signed<OutType>::value)
|| (!std::is_signed<InType>::value
&& !std::is_signed<OutType>::value))
{
if (sizeof(InType) <= sizeof(OutType))
{
*out = (OutType)*in;
}
else
{
*out = CLAMP(ranges.first, *in, ranges.second);
}
}
else
{ // mixed signed/unsigned types
if (sizeof(InType) < sizeof(OutType))
{
*out = (!std::is_signed<InType>::value)
? (OutType)*in
: CLAMP(0, *in, ranges.second); // *in < 0 ? 0 : *in
}
else
{ // bigger/equal mixed signed/unsigned types - always clamp
*out = CLAMP(0, *in, ranges.second);
}
}
}
}
else
{ // InType integral, OutType floating
*out = std::is_signed<InType>::value ? (OutType)*in
: absolute((OutType)*in);
}
}

template <typename InType, typename OutType, bool InFP, bool OutFP>
void DataInfoSpec<InType, OutType, InFP, OutFP>::init(const cl_uint &job_id,
                                                      const cl_uint &thread_id)
{
    uint64_t ulStart = start;
    void *pIn = (char *)gIn + job_id * size * gTypeSizes[inType];
    if (is_in_half())
    {
        cl_half *o = (cl_half *)pIn;
        int i;
        if (gIsEmbedded)
            for (i = 0; i < size; i++)
                o[i] = (cl_half)genrand_int32(mdv[thread_id]);
        else
            for (i = 0; i < size; i++) o[i] = (cl_half)((i + ulStart) % 0xffff);
        if (0 == ulStart)
        {
            // Overwrite the tail of the first block with the special values,
            // truncating the table if the block is smaller than it
            size_t tableSize = specialValuesHalf.size()
                * sizeof(decltype(specialValuesHalf)::value_type);
            if (sizeof(InType) * size < tableSize)
                tableSize = sizeof(InType) * size;
            memcpy((char *)(o + i) - tableSize, &specialValuesHalf.front(),
                   tableSize);
        }
        if (kUnsaturated == sat)
        {
            for (i = 0; i < size; i++) o[i] = clamp(o[i]);
        }
    }
    else if (std::is_integral<InType>::value)
    {
        InType *o = (InType *)pIn;
        if (sizeof(InType) <= sizeof(cl_short))
        { // char/uchar/short/ushort: consecutive values starting at ulStart
            for (int i = 0; i < size; i++) o[i] = (InType)ulStart++;
        }
        else if (sizeof(InType) <= sizeof(cl_int))
        { // int/uint
            int i = 0;
            if (gIsEmbedded)
                for (i = 0; i < size; i++)
                    o[i] = (InType)genrand_int32(mdv[thread_id]);
            else
                for (i = 0; i < size; i++) o[i] = (InType)i + ulStart;
            if (0 == ulStart)
            {
                // Overwrite the tail of the first block with the special values
                size_t tableSize = specialValuesUInt.size()
                    * sizeof(decltype(specialValuesUInt)::value_type);
                if (sizeof(InType) * size < tableSize)
                    tableSize = sizeof(InType) * size;
                memcpy((char *)(o + i) - tableSize, &specialValuesUInt.front(),
                       tableSize);
            }
        }
        else
        { // long/ulong: write the bit patterns through a cl_ulong view
            cl_ulong *o = (cl_ulong *)pIn;
            cl_ulong i, j, k;
            i = 0;
            if (ulStart == 0)
            {
                // Try various powers of two
                for (j = 0; j < (cl_ulong)size && j < 8 * sizeof(cl_ulong);
                     j++)
                    o[j] = (cl_ulong)1 << j;
                i = j;
                // Try the complements of those
                for (j = 0; i < (cl_ulong)size && j < 8 * sizeof(cl_ulong);
                     j++)
                    o[i++] = ~((cl_ulong)1 << j);
                // Try various negative powers of two
                for (j = 0; i < (cl_ulong)size && j < 8 * sizeof(cl_ulong);
                     j++)
                    o[i++] = (cl_ulong)0xFFFFFFFFFFFFFFFEULL << j;
                // Try various powers of two plus 1, shifted by various amounts
                for (j = 0; i < (cl_ulong)size && j < 8 * sizeof(cl_ulong);
                     j++)
                    for (k = 0;
                         i < (cl_ulong)size && k < 8 * sizeof(cl_ulong) - j;
                         k++)
                        o[i++] = (((cl_ulong)1 << j) + 1) << k;
                // Try various powers of two minus 1, shifted by various amounts
                for (j = 0; i < (cl_ulong)size && j < 8 * sizeof(cl_ulong);
                     j++)
                    for (k = 0;
                         i < (cl_ulong)size && k < 8 * sizeof(cl_ulong) - j;
                         k++)
                        o[i++] = (((cl_ulong)1 << j) - 1) << k;
                // Other patterns: repeating bit patterns, each ANDed with a
                // mask and with the mask's complement
                cl_ulong pattern[] = {
                    0x3333333333333333ULL, 0x5555555555555555ULL,
                    0x9999999999999999ULL, 0x6666666666666666ULL,
                    0xccccccccccccccccULL, 0xaaaaaaaaaaaaaaaaULL
                };
                cl_ulong mask[] = { 0xffffffffffffffffULL,
                                    0xff00ff00ff00ff00ULL,
                                    0xffff0000ffff0000ULL,
                                    0xffffffff00000000ULL };
                for (j = 0; i < (cl_ulong)size
                     && j < sizeof(pattern) / sizeof(pattern[0]);
                     j++)
                    for (k = 0; i + 2 <= (cl_ulong)size
                         && k < sizeof(mask) / sizeof(mask[0]);
                         k++)
                    {
                        o[i++] = pattern[j] & mask[k];
                        o[i++] = pattern[j] & ~mask[k];
                    }
            }
            // Fill the remainder with random 64-bit values built from two
            // 32-bit draws
            auto &md = mdv[thread_id];
            for (; i < (cl_ulong)size; i++)
                o[i] = (cl_ulong)genrand_int32(md)
                    | ((cl_ulong)genrand_int32(md) << 32);
        }
    } // integrals
    else if (std::is_same<InType, cl_float>::value)
    {
        // Generate float inputs as raw bit patterns through a cl_uint view
        cl_uint *o = (cl_uint *)pIn;
        int i;
        if (gIsEmbedded)
            for (i = 0; i < size; i++)
                o[i] = (cl_uint)genrand_int32(mdv[thread_id]);
        else
            for (i = 0; i < size; i++) o[i] = (cl_uint)i + ulStart;
        if (0 == ulStart)
        {
            // Overwrite the tail of the first block with the special values
            size_t tableSize = specialValuesFloat.size()
                * sizeof(decltype(specialValuesFloat)::value_type);
            if (sizeof(InType) * size < tableSize)
                tableSize = sizeof(InType) * size;
            memcpy((char *)(o + i) - tableSize, &specialValuesFloat.front(),
                   tableSize);
        }
        if (kUnsaturated == sat)
        {
            InType *f = (InType *)pIn;
            for (i = 0; i < size; i++) f[i] = clamp(f[i]);
        }
    }
    else if (std::is_same<InType, cl_double>::value)
    {
        InType *o = (InType *)pIn;
        int i = 0;
        union {
            uint64_t u;
            InType d;
        } u;
        for (i = 0; i < size; i++)
        {
            uint64_t z = i + ulStart;
            // Fold the 64-bit index into 32 bits
            uint32_t bits = ((uint32_t)z ^ (uint32_t)(z >> 32));
            // Split 0x89abcdef into 0x89abc00000000def
            u.u = bits & 0xfffU;
            u.u |= (uint64_t)(bits & ~0xfffU) << 32;
            // Treat bit 11 (the top bit of the "def" segment) as a sign bit:
            // if it is set, subtracting 0x1000 turns the 32 middle bits into
            // all 1s (borrowing from the top segment); otherwise they stay 0s
            u.u -= (bits & 0x800U) << 1;
            o[i] = u.d;
        }
        if (0 == ulStart)
        {
            // Overwrite the tail of the first block with the special values
            size_t tableSize = specialValuesDouble.size()
                * sizeof(decltype(specialValuesDouble)::value_type);
            if (sizeof(InType) * size < tableSize)
                tableSize = sizeof(InType) * size;
            memcpy((char *)(o + i) - tableSize, &specialValuesDouble.front(),
                   tableSize);
        }
        if (kUnsaturated == sat)
        {
            for (i = 0; i < size; i++) o[i] = clamp(o[i]);
        }
    }
}

template <typename InType, typename OutType, bool InFP, bool OutFP>
InType DataInfoSpec<InType, OutType, InFP, OutFP>::clamp(const InType &in)
{
    // Only floating-point inputs converted to an integer type (half output is
    // excluded by !OutFP) are clamped, using the range selected by `round`
    if (std::is_integral<OutType>::value && !OutFP)
    {
        if (std::is_same<InType, cl_float>::value)
        {
            return fclamp(clamp_ranges[round].first, in,
                          clamp_ranges[round].second);
        }
        else if (std::is_same<InType, cl_double>::value)
        {
            return dclamp(clamp_ranges[round].first, in,
                          clamp_ranges[round].second);
        }
        else if (std::is_same<InType, cl_half>::value && InFP)
        {
            // Clamp half inputs in float, then convert back to half
            return HFF(fclamp(HTF(clamp_ranges[round].first), HTF(in),
                              HTF(clamp_ranges[round].second)));
        }
    }
    return in;
}
#endif /* CONVERSIONS_DATA_INFO_H */