OpenCL-CTS/test_conformance/vulkan/test_vulkan_interop_image.cpp
Ben Ashbaugh 620c689919 update fp16 staging branch from main (#1903)
* allocations: Move results array from stack to heap (#1857)

* allocations: Fix stack overflow

* check format fixes

* Fix windows stack overflow. (#1839)

* thread_dimensions: Avoid combinations of very small LWS and very large GWS (#1856)

Modify the existing condition to also cover extremely small LWS values
like 1x1 combined with large GWS values

* c11_atomics: Reduce the loopcounter for sequential consistency tests (#1853)

Reduce the loop from 1000000 to 500000 since the former value
makes the test run too long and causes system issues on certain
platforms

* Limit individual allocation size using the global memory size (#1835)

Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>

* geometrics: fix Wsign-compare warnings (#1855)

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* integer_ops: fix -Wformat warnings (#1860)

The main sources of warnings were:

 * Printing of a `size_t` which requires the `%zu` specifier.

 * Printing of `cl_long`/`cl_ulong` which is now done using the
   `PRI*64` macros to ensure portability across 32 and 64-bit builds.

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* Replace OBSOLETE_FORAMT with OBSOLETE_FORMAT (#1776)

* Replace OBSOLETE_FORAMT with OBSOLETE_FORMAT

In imageHelpers.cpp and a few other places in the image tests, OBSOLETE_FORMAT is misspelled as OBSOLETE_FORAMT.
Fix the misspelling by replacing it with OBSOLETE_FORMAT.

Fixes #1769

* Remove code guarded by OBSOLETE_FORMAT

Remove code guarded by OBSOLETE_FORMAT
as suggested by review comments

Fixes #1769

* Fix formatting issues for OBSOLETE_FORMAT changes

Fix formatting issues observed in files while removing
code guarded by OBSOLETE_FORMAT

Fixes #1769

* Some more formatting fixes

Some more formatting fixes to get CI clean

Fixes #1769

* Final formatting fixes

Final formatting fixes for #1769

* Enhancement: Thread dimensions user parameters (#1384)

* Fix format in the test scope

* Add user params to limit testing

Add parameters to reduce amount of testing.
Helpful for debugging or for machines with lower performance.

* Restore default value

* Print info only if testing params bigger than 0.

* [NFC] conversions: reenable Wunused-but-set-variable (#1845)

Remove an assigned-to but unused variable.

Reenable the Wunused-but-set-variable warning for the conversions
suite, as it now compiles cleanly with this warning enabled.

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* Fix bug of conversion from long to double (#1847)

* Fix bug of conversion from long to double

If the input is of long type, it should be loaded as long, not ulong.

* update long2float

* math_brute_force: fix exp/exp2 rlx ULP calculation (#1848)

Fix the ULP error calculation for the `exp` and `exp2` builtins in
relaxed math mode for the full profile.

Previously, the `ulps` value kept being added to while verifying the
result buffer in a loop.  `ulps` could even become a `NaN` when the
input argument being tested was a `NaN`.

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* Enable LARGEADDRESSAWARE for 32 bit compilation (#1858)

* Enable LARGEADDRESSAWARE for 32 bit compilation

32-bit executables built with MSVC linker have only 2GB virtual memory
address space by default, which might not be sufficient for some tests.

Enable LARGEADDRESSAWARE linker flag for 32-bit targets to allow tests
to handle addresses larger than 2 gigabytes.

https://learn.microsoft.com/en-us/cpp/build/reference/largeaddressaware-handle-large-addresses?view=msvc-170

Signed-off-by: Guo, Yilong <yilong.guo@intel.com>
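As a rough sketch, such a flag might be wired up in CMake like the fragment below (the actual CTS change may be structured differently):

```cmake
# Sketch only: enable /LARGEADDRESSAWARE for 32-bit MSVC builds so the
# process can address more than 2 GB of virtual memory.
if(MSVC AND CMAKE_SIZEOF_VOID_P EQUAL 4)
    add_link_options("/LARGEADDRESSAWARE")
endif()
```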

* Apply suggestion

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

---------

Signed-off-by: Guo, Yilong <yilong.guo@intel.com>
Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

* fix return code when readwrite image is not supported (#1873)

This function (do_test) starts by testing write and read individually.
Both of them can have errors.

When readwrite image is not supported, the function returns
TEST_SKIPPED_ITSELF, potentially masking errors and leading to the test
returning EXIT_SUCCESS even when errors occurred along the way.

* fix macos builds by avoiding double compilation of function_list.cpp for test_spir (#1866)

* modernize CMakeLists for test_spir

* add the operating system release to the sccache key

* include the math brute force function list vs. building it twice

* fix the license header on the spirv-new tests (#1865)

The source files for the spirv-new tests were using the older Khronos
license instead of the proper Apache license.  Fixed the license in
all source files.

* compiler: fix grammar in error message (#1877)

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* Updated semaphore tests to use clSemaphoreReImportSyncFdKHR. (#1854)

* Updated semaphore tests to use clSemaphoreReImportSyncFdKHR.

Additionally updated common semaphore code to handle spec updates
that restrict simultaneous importing/exporting of handles.

* Fix build issues on CI

* gcc build issues

* Make clReImportSemaphoreSyncFdKHR a required API
call if cl_khr_external_semaphore_sync_fd is present.

* Implement signal and wait for all semaphore types.

* subgroups: fix for testing too large WG sizes (#1620)

This appears to have been a typo: the comment says the test
tries to fetch the local size for a subgroup count above the
max WG size, but it actually used the previous subgroup count.

The test deliberately sets the SG count to a number larger
than the maximum work-items in the work group. Given the
minimum SG size is 1 WI, there can be at most
max-work-group-size SGs (of 1 WI each). Thus, if we request
a number of SGs that exceeds the local size, the query
should fail as expected.

* add SPIR-V version testing (#1861)

* basic SPIR-V 1.3 testing support

* updated script to compile for more SPIR-V versions

* switch to general SPIR-V versions test

* update copyright text and fix license

* improve output while test is running

* check for higher SPIR-V versions first

* fix formatting

* fix the reported platform information for math brute force (#1884)

When the math brute force test printed the platform version it always
printed information for the first platform in the system, which could
be different than the platform for the passed-in device.  Fixed by
querying the platform from the passed-in device instead.

* api tests fix: Use MTdataHolder in test_get_image_info (#1871)

* Minor fixes in mutable dispatch tests. (#1829)

* Minor fixes in mutable dispatch tests.

* Fix size of newWrapper in MutableDispatchSVMArguments.
* Fix erroneous clCommandNDRangeKernelKHR call.

Signed-off-by: John Kesapides <john.kesapides@arm.com>

* * Set the row_pitch for imageInfo in MutableDispatchImage1DArguments
and MutableDispatchImage2DArguments. The row_pitch is
used by get_image_size() to calculate the size of
the host pointers filled by generate_random_image_data.

Signed-off-by: John Kesapides <john.kesapides@arm.com>

---------

Signed-off-by: John Kesapides <john.kesapides@arm.com>

* add test for cl_khr_spirv_linkonce_odr (#1226)

* initial version of the test with placeholders for linkonce_odr linkage

* add OpExtension SPV_KHR_linkonce_odr extension

* add check for extension

* switch to actual LinkOnceODR linkage

* fix formatting

* add a test case to ensure a function with linkonce_odr is exported

* add back the extension check

* fix formatting

* undo compiler optimization and actually add the call to function a

* [NFC] subgroups: remove unnecessary extern keywords (#1892)

In C and C++ all functions have external linkage by default.

Also remove the unused `gMTdata` and `test_pipe_functions`
declarations.

Fixes https://github.com/KhronosGroup/OpenCL-CTS/issues/1137

Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>

* Added cl_khr_fp16 extension support for test_decorate from spirv_new (#1770)

* Added cl_khr_fp16 extension support for test_decorate from spirv_new, work in progress

* Complemented test_decorate saturation test to support cl_khr_fp16 extension (issue #142)

* Fixed clang format

* scope of modifications:

- changed naming convention of saturation .spvasm files related to
  test_decorate of spirv_new
- restored float to char/uchar saturation tests
- a few minor corrections

* fix ranges for half testing

* fix formatting

* one more formatting fix

* remove unused function

* use isnan instead of std::isnan

isnan is currently implemented as a macro, not as a function, so
we can't use std::isnan.

* fix Clang warning about inexact conversion

---------

Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>

* add support for custom devices (#1891)

enable the CTS to run on custom devices

---------

Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Signed-off-by: Guo, Yilong <yilong.guo@intel.com>
Signed-off-by: John Kesapides <john.kesapides@arm.com>
Co-authored-by: Sreelakshmi Haridas Maruthur <sharidas@quicinc.com>
Co-authored-by: Haonan Yang <haonan.yang@intel.com>
Co-authored-by: Ahmed Hesham <117350656+ahesham-arm@users.noreply.github.com>
Co-authored-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Co-authored-by: niranjanjoshi121 <43807392+niranjanjoshi121@users.noreply.github.com>
Co-authored-by: Grzegorz Wawiorko <grzegorz.wawiorko@intel.com>
Co-authored-by: Wenwan Xing <wenwan.xing@intel.com>
Co-authored-by: Yilong Guo <yilong.guo@intel.com>
Co-authored-by: Romaric Jodin <89833130+rjodinchr@users.noreply.github.com>
Co-authored-by: joshqti <127994991+joshqti@users.noreply.github.com>
Co-authored-by: Pekka Jääskeläinen <pekka.jaaskelainen@tuni.fi>
Co-authored-by: imilenkovic00 <155085410+imilenkovic00@users.noreply.github.com>
Co-authored-by: John Kesapides <46718829+JohnKesapidesARM@users.noreply.github.com>
Co-authored-by: Marcin Hajder <marcin.hajder@gmail.com>
Co-authored-by: Aharon Abramson <aharon.abramson@mobileye.com>
2024-03-02 16:48:45 -08:00


//
// Copyright (c) 2022 The Khronos Group Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
#include <vulkan_interop_common.hpp>
#include <string>
#include "harness/errorHelpers.h"
#include <algorithm>
#include "deviceInfo.h"
#define MAX_2D_IMAGES 5
#define MAX_2D_IMAGE_WIDTH 1024
#define MAX_2D_IMAGE_HEIGHT 1024
#define MAX_2D_IMAGE_ELEMENT_SIZE 16
#define MAX_2D_IMAGE_MIP_LEVELS 11
#define MAX_2D_IMAGE_DESCRIPTORS (MAX_2D_IMAGES * MAX_2D_IMAGE_MIP_LEVELS)
#define NUM_THREADS_PER_GROUP_X 32
#define NUM_THREADS_PER_GROUP_Y 32
#define NUM_BLOCKS(size, blockSize)                                            \
    (ROUND_UP((size), (blockSize)) / (blockSize))
#define ASSERT(x)                                                              \
    if (!(x))                                                                  \
    {                                                                          \
        fprintf(stderr, "Assertion \"%s\" failed at %s:%d\n", #x, __FILE__,    \
                __LINE__);                                                     \
        exit(1);                                                               \
    }
#define ASSERT_LEQ(x, y)                                                       \
    if ((x) > (y))                                                             \
    {                                                                          \
        ASSERT(0);                                                             \
    }
namespace {
struct Params
{
    uint32_t numImage2DDescriptors;
};
} // anonymous namespace
static cl_uchar uuid[CL_UUID_SIZE_KHR];
static cl_device_id deviceId = NULL;
size_t max_width = MAX_2D_IMAGE_WIDTH;
size_t max_height = MAX_2D_IMAGE_HEIGHT;
const char *kernel_text_numImage_1 = " \
__constant sampler_t smpImg = CLK_NORMALIZED_COORDS_FALSE|CLK_ADDRESS_NONE|CLK_FILTER_NEAREST;\n\
__kernel void image2DKernel(read_only image2d_t InputImage, write_only image2d_t OutImage, int num2DImages, int baseWidth, int baseHeight, int numMipLevels)\n\
{\n\
int threadIdxX = get_global_id(0);\n\
int threadIdxY = get_global_id(1);\n\
int numThreadsX = get_global_size(0); \n\
int numThreadsY = get_global_size(1);\n\
if (threadIdxX >= baseWidth || threadIdxY >= baseHeight)\n\
{\n\
return;\n\
}\n\
%s dataA = read_image%s(InputImage, smpImg, (int2)(threadIdxX, threadIdxY)); \n\
%s dataB = read_image%s(InputImage, smpImg, (int2)(threadIdxX, baseHeight-threadIdxY-1)); \n\
write_image%s(OutImage, (int2)(threadIdxX, baseHeight-threadIdxY-1), dataA);\n\
write_image%s(OutImage, (int2)( threadIdxX, threadIdxY), dataB);\n\
\n\
}";
const char *kernel_text_numImage_2 = " \
__constant sampler_t smpImg = CLK_NORMALIZED_COORDS_FALSE|CLK_ADDRESS_NONE|CLK_FILTER_NEAREST;\n\
__kernel void image2DKernel(read_only image2d_t InputImage_1, write_only image2d_t OutImage_1, read_only image2d_t InputImage_2,write_only image2d_t OutImage_2,int num2DImages, int baseWidth, int baseHeight, int numMipLevels) \n\
{\n\
int threadIdxX = get_global_id(0);\n\
int threadIdxY = get_global_id(1);\n\
int numThreadsX = get_global_size(0);\n\
int numThreadsY = get_global_size(1);\n\
if (threadIdxX >= baseWidth || threadIdxY >= baseHeight) \n\
{\n\
return;\n\
}\n\
%s dataA = read_image%s(InputImage_1, smpImg, (int2)(threadIdxX, threadIdxY)); \n\
%s dataB = read_image%s(InputImage_1, smpImg, (int2)(threadIdxX, baseHeight-threadIdxY-1)); \n\
%s dataC = read_image%s(InputImage_2, smpImg, (int2)(threadIdxX, threadIdxY)); \n\
%s dataD = read_image%s(InputImage_2, smpImg, (int2)(threadIdxX, baseHeight-threadIdxY-1)); \n\
write_image%s(OutImage_1, (int2)(threadIdxX, baseHeight-threadIdxY-1), dataA);\n\
write_image%s(OutImage_1, (int2)(threadIdxX, threadIdxY), dataB);\n\
write_image%s(OutImage_2, (int2)(threadIdxX, baseHeight-threadIdxY-1), dataC);\n\
write_image%s(OutImage_2, (int2)(threadIdxX, threadIdxY), dataD);\n\
\n\
}";
const char *kernel_text_numImage_4 = " \
__constant sampler_t smpImg = CLK_NORMALIZED_COORDS_FALSE|CLK_ADDRESS_NONE|CLK_FILTER_NEAREST;\n\
__kernel void image2DKernel(read_only image2d_t InputImage_1, write_only image2d_t OutImage_1, read_only image2d_t InputImage_2, write_only image2d_t OutImage_2, read_only image2d_t InputImage_3, write_only image2d_t OutImage_3, read_only image2d_t InputImage_4, write_only image2d_t OutImage_4, int num2DImages, int baseWidth, int baseHeight, int numMipLevels) \n\
{\n\
int threadIdxX = get_global_id(0);\n\
int threadIdxY = get_global_id(1);\n\
int numThreadsX = get_global_size(0);\n\
int numThreadsY = get_global_size(1);\n\
if (threadIdxX >= baseWidth || threadIdxY >= baseHeight) \n\
{\n\
return;\n\
}\n\
%s dataA = read_image%s(InputImage_1, smpImg, (int2)(threadIdxX, threadIdxY)); \n\
%s dataB = read_image%s(InputImage_1, smpImg, (int2)(threadIdxX, baseHeight-threadIdxY-1)); \n\
%s dataC = read_image%s(InputImage_2, smpImg, (int2)(threadIdxX, threadIdxY)); \n\
%s dataD = read_image%s(InputImage_2, smpImg, (int2)(threadIdxX, baseHeight-threadIdxY-1)); \n\
%s dataE = read_image%s(InputImage_3, smpImg, (int2)(threadIdxX, threadIdxY)); \n\
%s dataF = read_image%s(InputImage_3, smpImg, (int2)(threadIdxX, baseHeight-threadIdxY-1)); \n\
%s dataG = read_image%s(InputImage_4, smpImg, (int2)(threadIdxX, threadIdxY)); \n\
%s dataH = read_image%s(InputImage_4, smpImg, (int2)(threadIdxX, baseHeight-threadIdxY-1)); \n\
write_image%s(OutImage_1, (int2)(threadIdxX, baseHeight-threadIdxY-1), dataA);\n\
write_image%s(OutImage_1, (int2)(threadIdxX, threadIdxY), dataB);\n\
write_image%s(OutImage_2, (int2)(threadIdxX, baseHeight-threadIdxY-1), dataC);\n\
write_image%s(OutImage_2, (int2)(threadIdxX, threadIdxY), dataD);\n\
write_image%s(OutImage_3, (int2)(threadIdxX, baseHeight-threadIdxY-1), dataE);\n\
write_image%s(OutImage_3, (int2)(threadIdxX, threadIdxY), dataF);\n\
write_image%s(OutImage_4, (int2)(threadIdxX, baseHeight-threadIdxY-1), dataG);\n\
write_image%s(OutImage_4, (int2)(threadIdxX, threadIdxY), dataH);\n\
\n\
}";
const uint32_t num2DImagesList[] = { 1, 2, 4 };
const uint32_t widthList[] = { 4, 64, 183, 1024 };
const uint32_t heightList[] = { 4, 64, 365 };
const cl_kernel getKernelType(VulkanFormat format, cl_kernel kernel_float,
                              cl_kernel kernel_signed,
                              cl_kernel kernel_unsigned)
{
    cl_kernel kernel;
    switch (format)
    {
        case VULKAN_FORMAT_R32G32B32A32_SFLOAT: kernel = kernel_float; break;
        case VULKAN_FORMAT_R32G32B32A32_UINT: kernel = kernel_unsigned; break;
        case VULKAN_FORMAT_R32G32B32A32_SINT: kernel = kernel_signed; break;
        case VULKAN_FORMAT_R16G16B16A16_UINT: kernel = kernel_unsigned; break;
        case VULKAN_FORMAT_R16G16B16A16_SINT: kernel = kernel_signed; break;
        case VULKAN_FORMAT_R8G8B8A8_UINT: kernel = kernel_unsigned; break;
        case VULKAN_FORMAT_R8G8B8A8_SINT: kernel = kernel_signed; break;
        case VULKAN_FORMAT_R32G32_SFLOAT: kernel = kernel_float; break;
        case VULKAN_FORMAT_R32G32_UINT: kernel = kernel_unsigned; break;
        case VULKAN_FORMAT_R32G32_SINT: kernel = kernel_signed; break;
        case VULKAN_FORMAT_R16G16_UINT: kernel = kernel_unsigned; break;
        case VULKAN_FORMAT_R16G16_SINT: kernel = kernel_signed; break;
        case VULKAN_FORMAT_R8G8_UINT: kernel = kernel_unsigned; break;
        case VULKAN_FORMAT_R8G8_SINT: kernel = kernel_signed; break;
        case VULKAN_FORMAT_R32_SFLOAT: kernel = kernel_float; break;
        case VULKAN_FORMAT_R32_UINT: kernel = kernel_unsigned; break;
        case VULKAN_FORMAT_R32_SINT: kernel = kernel_signed; break;
        case VULKAN_FORMAT_R16_UINT: kernel = kernel_unsigned; break;
        case VULKAN_FORMAT_R16_SINT: kernel = kernel_signed; break;
        case VULKAN_FORMAT_R8_UINT: kernel = kernel_unsigned; break;
        case VULKAN_FORMAT_R8_SINT: kernel = kernel_signed; break;
        default:
            log_error(" Unsupported format");
            ASSERT(0);
            break;
    }
    return kernel;
}
int run_test_with_two_queue(
cl_context &context, cl_command_queue &cmd_queue1,
cl_command_queue &cmd_queue2, cl_kernel *kernel_unsigned,
cl_kernel *kernel_signed, cl_kernel *kernel_float, VulkanDevice &vkDevice,
VulkanExternalSemaphoreHandleType vkExternalSemaphoreHandleType)
{
cl_int err = CL_SUCCESS;
size_t origin[3] = { 0, 0, 0 };
size_t region[3] = { 1, 1, 1 };
cl_kernel updateKernelCQ1, updateKernelCQ2;
std::vector<VulkanFormat> vkFormatList = getSupportedVulkanFormatList();
const std::vector<VulkanExternalMemoryHandleType>
vkExternalMemoryHandleTypeList =
getSupportedVulkanExternalMemoryHandleTypeList();
char magicValue = 0;
VulkanBuffer vkParamsBuffer(vkDevice, sizeof(Params));
VulkanDeviceMemory vkParamsDeviceMemory(
vkDevice, vkParamsBuffer.getSize(),
getVulkanMemoryType(vkDevice,
VULKAN_MEMORY_TYPE_PROPERTY_HOST_VISIBLE_COHERENT));
vkParamsDeviceMemory.bindBuffer(vkParamsBuffer);
uint64_t maxImage2DSize =
max_width * max_height * MAX_2D_IMAGE_ELEMENT_SIZE * 2;
VulkanBuffer vkSrcBuffer(vkDevice, maxImage2DSize);
VulkanDeviceMemory vkSrcBufferDeviceMemory(
vkDevice, vkSrcBuffer.getSize(),
getVulkanMemoryType(vkDevice,
VULKAN_MEMORY_TYPE_PROPERTY_HOST_VISIBLE_COHERENT));
vkSrcBufferDeviceMemory.bindBuffer(vkSrcBuffer);
char *srcBufferPtr, *dstBufferPtr;
srcBufferPtr = (char *)malloc(maxImage2DSize);
dstBufferPtr = (char *)malloc(maxImage2DSize);
VulkanDescriptorSetLayoutBindingList vkDescriptorSetLayoutBindingList;
vkDescriptorSetLayoutBindingList.addBinding(
0, VULKAN_DESCRIPTOR_TYPE_STORAGE_BUFFER, 1);
vkDescriptorSetLayoutBindingList.addBinding(
1, VULKAN_DESCRIPTOR_TYPE_STORAGE_IMAGE, MAX_2D_IMAGE_DESCRIPTORS);
VulkanDescriptorSetLayout vkDescriptorSetLayout(
vkDevice, vkDescriptorSetLayoutBindingList);
VulkanPipelineLayout vkPipelineLayout(vkDevice, vkDescriptorSetLayout);
VulkanDescriptorPool vkDescriptorPool(vkDevice,
vkDescriptorSetLayoutBindingList);
VulkanDescriptorSet vkDescriptorSet(vkDevice, vkDescriptorPool,
vkDescriptorSetLayout);
VulkanCommandPool vkCommandPool(vkDevice);
VulkanCommandBuffer vkCopyCommandBuffer(vkDevice, vkCommandPool);
VulkanCommandBuffer vkShaderCommandBuffer(vkDevice, vkCommandPool);
VulkanQueue &vkQueue = vkDevice.getQueue();
VulkanSemaphore vkVk2CLSemaphore(vkDevice, vkExternalSemaphoreHandleType);
VulkanSemaphore vkCl2VkSemaphore(vkDevice, vkExternalSemaphoreHandleType);
clExternalSemaphore *clVk2CLExternalSemaphore = NULL;
clExternalSemaphore *clCl2VkExternalSemaphore = NULL;
clVk2CLExternalSemaphore = new clExternalImportableSemaphore(
vkVk2CLSemaphore, context, vkExternalSemaphoreHandleType, deviceId);
clCl2VkExternalSemaphore = new clExternalExportableSemaphore(
vkCl2VkSemaphore, context, vkExternalSemaphoreHandleType, deviceId);
std::vector<VulkanDeviceMemory *> vkImage2DListDeviceMemory1;
std::vector<VulkanDeviceMemory *> vkImage2DListDeviceMemory2;
std::vector<clExternalMemoryImage *> externalMemory1;
std::vector<clExternalMemoryImage *> externalMemory2;
std::vector<char> vkImage2DShader;
for (size_t fIdx = 0; fIdx < vkFormatList.size(); fIdx++)
{
VulkanFormat vkFormat = vkFormatList[fIdx];
log_info("Format: %d\n", vkFormat);
uint32_t elementSize = getVulkanFormatElementSize(vkFormat);
ASSERT_LEQ(elementSize, (uint32_t)MAX_2D_IMAGE_ELEMENT_SIZE);
log_info("elementSize= %d\n", elementSize);
std::string fileName = "image2D_"
+ std::string(getVulkanFormatGLSLFormat(vkFormat)) + ".spv";
log_info("Load %s file", fileName.c_str());
vkImage2DShader = readFile(fileName);
VulkanShaderModule vkImage2DShaderModule(vkDevice, vkImage2DShader);
VulkanComputePipeline vkComputePipeline(vkDevice, vkPipelineLayout,
vkImage2DShaderModule);
for (size_t wIdx = 0; wIdx < ARRAY_SIZE(widthList); wIdx++)
{
uint32_t width = widthList[wIdx];
log_info("Width: %d\n", width);
if (width > max_width) continue;
region[0] = width;
for (size_t hIdx = 0; hIdx < ARRAY_SIZE(heightList); hIdx++)
{
uint32_t height = heightList[hIdx];
log_info("Height: %d", height);
if (height > max_height) continue;
region[1] = height;
uint32_t numMipLevels = 1;
log_info("Number of mipmap levels: %d\n", numMipLevels);
magicValue++;
char *vkSrcBufferDeviceMemoryPtr =
(char *)vkSrcBufferDeviceMemory.map();
uint64_t srcBufSize = 0;
memset(vkSrcBufferDeviceMemoryPtr, 0, maxImage2DSize);
memset(srcBufferPtr, 0, maxImage2DSize);
uint32_t mipLevel = 0;
for (uint32_t row = 0;
row < std::max(height >> mipLevel, uint32_t(1)); row++)
{
for (uint32_t col = 0;
col < std::max(width >> mipLevel, uint32_t(1)); col++)
{
for (uint32_t elementByte = 0;
elementByte < elementSize; elementByte++)
{
vkSrcBufferDeviceMemoryPtr[srcBufSize] =
(char)(magicValue + mipLevel + row + col);
srcBufferPtr[srcBufSize] =
(char)(magicValue + mipLevel + row + col);
srcBufSize++;
}
}
}
srcBufSize = ROUND_UP(
srcBufSize,
std::max(
elementSize,
(uint32_t)VULKAN_MIN_BUFFER_OFFSET_COPY_ALIGNMENT));
vkSrcBufferDeviceMemory.unmap();
for (size_t niIdx = 0; niIdx < ARRAY_SIZE(num2DImagesList);
niIdx++)
{
uint32_t num2DImages = num2DImagesList[niIdx] + 1;
// added one image for cross-cq case for updateKernelCQ2
log_info("Number of images: %d\n", num2DImages);
ASSERT_LEQ(num2DImages, (uint32_t)MAX_2D_IMAGES);
uint32_t num_2D_image;
if (useSingleImageKernel)
{
num_2D_image = 1;
}
else
{
num_2D_image = num2DImages;
}
Params *params = (Params *)vkParamsDeviceMemory.map();
params->numImage2DDescriptors = num_2D_image * numMipLevels;
vkParamsDeviceMemory.unmap();
vkDescriptorSet.update(0, vkParamsBuffer);
for (size_t emhtIdx = 0;
emhtIdx < vkExternalMemoryHandleTypeList.size();
emhtIdx++)
{
VulkanExternalMemoryHandleType
vkExternalMemoryHandleType =
vkExternalMemoryHandleTypeList[emhtIdx];
if ((true == disableNTHandleType)
&& (VULKAN_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32_NT
== vkExternalMemoryHandleType))
{
// Skip running for WIN32 NT handle.
continue;
}
log_info("External memory handle type: %d \n",
vkExternalMemoryHandleType);
VulkanImageTiling vulkanImageTiling =
vkClExternalMemoryHandleTilingAssumption(
deviceId,
vkExternalMemoryHandleTypeList[emhtIdx], &err);
ASSERT_SUCCESS(err,
"Failed to query OpenCL tiling mode");
VulkanImage2D vkDummyImage2D(
vkDevice, vkFormatList[0], widthList[0],
heightList[0], vulkanImageTiling, 1,
vkExternalMemoryHandleType);
const VulkanMemoryTypeList &memoryTypeList =
vkDummyImage2D.getMemoryTypeList();
for (size_t mtIdx = 0; mtIdx < memoryTypeList.size();
mtIdx++)
{
const VulkanMemoryType &memoryType =
memoryTypeList[mtIdx];
log_info("Memory type index: %d\n",
(uint32_t)memoryType);
log_info("Memory type property: %d\n",
memoryType.getMemoryTypeProperty());
if (!useDeviceLocal)
{
if (VULKAN_MEMORY_TYPE_PROPERTY_DEVICE_LOCAL
== memoryType.getMemoryTypeProperty())
{
continue;
}
}
size_t totalImageMemSize = 0;
uint64_t interImageOffset = 0;
{
VulkanImage2D vkImage2D(
vkDevice, vkFormat, width, height,
vulkanImageTiling, numMipLevels,
vkExternalMemoryHandleType);
ASSERT_LEQ(vkImage2D.getSize(), maxImage2DSize);
totalImageMemSize =
ROUND_UP(vkImage2D.getSize(),
vkImage2D.getAlignment());
}
VulkanImage2DList vkImage2DList(
num2DImages, vkDevice, vkFormat, width, height,
vulkanImageTiling, numMipLevels,
vkExternalMemoryHandleType);
for (size_t bIdx = 0; bIdx < num2DImages; bIdx++)
{
vkImage2DListDeviceMemory1.push_back(
new VulkanDeviceMemory(
vkDevice, vkImage2DList[bIdx],
memoryType,
vkExternalMemoryHandleType));
vkImage2DListDeviceMemory1[bIdx]->bindImage(
vkImage2DList[bIdx], 0);
externalMemory1.push_back(
new clExternalMemoryImage(
*vkImage2DListDeviceMemory1[bIdx],
vkExternalMemoryHandleType, context,
totalImageMemSize, width, height, 0,
vkImage2DList[bIdx], deviceId));
}
VulkanImageViewList vkImage2DViewList(
vkDevice, vkImage2DList);
VulkanImage2DList vkImage2DList2(
num2DImages, vkDevice, vkFormat, width, height,
vulkanImageTiling, numMipLevels,
vkExternalMemoryHandleType);
for (size_t bIdx = 0; bIdx < num2DImages; bIdx++)
{
vkImage2DListDeviceMemory2.push_back(
new VulkanDeviceMemory(
vkDevice, vkImage2DList2[bIdx],
memoryType,
vkExternalMemoryHandleType));
vkImage2DListDeviceMemory2[bIdx]->bindImage(
vkImage2DList2[bIdx], 0);
externalMemory2.push_back(
new clExternalMemoryImage(
*vkImage2DListDeviceMemory2[bIdx],
vkExternalMemoryHandleType, context,
totalImageMemSize, width, height, 0,
vkImage2DList2[bIdx], deviceId));
}
cl_mem external_mem_image1[5];
cl_mem external_mem_image2[5];
for (int i = 0; i < num2DImages; i++)
{
external_mem_image1[i] =
externalMemory1[i]
->getExternalMemoryImage();
external_mem_image2[i] =
externalMemory2[i]
->getExternalMemoryImage();
}
err = clCl2VkExternalSemaphore->signal(cmd_queue1);
test_error_and_cleanup(
err, CLEANUP,
"Failed to signal CL semaphore\n");
if (!useSingleImageKernel)
{
vkDescriptorSet.updateArray(1,
vkImage2DViewList);
vkCopyCommandBuffer.begin();
vkCopyCommandBuffer.pipelineBarrier(
vkImage2DList,
VULKAN_IMAGE_LAYOUT_UNDEFINED,
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
for (size_t i2DIdx = 0;
i2DIdx < vkImage2DList.size(); i2DIdx++)
{
vkCopyCommandBuffer.copyBufferToImage(
vkSrcBuffer, vkImage2DList[i2DIdx],
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
}
vkCopyCommandBuffer.pipelineBarrier(
vkImage2DList,
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
VULKAN_IMAGE_LAYOUT_GENERAL);
vkCopyCommandBuffer.end();
memset(dstBufferPtr, 0, srcBufSize);
vkQueue.submit(vkCopyCommandBuffer);
vkShaderCommandBuffer.begin();
vkShaderCommandBuffer.bindPipeline(
vkComputePipeline);
vkShaderCommandBuffer.bindDescriptorSets(
vkComputePipeline, vkPipelineLayout,
vkDescriptorSet);
vkShaderCommandBuffer.dispatch(
NUM_BLOCKS(width, NUM_THREADS_PER_GROUP_X),
NUM_BLOCKS(height,
NUM_THREADS_PER_GROUP_Y / 2),
1);
vkShaderCommandBuffer.end();
}
for (uint32_t iter = 0; iter < innerIterations;
iter++)
{
if (useSingleImageKernel)
{
for (size_t i2DIdx = 0;
i2DIdx < vkImage2DList.size();
i2DIdx++)
{
vkDescriptorSet.update(
1, vkImage2DViewList[i2DIdx]);
vkCopyCommandBuffer.begin();
vkCopyCommandBuffer.pipelineBarrier(
vkImage2DList,
VULKAN_IMAGE_LAYOUT_UNDEFINED,
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
vkCopyCommandBuffer.copyBufferToImage(
vkSrcBuffer, vkImage2DList[i2DIdx],
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
vkCopyCommandBuffer.pipelineBarrier(
vkImage2DList,
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
VULKAN_IMAGE_LAYOUT_GENERAL);
vkCopyCommandBuffer.end();
memset(dstBufferPtr, 0, srcBufSize);
vkQueue.submit(vkCopyCommandBuffer);
vkShaderCommandBuffer.begin();
vkShaderCommandBuffer.bindPipeline(
vkComputePipeline);
vkShaderCommandBuffer
.bindDescriptorSets(
vkComputePipeline,
vkPipelineLayout,
vkDescriptorSet);
vkShaderCommandBuffer.dispatch(
NUM_BLOCKS(width,
NUM_THREADS_PER_GROUP_X),
NUM_BLOCKS(height,
NUM_THREADS_PER_GROUP_Y
/ 2),
1);
vkShaderCommandBuffer.end();
if (i2DIdx < vkImage2DList.size() - 1)
{
vkQueue.submit(
vkShaderCommandBuffer);
}
}
}
vkQueue.submit(vkCl2VkSemaphore,
vkShaderCommandBuffer,
vkVk2CLSemaphore);
err =
clVk2CLExternalSemaphore->wait(cmd_queue1);
if (err != CL_SUCCESS)
{
print_error(err,
"Error: failed to wait on CL "
"external semaphore\n");
goto CLEANUP;
}
switch (num2DImages)
{
case 2:
updateKernelCQ1 = getKernelType(
vkFormat, kernel_float[0],
kernel_signed[0],
kernel_unsigned[0]);
break;
case 3:
updateKernelCQ1 = getKernelType(
vkFormat, kernel_float[1],
kernel_signed[1],
kernel_unsigned[1]);
break;
case 5:
updateKernelCQ1 = getKernelType(
vkFormat, kernel_float[2],
kernel_signed[2],
kernel_unsigned[2]);
break;
}
updateKernelCQ2 = getKernelType(
vkFormat, kernel_float[3], kernel_signed[3],
kernel_unsigned[3]);
// similar kernel-type based on vkFormat
int j = 0;
// Setting arguments of updateKernelCQ2
err = clSetKernelArg(updateKernelCQ2, 0,
sizeof(cl_mem),
&external_mem_image1[0]);
err |= clSetKernelArg(updateKernelCQ2, 1,
sizeof(cl_mem),
&external_mem_image2[0]);
err |= clSetKernelArg(
updateKernelCQ2, 2, sizeof(cl_mem),
&external_mem_image1[num2DImages - 1]);
err |= clSetKernelArg(
updateKernelCQ2, 3, sizeof(cl_mem),
&external_mem_image2[num2DImages - 1]);
err |= clSetKernelArg(updateKernelCQ2, 4,
sizeof(unsigned int),
&num2DImages);
err |= clSetKernelArg(updateKernelCQ2, 5,
sizeof(unsigned int),
&width);
err |= clSetKernelArg(updateKernelCQ2, 6,
sizeof(unsigned int),
&height);
err |= clSetKernelArg(updateKernelCQ2, 7,
sizeof(unsigned int),
&numMipLevels);
for (int i = 0; i < num2DImages - 1; i++, ++j)
{
err = clSetKernelArg(
updateKernelCQ1, j, sizeof(cl_mem),
&external_mem_image1[i]);
err |= clSetKernelArg(
updateKernelCQ1, ++j, sizeof(cl_mem),
&external_mem_image2[i]);
}
err |= clSetKernelArg(updateKernelCQ1, j,
sizeof(unsigned int),
&num2DImages);
err |= clSetKernelArg(updateKernelCQ1, ++j,
sizeof(unsigned int),
&width);
err |= clSetKernelArg(updateKernelCQ1, ++j,
sizeof(unsigned int),
&height);
err |= clSetKernelArg(updateKernelCQ1, ++j,
sizeof(unsigned int),
&numMipLevels);
test_error_and_cleanup(
err, CLEANUP,
"Error: Failed to set arg values \n");
// clVk2CLExternalSemaphore->wait(cmd_queue1);
size_t global_work_size[3] = { width, height,
1 };
cl_event first_launch;
err = clEnqueueNDRangeKernel(
cmd_queue1, updateKernelCQ1, 2, NULL,
global_work_size, NULL, 0, NULL,
&first_launch);
test_error_and_cleanup(
err, CLEANUP,
"Failed to enqueue updateKernelCQ1\n");
err = clEnqueueNDRangeKernel(
cmd_queue2, updateKernelCQ2, 2, NULL,
global_work_size, NULL, 1, &first_launch,
NULL);
test_error_and_cleanup(
err, CLEANUP,
"Failed to enqueue updateKernelCQ2\n");
clFinish(cmd_queue2);
err = clCl2VkExternalSemaphore->signal(
cmd_queue2);
test_error_and_cleanup(
err, CLEANUP,
"Failed to signal CL semaphore\n");
}
unsigned int flags = 0;
size_t mipmapLevelOffset = 0;
cl_event eventReadImage = NULL;
clFinish(cmd_queue2);
for (int i = 0; i < num2DImages; i++)
{
err = clEnqueueReadImage(
cmd_queue1, external_mem_image2[i], CL_TRUE,
origin, region, 0, 0, dstBufferPtr, 0, NULL,
NULL);
test_error_and_cleanup(
err, CLEANUP,
"clEnqueueReadImage failed with"
"error\n");
if (memcmp(srcBufferPtr, dstBufferPtr,
srcBufSize))
{
log_info("Source and destination buffers "
"don't match\n");
if (debug_trace)
{
log_info("Source buffer contents: \n");
for (uint64_t sIdx = 0;
sIdx < srcBufSize; sIdx++)
{
log_info(
"%d ",
(int)vkSrcBufferDeviceMemoryPtr
[sIdx]);
}
log_info("Destination buffer contents:"
"\n");
for (uint64_t dIdx = 0;
dIdx < srcBufSize; dIdx++)
{
log_info("%d ",
(int)dstBufferPtr[dIdx]);
}
}
err = -1;
break;
}
}
for (int i = 0; i < num2DImages; i++)
{
delete vkImage2DListDeviceMemory1[i];
delete vkImage2DListDeviceMemory2[i];
delete externalMemory1[i];
delete externalMemory2[i];
}
vkImage2DListDeviceMemory1.erase(
vkImage2DListDeviceMemory1.begin(),
vkImage2DListDeviceMemory1.begin()
+ num2DImages);
vkImage2DListDeviceMemory2.erase(
vkImage2DListDeviceMemory2.begin(),
vkImage2DListDeviceMemory2.begin()
+ num2DImages);
externalMemory1.erase(externalMemory1.begin(),
externalMemory1.begin()
+ num2DImages);
externalMemory2.erase(externalMemory2.begin(),
externalMemory2.begin()
+ num2DImages);
test_error_and_cleanup(err, CLEANUP,
"Test error detected\n");
}
}
}
}
}
vkImage2DShader.clear();
}
CLEANUP:
if (clVk2CLExternalSemaphore) delete clVk2CLExternalSemaphore;
if (clCl2VkExternalSemaphore) delete clCl2VkExternalSemaphore;
if (srcBufferPtr) free(srcBufferPtr);
if (dstBufferPtr) free(dstBufferPtr);
return err;
}
int run_test_with_one_queue(
cl_context &context, cl_command_queue &cmd_queue1,
cl_kernel *kernel_unsigned, cl_kernel *kernel_signed,
cl_kernel *kernel_float, VulkanDevice &vkDevice,
VulkanExternalSemaphoreHandleType vkExternalSemaphoreHandleType)
{
cl_int err = CL_SUCCESS;
size_t origin[3] = { 0, 0, 0 };
size_t region[3] = { 1, 1, 1 };
    cl_kernel updateKernelCQ1 = NULL;
std::vector<VulkanFormat> vkFormatList = getSupportedVulkanFormatList();
const std::vector<VulkanExternalMemoryHandleType>
vkExternalMemoryHandleTypeList =
getSupportedVulkanExternalMemoryHandleTypeList();
char magicValue = 0;
VulkanBuffer vkParamsBuffer(vkDevice, sizeof(Params));
VulkanDeviceMemory vkParamsDeviceMemory(
vkDevice, vkParamsBuffer.getSize(),
getVulkanMemoryType(vkDevice,
VULKAN_MEMORY_TYPE_PROPERTY_HOST_VISIBLE_COHERENT));
vkParamsDeviceMemory.bindBuffer(vkParamsBuffer);
uint64_t maxImage2DSize =
max_width * max_height * MAX_2D_IMAGE_ELEMENT_SIZE * 2;
VulkanBuffer vkSrcBuffer(vkDevice, maxImage2DSize);
VulkanDeviceMemory vkSrcBufferDeviceMemory(
vkDevice, vkSrcBuffer.getSize(),
getVulkanMemoryType(vkDevice,
VULKAN_MEMORY_TYPE_PROPERTY_HOST_VISIBLE_COHERENT));
vkSrcBufferDeviceMemory.bindBuffer(vkSrcBuffer);
char *srcBufferPtr, *dstBufferPtr;
srcBufferPtr = (char *)malloc(maxImage2DSize);
dstBufferPtr = (char *)malloc(maxImage2DSize);
VulkanDescriptorSetLayoutBindingList vkDescriptorSetLayoutBindingList;
vkDescriptorSetLayoutBindingList.addBinding(
0, VULKAN_DESCRIPTOR_TYPE_STORAGE_BUFFER, 1);
vkDescriptorSetLayoutBindingList.addBinding(
1, VULKAN_DESCRIPTOR_TYPE_STORAGE_IMAGE, MAX_2D_IMAGE_DESCRIPTORS);
VulkanDescriptorSetLayout vkDescriptorSetLayout(
vkDevice, vkDescriptorSetLayoutBindingList);
VulkanPipelineLayout vkPipelineLayout(vkDevice, vkDescriptorSetLayout);
VulkanDescriptorPool vkDescriptorPool(vkDevice,
vkDescriptorSetLayoutBindingList);
VulkanDescriptorSet vkDescriptorSet(vkDevice, vkDescriptorPool,
vkDescriptorSetLayout);
VulkanCommandPool vkCommandPool(vkDevice);
VulkanCommandBuffer vkCopyCommandBuffer(vkDevice, vkCommandPool);
VulkanCommandBuffer vkShaderCommandBuffer(vkDevice, vkCommandPool);
VulkanQueue &vkQueue = vkDevice.getQueue();
VulkanSemaphore vkVk2CLSemaphore(vkDevice, vkExternalSemaphoreHandleType);
VulkanSemaphore vkCl2VkSemaphore(vkDevice, vkExternalSemaphoreHandleType);
clExternalSemaphore *clVk2CLExternalSemaphore = NULL;
clExternalSemaphore *clCl2VkExternalSemaphore = NULL;
clVk2CLExternalSemaphore = new clExternalImportableSemaphore(
vkVk2CLSemaphore, context, vkExternalSemaphoreHandleType, deviceId);
clCl2VkExternalSemaphore = new clExternalExportableSemaphore(
vkCl2VkSemaphore, context, vkExternalSemaphoreHandleType, deviceId);
std::vector<VulkanDeviceMemory *> vkImage2DListDeviceMemory1;
std::vector<VulkanDeviceMemory *> vkImage2DListDeviceMemory2;
std::vector<clExternalMemoryImage *> externalMemory1;
std::vector<clExternalMemoryImage *> externalMemory2;
std::vector<char> vkImage2DShader;
for (size_t fIdx = 0; fIdx < vkFormatList.size(); fIdx++)
{
VulkanFormat vkFormat = vkFormatList[fIdx];
log_info("Format: %d\n", vkFormat);
uint32_t elementSize = getVulkanFormatElementSize(vkFormat);
ASSERT_LEQ(elementSize, (uint32_t)MAX_2D_IMAGE_ELEMENT_SIZE);
        log_info("elementSize = %u\n", elementSize);
std::string fileName = "image2D_"
+ std::string(getVulkanFormatGLSLFormat(vkFormat)) + ".spv";
        log_info("Load %s file\n", fileName.c_str());
vkImage2DShader = readFile(fileName);
VulkanShaderModule vkImage2DShaderModule(vkDevice, vkImage2DShader);
VulkanComputePipeline vkComputePipeline(vkDevice, vkPipelineLayout,
vkImage2DShaderModule);
for (size_t wIdx = 0; wIdx < ARRAY_SIZE(widthList); wIdx++)
{
uint32_t width = widthList[wIdx];
log_info("Width: %d\n", width);
if (width > max_width) continue;
region[0] = width;
for (size_t hIdx = 0; hIdx < ARRAY_SIZE(heightList); hIdx++)
{
uint32_t height = heightList[hIdx];
log_info("Height: %d\n", height);
if (height > max_height) continue;
region[1] = height;
uint32_t numMipLevels = 1;
log_info("Number of mipmap levels: %d\n", numMipLevels);
magicValue++;
char *vkSrcBufferDeviceMemoryPtr =
(char *)vkSrcBufferDeviceMemory.map();
uint64_t srcBufSize = 0;
memset(vkSrcBufferDeviceMemoryPtr, 0, maxImage2DSize);
memset(srcBufferPtr, 0, maxImage2DSize);
uint32_t mipLevel = 0;
for (uint32_t row = 0;
row < std::max(height >> mipLevel, uint32_t(1)); row++)
{
for (uint32_t col = 0;
col < std::max(width >> mipLevel, uint32_t(1)); col++)
{
for (uint32_t elementByte = 0;
elementByte < elementSize; elementByte++)
{
vkSrcBufferDeviceMemoryPtr[srcBufSize] =
(char)(magicValue + mipLevel + row + col);
srcBufferPtr[srcBufSize] =
(char)(magicValue + mipLevel + row + col);
srcBufSize++;
}
}
}
srcBufSize = ROUND_UP(
srcBufSize,
std::max(
elementSize,
(uint32_t)VULKAN_MIN_BUFFER_OFFSET_COPY_ALIGNMENT));
vkSrcBufferDeviceMemory.unmap();
for (size_t niIdx = 0; niIdx < ARRAY_SIZE(num2DImagesList);
niIdx++)
{
uint32_t num2DImages = num2DImagesList[niIdx];
log_info("Number of images: %d\n", num2DImages);
ASSERT_LEQ(num2DImages, (uint32_t)MAX_2D_IMAGES);
Params *params = (Params *)vkParamsDeviceMemory.map();
                    uint32_t num_2D_image =
                        useSingleImageKernel ? 1 : num2DImages;
params->numImage2DDescriptors = num_2D_image * numMipLevels;
vkParamsDeviceMemory.unmap();
vkDescriptorSet.update(0, vkParamsBuffer);
for (size_t emhtIdx = 0;
emhtIdx < vkExternalMemoryHandleTypeList.size();
emhtIdx++)
{
VulkanExternalMemoryHandleType
vkExternalMemoryHandleType =
vkExternalMemoryHandleTypeList[emhtIdx];
log_info("External memory handle type: %d \n",
vkExternalMemoryHandleType);
if ((true == disableNTHandleType)
&& (VULKAN_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32_NT
== vkExternalMemoryHandleType))
{
// Skip running for WIN32 NT handle.
continue;
}
VulkanImageTiling vulkanImageTiling =
vkClExternalMemoryHandleTilingAssumption(
deviceId,
vkExternalMemoryHandleTypeList[emhtIdx], &err);
                        test_error_and_cleanup(
                            err, CLEANUP,
                            "Failed to query OpenCL tiling mode\n");
VulkanImage2D vkDummyImage2D(
vkDevice, vkFormatList[0], widthList[0],
heightList[0], vulkanImageTiling, 1,
vkExternalMemoryHandleType);
const VulkanMemoryTypeList &memoryTypeList =
vkDummyImage2D.getMemoryTypeList();
for (size_t mtIdx = 0; mtIdx < memoryTypeList.size();
mtIdx++)
{
const VulkanMemoryType &memoryType =
memoryTypeList[mtIdx];
log_info("Memory type index: %d\n",
(uint32_t)memoryType);
log_info("Memory type property: %d\n",
memoryType.getMemoryTypeProperty());
if (!useDeviceLocal)
{
if (VULKAN_MEMORY_TYPE_PROPERTY_DEVICE_LOCAL
== memoryType.getMemoryTypeProperty())
{
continue;
}
}
size_t totalImageMemSize = 0;
uint64_t interImageOffset = 0;
{
VulkanImage2D vkImage2D(
vkDevice, vkFormat, width, height,
vulkanImageTiling, numMipLevels,
vkExternalMemoryHandleType);
ASSERT_LEQ(vkImage2D.getSize(), maxImage2DSize);
totalImageMemSize =
ROUND_UP(vkImage2D.getSize(),
vkImage2D.getAlignment());
}
VulkanImage2DList vkImage2DList(
num2DImages, vkDevice, vkFormat, width, height,
vulkanImageTiling, numMipLevels,
vkExternalMemoryHandleType);
for (size_t bIdx = 0; bIdx < vkImage2DList.size();
bIdx++)
{
// Create list of Vulkan device memories and
// bind the list of Vulkan images.
vkImage2DListDeviceMemory1.push_back(
new VulkanDeviceMemory(
vkDevice, vkImage2DList[bIdx],
memoryType,
vkExternalMemoryHandleType));
vkImage2DListDeviceMemory1[bIdx]->bindImage(
vkImage2DList[bIdx], 0);
externalMemory1.push_back(
new clExternalMemoryImage(
*vkImage2DListDeviceMemory1[bIdx],
vkExternalMemoryHandleType, context,
totalImageMemSize, width, height, 0,
vkImage2DList[bIdx], deviceId));
}
VulkanImageViewList vkImage2DViewList(
vkDevice, vkImage2DList);
VulkanImage2DList vkImage2DList2(
num2DImages, vkDevice, vkFormat, width, height,
vulkanImageTiling, numMipLevels,
vkExternalMemoryHandleType);
for (size_t bIdx = 0; bIdx < vkImage2DList2.size();
bIdx++)
{
vkImage2DListDeviceMemory2.push_back(
new VulkanDeviceMemory(
vkDevice, vkImage2DList2[bIdx],
memoryType,
vkExternalMemoryHandleType));
vkImage2DListDeviceMemory2[bIdx]->bindImage(
vkImage2DList2[bIdx], 0);
externalMemory2.push_back(
new clExternalMemoryImage(
*vkImage2DListDeviceMemory2[bIdx],
vkExternalMemoryHandleType, context,
totalImageMemSize, width, height, 0,
vkImage2DList2[bIdx], deviceId));
}
                            cl_mem external_mem_image1[MAX_2D_IMAGES];
                            cl_mem external_mem_image2[MAX_2D_IMAGES];
                            for (uint32_t i = 0; i < num2DImages; i++)
{
external_mem_image1[i] =
externalMemory1[i]
->getExternalMemoryImage();
external_mem_image2[i] =
externalMemory2[i]
->getExternalMemoryImage();
}
err = clCl2VkExternalSemaphore->signal(cmd_queue1);
test_error_and_cleanup(
err, CLEANUP,
"Failed to signal CL semaphore\n");
if (!useSingleImageKernel)
{
vkDescriptorSet.updateArray(1,
vkImage2DViewList);
vkCopyCommandBuffer.begin();
vkCopyCommandBuffer.pipelineBarrier(
vkImage2DList,
VULKAN_IMAGE_LAYOUT_UNDEFINED,
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
for (size_t i2DIdx = 0;
i2DIdx < vkImage2DList.size(); i2DIdx++)
{
vkCopyCommandBuffer.copyBufferToImage(
vkSrcBuffer, vkImage2DList[i2DIdx],
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
}
vkCopyCommandBuffer.pipelineBarrier(
vkImage2DList,
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
VULKAN_IMAGE_LAYOUT_GENERAL);
vkCopyCommandBuffer.end();
memset(dstBufferPtr, 0, srcBufSize);
vkQueue.submit(vkCopyCommandBuffer);
vkShaderCommandBuffer.begin();
vkShaderCommandBuffer.bindPipeline(
vkComputePipeline);
vkShaderCommandBuffer.bindDescriptorSets(
vkComputePipeline, vkPipelineLayout,
vkDescriptorSet);
vkShaderCommandBuffer.dispatch(
NUM_BLOCKS(width, NUM_THREADS_PER_GROUP_X),
NUM_BLOCKS(height,
NUM_THREADS_PER_GROUP_Y / 2),
1);
vkShaderCommandBuffer.end();
}
for (uint32_t iter = 0; iter < innerIterations;
iter++)
{
if (useSingleImageKernel)
{
for (size_t i2DIdx = 0;
i2DIdx < vkImage2DList.size();
i2DIdx++)
{
vkDescriptorSet.update(
1, vkImage2DViewList[i2DIdx]);
vkCopyCommandBuffer.begin();
vkCopyCommandBuffer.pipelineBarrier(
vkImage2DList,
VULKAN_IMAGE_LAYOUT_UNDEFINED,
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
vkCopyCommandBuffer.copyBufferToImage(
vkSrcBuffer, vkImage2DList[i2DIdx],
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
vkCopyCommandBuffer.pipelineBarrier(
vkImage2DList,
VULKAN_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
VULKAN_IMAGE_LAYOUT_GENERAL);
vkCopyCommandBuffer.end();
memset(dstBufferPtr, 0, srcBufSize);
vkQueue.submit(vkCopyCommandBuffer);
vkShaderCommandBuffer.begin();
vkShaderCommandBuffer.bindPipeline(
vkComputePipeline);
vkShaderCommandBuffer
.bindDescriptorSets(
vkComputePipeline,
vkPipelineLayout,
vkDescriptorSet);
vkShaderCommandBuffer.dispatch(
NUM_BLOCKS(width,
NUM_THREADS_PER_GROUP_X),
NUM_BLOCKS(height,
NUM_THREADS_PER_GROUP_Y
/ 2),
1);
vkShaderCommandBuffer.end();
if (i2DIdx < vkImage2DList.size() - 1)
{
vkQueue.submit(
vkShaderCommandBuffer);
}
}
}
vkQueue.submit(vkCl2VkSemaphore,
vkShaderCommandBuffer,
vkVk2CLSemaphore);
err =
clVk2CLExternalSemaphore->wait(cmd_queue1);
test_error_and_cleanup(
err, CLEANUP,
"Error: failed to wait on CL external "
"semaphore\n");
switch (num2DImages)
{
case 1:
updateKernelCQ1 = getKernelType(
vkFormat, kernel_float[0],
kernel_signed[0],
kernel_unsigned[0]);
break;
case 2:
updateKernelCQ1 = getKernelType(
vkFormat, kernel_float[1],
kernel_signed[1],
kernel_unsigned[1]);
break;
case 4:
updateKernelCQ1 = getKernelType(
vkFormat, kernel_float[2],
kernel_signed[2],
kernel_unsigned[2]);
break;
}
int j = 0;
                                for (uint32_t i = 0; i < num2DImages;
                                     i++, ++j)
{
err = clSetKernelArg(
updateKernelCQ1, j, sizeof(cl_mem),
&external_mem_image1[i]);
err |= clSetKernelArg(
updateKernelCQ1, ++j, sizeof(cl_mem),
&external_mem_image2[i]);
}
err |= clSetKernelArg(updateKernelCQ1, j,
sizeof(unsigned int),
&num2DImages);
err |= clSetKernelArg(updateKernelCQ1, ++j,
sizeof(unsigned int),
&width);
err |= clSetKernelArg(updateKernelCQ1, ++j,
sizeof(unsigned int),
&height);
err |= clSetKernelArg(updateKernelCQ1, ++j,
sizeof(unsigned int),
&numMipLevels);
test_error_and_cleanup(
err, CLEANUP,
"Error: Failed to set arg "
"values for kernel-1\n");
size_t global_work_size[3] = { width, height,
1 };
err = clEnqueueNDRangeKernel(
cmd_queue1, updateKernelCQ1, 2, NULL,
global_work_size, NULL, 0, NULL, NULL);
test_error_and_cleanup(
err, CLEANUP,
"Failed to enqueue updateKernelCQ1\n");
err = clCl2VkExternalSemaphore->signal(
cmd_queue1);
test_error_and_cleanup(
err, CLEANUP,
"Failed to signal CL semaphore\n");
}
                            for (uint32_t i = 0; i < num2DImages; i++)
{
err = clEnqueueReadImage(
cmd_queue1, external_mem_image2[i], CL_TRUE,
origin, region, 0, 0, dstBufferPtr, 0, NULL,
NULL);
test_error_and_cleanup(
err, CLEANUP,
                                    "clEnqueueReadImage failed with "
                                    "error\n");
if (memcmp(srcBufferPtr, dstBufferPtr,
srcBufSize))
{
log_info("Source and destination buffers "
"don't match\n");
if (debug_trace)
{
log_info("Source buffer contents: \n");
for (uint64_t sIdx = 0;
sIdx < srcBufSize; sIdx++)
{
                                            // vkSrcBufferDeviceMemory was
                                            // unmapped above; srcBufferPtr
                                            // holds an identical copy.
                                            log_info("%d ",
                                                     (int)srcBufferPtr[sIdx]);
                                        }
                                        log_info("Destination buffer contents:"
                                                 "\n");
                                        for (uint64_t dIdx = 0;
                                             dIdx < srcBufSize; dIdx++)
                                        {
                                            log_info("%d ",
                                                     (int)dstBufferPtr[dIdx]);
}
}
err = -1;
break;
}
}
                            for (uint32_t i = 0; i < num2DImages; i++)
{
delete vkImage2DListDeviceMemory1[i];
delete vkImage2DListDeviceMemory2[i];
delete externalMemory1[i];
delete externalMemory2[i];
}
vkImage2DListDeviceMemory1.erase(
vkImage2DListDeviceMemory1.begin(),
vkImage2DListDeviceMemory1.begin()
+ num2DImages);
vkImage2DListDeviceMemory2.erase(
vkImage2DListDeviceMemory2.begin(),
vkImage2DListDeviceMemory2.begin()
+ num2DImages);
externalMemory1.erase(externalMemory1.begin(),
externalMemory1.begin()
+ num2DImages);
externalMemory2.erase(externalMemory2.begin(),
externalMemory2.begin()
+ num2DImages);
                            test_error_and_cleanup(err, CLEANUP,
                                                   "Test error detected\n");
}
}
}
}
}
vkImage2DShader.clear();
}
CLEANUP:
if (clVk2CLExternalSemaphore) delete clVk2CLExternalSemaphore;
if (clCl2VkExternalSemaphore) delete clCl2VkExternalSemaphore;
if (srcBufferPtr) free(srcBufferPtr);
if (dstBufferPtr) free(dstBufferPtr);
return err;
}
int test_image_common(cl_device_id device_, cl_context context_,
cl_command_queue queue_, int numElements_)
{
cl_int err = CL_SUCCESS;
cl_platform_id platform = NULL;
size_t extensionSize = 0;
cl_uint num_devices = 0;
cl_uint device_no = 0;
    cl_device_id *devices = NULL;
char *extensions = NULL;
cl_command_queue cmd_queue1 = NULL;
cl_command_queue cmd_queue2 = NULL;
cl_context context = NULL;
const uint32_t num_kernels = ARRAY_SIZE(num2DImagesList) + 1;
// One kernel for Cross-CQ case
const uint32_t num_kernel_types = 3;
const char *kernel_source[num_kernels] = { kernel_text_numImage_1,
kernel_text_numImage_2,
kernel_text_numImage_4 };
char source_1[4096];
char source_2[4096];
char source_3[4096];
size_t program_source_length;
cl_program program[num_kernel_types] = { NULL };
cl_kernel kernel_float[num_kernels] = { NULL };
cl_kernel kernel_signed[num_kernels] = { NULL };
cl_kernel kernel_unsigned[num_kernels] = { NULL };
std::vector<VulkanExternalSemaphoreHandleType> supportedSemaphoreTypes;
VulkanDevice vkDevice;
cl_context_properties contextProperties[] = { CL_CONTEXT_PLATFORM, 0, 0 };
// get the platform ID
err = clGetPlatformIDs(1, &platform, NULL);
test_error_and_cleanup(err, CLEANUP, "Error: Failed to get platform\n");
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices);
test_error_and_cleanup(
err, CLEANUP, "clGetDeviceIDs failed in returning no. of devices\n");
devices = (cl_device_id *)malloc(num_devices * sizeof(cl_device_id));
if (NULL == devices)
{
test_fail_and_cleanup(err, CLEANUP,
"Unable to allocate memory for devices\n");
}
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, num_devices, devices,
NULL);
test_error_and_cleanup(err, CLEANUP, "Failed to get deviceID.\n");
contextProperties[1] = (cl_context_properties)platform;
log_info("Assigned contextproperties for platform\n");
for (device_no = 0; device_no < num_devices; device_no++)
{
err = clGetDeviceInfo(devices[device_no], CL_DEVICE_EXTENSIONS, 0, NULL,
&extensionSize);
if (CL_SUCCESS != err)
{
print_error(
err,
"Error in clGetDeviceInfo for getting device_extension size\n");
goto CLEANUP;
}
        // Release any allocation from a previous loop iteration so the
        // per-device extension buffer is not leaked.
        if (extensions) free(extensions);
        extensions = (char *)malloc(extensionSize);
if (NULL == extensions)
{
err = CL_OUT_OF_HOST_MEMORY;
print_error(err, "Unable to allocate memory for extensions\n");
goto CLEANUP;
}
err = clGetDeviceInfo(devices[device_no], CL_DEVICE_EXTENSIONS,
extensionSize, extensions, NULL);
if (CL_SUCCESS != err)
{
print_error(
err, "Error in clGetDeviceInfo for getting device_extension\n");
goto CLEANUP;
}
err = clGetDeviceInfo(devices[device_no], CL_DEVICE_UUID_KHR,
CL_UUID_SIZE_KHR, uuid, NULL);
        test_error_and_cleanup(err, CLEANUP,
                               "clGetDeviceInfo failed with error\n");
supportedSemaphoreTypes =
getSupportedInteropExternalSemaphoreHandleTypes(devices[device_no],
vkDevice);
// If device does not support any semaphores, try the next one
if (supportedSemaphoreTypes.empty())
{
continue;
}
err =
memcmp(uuid, vkDevice.getPhysicalDevice().getUUID(), VK_UUID_SIZE);
if (err == 0)
{
break;
}
}
if (supportedSemaphoreTypes.empty())
{
test_fail_and_cleanup(
err, CLEANUP, "No devices found that support OpenCL semaphores\n");
}
if (device_no >= num_devices)
{
        test_fail_and_cleanup(err, CLEANUP,
                              "OpenCL error: "
                              "No Vulkan-OpenCL Interop capable GPU found.\n");
}
deviceId = devices[device_no];
err = setMaxImageDimensions(deviceId, max_width, max_height);
    test_error_and_cleanup(err, CLEANUP,
                           "Error setting max image dimensions\n");
    log_info("Set max_width to %zu and max_height to %zu\n", max_width,
             max_height);
context = clCreateContextFromType(contextProperties, CL_DEVICE_TYPE_GPU,
NULL, NULL, &err);
test_error_and_cleanup(err, CLEANUP, "error creating context");
    log_info("Successfully created context\n");
cmd_queue1 = clCreateCommandQueue(context, devices[device_no], 0, &err);
test_error_and_cleanup(err, CLEANUP,
"Error: Failed to create command queue!\n");
    log_info("clCreateCommandQueue successful\n");
cmd_queue2 = clCreateCommandQueue(context, devices[device_no], 0, &err);
test_error_and_cleanup(err, CLEANUP,
"Error: Failed to create command queue!\n");
log_info("clCreateCommandQueue2 successful \n");
    for (uint32_t i = 0; i < num_kernels; i++)
{
switch (i)
{
case 0:
sprintf(source_1, kernel_source[i], "float4", "f", "float4",
"f", "f", "f");
sprintf(source_2, kernel_source[i], "int4", "i", "int4", "i",
"i", "i");
sprintf(source_3, kernel_source[i], "uint4", "ui", "uint4",
"ui", "ui", "ui");
break;
case 1:
sprintf(source_1, kernel_source[i], "float4", "f", "float4",
"f", "float4", "f", "float4", "f", "f", "f", "f", "f");
sprintf(source_2, kernel_source[i], "int4", "i", "int4", "i",
"int4", "i", "int4", "i", "i", "i", "i", "i");
sprintf(source_3, kernel_source[i], "uint4", "ui", "uint4",
"ui", "uint4", "ui", "uint4", "ui", "ui", "ui", "ui",
"ui");
break;
case 2:
sprintf(source_1, kernel_source[i], "float4", "f", "float4",
"f", "float4", "f", "float4", "f", "float4", "f",
"float4", "f", "float4", "f", "float4", "f", "f", "f",
"f", "f", "f", "f", "f", "f");
sprintf(source_2, kernel_source[i], "int4", "i", "int4", "i",
"int4", "i", "int4", "i", "int4", "i", "int4", "i",
"int4", "i", "int4", "i", "i", "i", "i", "i", "i", "i",
"i", "i");
sprintf(source_3, kernel_source[i], "uint4", "ui", "uint4",
"ui", "uint4", "ui", "uint4", "ui", "uint4", "ui",
"uint4", "ui", "uint4", "ui", "uint4", "ui", "ui", "ui",
"ui", "ui", "ui", "ui", "ui", "ui");
break;
case 3:
                // Additional case for creating updateKernelCQ2, which takes
                // two images
sprintf(source_1, kernel_source[1], "float4", "f", "float4",
"f", "float4", "f", "float4", "f", "f", "f", "f", "f");
sprintf(source_2, kernel_source[1], "int4", "i", "int4", "i",
"int4", "i", "int4", "i", "i", "i", "i", "i");
sprintf(source_3, kernel_source[1], "uint4", "ui", "uint4",
"ui", "uint4", "ui", "uint4", "ui", "ui", "ui", "ui",
"ui");
break;
}
const char *sourceTexts[num_kernel_types] = { source_1, source_2,
source_3 };
        for (uint32_t k = 0; k < num_kernel_types; k++)
{
program_source_length = strlen(sourceTexts[k]);
program[k] = clCreateProgramWithSource(
context, 1, &sourceTexts[k], &program_source_length, &err);
err |= clBuildProgram(program[k], 0, NULL, NULL, NULL, NULL);
}
        test_error_and_cleanup(err, CLEANUP,
                               "Error: Failed to build program\n");
// create the kernel
kernel_float[i] = clCreateKernel(program[0], "image2DKernel", &err);
test_error_and_cleanup(err, CLEANUP, "clCreateKernel failed");
kernel_signed[i] = clCreateKernel(program[1], "image2DKernel", &err);
test_error_and_cleanup(err, CLEANUP, "clCreateKernel failed");
kernel_unsigned[i] = clCreateKernel(program[2], "image2DKernel", &err);
test_error_and_cleanup(err, CLEANUP, "clCreateKernel failed ");
}
for (VulkanExternalSemaphoreHandleType externalSemaphoreType :
supportedSemaphoreTypes)
{
if (numCQ == 2)
{
err = run_test_with_two_queue(
context, cmd_queue1, cmd_queue2, kernel_unsigned, kernel_signed,
kernel_float, vkDevice, externalSemaphoreType);
}
else
{
err = run_test_with_one_queue(context, cmd_queue1, kernel_unsigned,
kernel_signed, kernel_float, vkDevice,
externalSemaphoreType);
}
}
CLEANUP:
    for (uint32_t i = 0; i < num_kernels; i++)
{
if (kernel_float[i])
{
clReleaseKernel(kernel_float[i]);
}
if (kernel_unsigned[i])
{
clReleaseKernel(kernel_unsigned[i]);
}
if (kernel_signed[i])
{
clReleaseKernel(kernel_signed[i]);
}
}
    for (uint32_t i = 0; i < num_kernel_types; i++)
{
if (program[i])
{
clReleaseProgram(program[i]);
}
}
if (cmd_queue1) clReleaseCommandQueue(cmd_queue1);
if (cmd_queue2) clReleaseCommandQueue(cmd_queue2);
if (context) clReleaseContext(context);
if (extensions) free(extensions);
if (devices) free(devices);
return err;
}