Drivers _may_ choose to advertise values for
`CL_DEVICE_MAX_WORK_GROUP_SIZE` or `CL_DEVICE_MAX_WORK_ITEM_SIZES` that
kernels without a `reqd_work_group_size` are not able to be launched
with.
The CTS should therefore make sure that the local_size passed to
`clEnqueueNDRangeKernel` does not exceed `CL_KERNEL_WORK_GROUP_SIZE`
This fixes it up in two places I've noticed this not happening.
- fix clGetDeviceInfo(CL_DEVICE_MAX_WORK_ITEM_SIZES) by using the proper
size
- clamp localThreads[2] as for localThreads[0] and localThreads[2]
- clamp all localThreads elements in regard of CL_MAX_WORK_GROUP_SIZE
- fix the size using to create/read the output buffer
Fix#2238
The buffer size input to the test function clEnqueueReadBuffer was
incorrect, which cause segmentation fault. And it didn't match the size
for the host allocation outptr