- fix clGetDeviceInfo(CL_DEVICE_MAX_WORK_ITEM_SIZES) by using the proper
size
- clamp localThreads[2] in the same way as localThreads[0] and localThreads[1]
- clamp all localThreads elements with respect to CL_MAX_WORK_GROUP_SIZE
- fix the size used to create/read the output buffer
Fixes #2238
The buffer size passed to clEnqueueReadBuffer in the test was incorrect:
it did not match the size of the host allocation outptr, which caused a
segmentation fault.
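
For illustration only, the clamping described in the list above could be
sketched as follows. This is not the code from the patch: the helper name
clamp_local_threads, its parameters, and the "halve the largest dimension"
policy are assumptions made for this example. max_item_sizes stands for the
three values returned for CL_DEVICE_MAX_WORK_ITEM_SIZES and max_group_size
for the device work-group limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch.
 * Step 1: clamp each local size to its per-dimension device limit.
 * Step 2: shrink the sizes until their product fits the work-group limit.
 * Assumes the product of the three sizes does not overflow size_t. */
static void clamp_local_threads(size_t local[3],
                                const size_t max_item_sizes[3],
                                size_t max_group_size)
{
    assert(local != NULL && max_item_sizes != NULL);

    /* Treat a zero limit as 1 so the loop below always terminates. */
    if (max_group_size == 0)
        max_group_size = 1;

    for (int i = 0; i < 3; ++i) {
        if (local[i] > max_item_sizes[i])
            local[i] = max_item_sizes[i];
        if (local[i] == 0)
            local[i] = 1; /* a work-group dimension must be at least 1 */
    }

    /* Halve the largest dimension until the total fits. The largest
     * dimension is always > 1 here, because the product exceeds a
     * limit that is at least 1. */
    while (local[0] * local[1] * local[2] > max_group_size) {
        int largest = 0;
        for (int i = 1; i < 3; ++i) {
            if (local[i] > local[largest])
                largest = i;
        }
        local[largest] /= 2;
    }
}
```

With local sizes {64, 64, 64}, per-dimension limits {1024, 1024, 64} and a
work-group limit of 256, this sketch yields {4, 8, 8}. Other shrinking
policies are equally valid; the point is only that all three dimensions are
clamped and that their product respects the work-group limit.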