[subgroups][non_uniform_broadcast] Fix broadcasting index generation (#1680)

* [subgroups][non_uniform_broadcast] Fix broadcasting index generation

The subgroup size is not guaranteed to be greater than
`NR_OF_ACTIVE_WORK_ITEMS`. The broadcasting index needs to be reduced in
that case.

Otherwise, if subgroup size == `NR_OF_ACTIVE_WORK_ITEMS` == 4, we hit a
divide-by-zero error when evaluating `bcast_index %
(n - NR_OF_ACTIVE_WORK_ITEMS)`.

* Revert "[subgroups][non_uniform_broadcast] Fix broadcasting index generation"

This reverts commit 9bbab539de.

* [subgroups][non_uniform_broadcast] Fix broadcasting index generation

Dynamically activate half of the work items in the current subgroup
instead of hardcoding the count as `NR_OF_ACTIVE_WORK_ITEMS`.

* Apply suggestion
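
The failure and the fix described above can be sketched in a few lines of C++. This is only an illustration, not the code in the changed files: `old_divisor`, `active_work_items`, and `reduce_bcast_index` are hypothetical names, `n` is assumed to be the subgroup size, and clamping the active count to at least one is an assumption added so the sketch never divides by zero.

```cpp
#include <cstddef>

// Divisor of the old expression `bcast_index % (n - NR_OF_ACTIVE_WORK_ITEMS)`
// with the count hardcoded to 4. Only meaningful for n >= 4 (unsigned math).
constexpr std::size_t old_divisor(std::size_t n, std::size_t hardcoded = 4)
{
    return n - hardcoded;
}

// Hypothetical replacement: activate half of the work items in the subgroup.
// The clamp to 1 is an assumption of this sketch, not taken from the commit.
constexpr std::size_t active_work_items(std::size_t n)
{
    return n > 1 ? n / 2 : 1;
}

// Hypothetical helper: fold an arbitrary index into the active range.
constexpr std::size_t reduce_bcast_index(std::size_t bcast_index,
                                         std::size_t n)
{
    return bcast_index % active_work_items(n);
}

// With a subgroup of exactly 4 work items the old divisor is zero.
static_assert(old_divisor(4) == 0, "old expression divides by zero");
// Deriving the count from the subgroup size keeps the divisor non-zero.
static_assert(active_work_items(4) == 2, "half of the subgroup is active");
static_assert(reduce_bcast_index(7, 4) == 1, "7 % 2 == 1");
```

Because the active count is derived from the subgroup size, the divisor scales with the device instead of depending on a fixed constant.
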
Yilong Guo
2024-03-13 00:25:06 +08:00
committed by GitHub
parent ee504ba861
commit a045f76eed
3 changed files with 14 additions and 26 deletions

@@ -28,8 +28,6 @@
 #include <regex>
 #include <map>
-#define NR_OF_ACTIVE_WORK_ITEMS 4
 extern MTdata gMTdata;
 typedef std::bitset<128> bs128;
 extern cl_half_rounding_mode g_rounding_mode;
@@ -1474,8 +1472,6 @@ template <typename Ty, typename Fns, size_t TSIZE = 0> struct test
 Fns::log_test(test_params, "");
-kernel_sstr << "#define NR_OF_ACTIVE_WORK_ITEMS ";
-kernel_sstr << NR_OF_ACTIVE_WORK_ITEMS << "\n";
 // Make sure a test of type Ty is supported by the device
 if (!TypeManager<Ty>::type_supported(device))
 {