This makes it literally impossible for drivers to constant fold the
IsTininessDetectedBeforeRounding kernel. Sure, drivers might have should
respect volatile here, but I'm not convinced this is actually required
by the spec in a very strict sense, because here there are no
side-effects possible in the first place.
And as far as I know, constant folding is allowed to give different
results than an actual GPU calculation would.
In any case, passing the constants via kernel arguments makes this
detection more reliable and one doesn't have to wonder why the fma test
is failing.
Side note: this was the last bug (known as of today) I had to fix in
order being able to make a CL CTS submission for Apple Silicon devices.