Both InstanceNorm plugin sources convert half scale/bias weights to float with __internal_half2float, which is a private static inline helper inside cuda_fp16.hpp, not part of the public CUDA API. It compiles under nvcc, where that header brings the helper into scope, but fails under other CUDA-compatible compilers (clang-CUDA, and AMD-targeting CUDA toolchains), and it would also stand in the way of a HIP/ROCm port.
Call sites (current main)
plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.cu, in the InstanceNormalizationV3Plugin(float, Weights const&, Weights const&, int32_t, float) constructor's copyWeights lambda (the kHALF branch).
plugin/instanceNormalizationPlugin/instanceNormalizationPluginLegacy.cu, in the InstanceNormalizationPlugin(float, Weights const&, Weights const&, int32_t, float) constructor's copyWeights lambda (the kHALF branch).
Both read:
auto const value = static_cast<unsigned short const*>(input.values);
output.push_back(__internal_half2float(value[c]));
Under a non-nvcc CUDA compiler this errors with:
error: use of undeclared identifier '__internal_half2float'
Proposed fix (public API, one call replaced per site)
Wrap the raw bits in a __half_raw and call the public __half2float:
auto const value = static_cast<unsigned short const*>(input.values);
__half_raw raw;
raw.x = value[c];
output.push_back(__half2float(raw));
Verified
Minimal reproducer compiled with NVIDIA nvcc (CUDA 13.1, sm_75) and with a clang-based CUDA compiler:
| code | nvcc | non-nvcc CUDA |
| --- | --- | --- |
| original (__internal_half2float) | compiles | fails (undeclared identifier) |
| fix (__half_raw + __half2float) | compiles | compiles |
The public __half2float runs the same conversion the private helper does, so the numeric behaviour is unchanged. The change just drops a dependency on an undocumented nvcc internal and makes the plugin portable to other CUDA compilers, at no cost under nvcc.
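For anyone who wants to check the "numeric behaviour is unchanged" claim without trusting either CUDA function, a host-only reference decoder can act as an oracle. The sketch below is illustrative only: `halfBitsToFloatReference` is a hypothetical name, not part of CUDA or TensorRT. It assumes the weights are stored as IEEE 754 binary16 bit patterns, which is the layout `__half_raw::x` holds.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical reference helper (not a CUDA or TensorRT API).
// Decodes an IEEE 754 binary16 bit pattern into a float using only the
// standard library. Every finite half value is exactly representable as a
// float, so the result is exact. NaN payloads are NOT preserved: any half NaN
// maps to a quiet NaN.
inline float halfBitsToFloatReference(std::uint16_t bits)
{
    int const sign = (bits >> 15) & 0x1;      // 1 sign bit
    int const exponent = (bits >> 10) & 0x1F; // 5 exponent bits, bias 15
    int const mantissa = bits & 0x3FF;        // 10 mantissa bits

    float magnitude = 0.0F;
    if (exponent == 0)
    {
        // Zero or subnormal: mantissa * 2^-24.
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    }
    else if (exponent == 0x1F)
    {
        // All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
        magnitude = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                                    : std::numeric_limits<float>::quiet_NaN();
    }
    else
    {
        // Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
        magnitude = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
    }
    return sign != 0 ? -magnitude : magnitude;
}
```

In a CUDA build, one way to use this would be to loop over all 65536 bit patterns, convert each through `__half_raw` plus `__half2float`, and compare against the reference, treating NaNs as equal when both sides satisfy `std::isnan`.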
Filing this first per CONTRIBUTING (issue before PR). I'm happy to send a PR touching both files, DCO signed off, once this is acknowledged.