GPU NMS kernel and refactor of NMS operator#4893
Conversation
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 170 to 187 in 5ca611f
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 191 to 192 in 5ca611f
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/verify/test_nms.cpp
Lines 54 to 57 in 5ca611f
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/verify/test_nms.cpp
Line 201 in 5ca611f
beacuse of operator::flatten same name clashing.
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 220 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 299 to 302 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 346 to 348 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 378 to 384 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 388 to 397 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 405 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 433 to 449 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 451 to 455 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 463 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 481 to 487 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 491 to 557 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 559 to 575 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 583 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 601 to 607 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 611 to 620 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 622 to 629 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 637 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 655 to 661 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 665 to 798 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 800 to 866 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 874 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in c2ddb73
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 168 to 182 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 220 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 299 to 302 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 346 to 348 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 379 to 385 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 389 to 398 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 406 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 435 to 451 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 453 to 457 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 465 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 484 to 490 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 494 to 560 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 562 to 578 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 586 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 605 to 611 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 615 to 624 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 626 to 633 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 641 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 660 to 666 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 670 to 803 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 805 to 871 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in b5c1e77
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 168 to 182 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 220 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 299 to 302 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 346 to 348 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 379 to 385 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 389 to 398 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 406 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 435 to 451 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 453 to 457 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 465 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 484 to 490 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 494 to 560 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 562 to 578 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 586 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 605 to 611 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 615 to 624 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 626 to 633 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 641 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 660 to 666 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 670 to 803 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 805 to 871 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in 1011256
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 453 to 457 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 465 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 484 to 490 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 494 to 560 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 562 to 578 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 586 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 605 to 611 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 615 to 624 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 626 to 633 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 641 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 660 to 666 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 670 to 803 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 805 to 871 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in 32c779d
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #4893 +/- ##
===========================================
- Coverage 92.88% 92.67% -0.20%
===========================================
Files 587 588 +1
Lines 30348 30414 +66
===========================================
- Hits 28187 28186 -1
- Misses 2161 2228 +67
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds a GPU implementation of ONNX NonMaxSuppression (NMS) and refactors the core nonmaxsuppression op to better represent “dynamic” output sizing in MIGraphX IR by returning a tuple {indices, num_selected} (with optional slicing applied by the ONNX parser).
Changes:
- Introduces a 3-stage GPU NMS pipeline (
gpu::nms_sort→gpu::nms_filter→gpu::nms_compact) with new kernels and JIT compilers, and updates GPU lowering to select the pipeline for static shapes (ref fallback for dynamic inputs). - Refactors
op::nonmaxsuppressionto always return a tuple (raw indices + selected count) and updates ONNX parsing/tests to extract (and optionally slice) the indices output. - Renames shape tuple flattening helper to
flatten_shapesand updates GPU JIT paths to use it; adds/updates documentation and tests (including new GPU NMS tests).
Reviewed changes
Copilot reviewed 33 out of 35 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/verify/test_nms.cpp | Removes legacy verify-based NMS tests. |
| test/ref/nonmaxsuppression.cpp | Updates ref tests to slice indices using the tuple’s num_selected. |
| test/op_shape_test.cpp | Updates NMS shape expectations to a tuple {indices, num_selected}. |
| test/onnx/parse/nms_use_dyn_output_false_test.cpp | Removes test tied to removed use_dyn_output option/behavior. |
| test/onnx/parse/nms_test.cpp | Updates ONNX parse test to extract tuple elem 0 (indices). |
| test/onnx/parse/nms_dynamic_classes_test.cpp | Updates dynamic-classes parse test for tuple-return NMS. |
| test/onnx/parse/nms_dynamic_boxes_test.cpp | Updates dynamic-boxes parse test for tuple-return NMS. |
| test/onnx/parse/nms_dynamic_batch_test.cpp | Updates dynamic-batch parse test for tuple-return NMS. |
| test/onnx/gen_onnx.py | Removes custom use_dyn_output attribute generation in NMS ONNX test models. |
| test/multi_target/multitarget_test.cpp | Updates multitarget test to slice indices using tuple count. |
| test/gpu/nonmaxsuppression.cpp | Adds extensive GPU NMS functional tests (plus a dynamic-shape fallback test). |
| src/targets/gpu/nms_ops.cpp | Adds new internal GPU ops (gpu::nms_sort/filter/compact) and their shapes. |
| src/targets/gpu/lowering.cpp | Lowers nonmaxsuppression to the GPU pipeline for static shapes; ref fallback for dynamic. |
| src/targets/gpu/kernels/include/migraphx/kernels/sort.hpp | Adds block-wide bitonic sort helpers used by NMS kernels. |
| src/targets/gpu/kernels/include/migraphx/kernels/nonmaxsuppression.hpp | Adds GPU kernel implementations for sort/filter/compact stages. |
| src/targets/gpu/jit/topk.cpp | Switches to flatten_shapes when building HIP compile inputs. |
| src/targets/gpu/jit/reduce.cpp | Switches to flatten_shapes when building HIP compile inputs. |
| src/targets/gpu/jit/pointwise.cpp | Switches to flatten_shapes when building HIP compile inputs. |
| src/targets/gpu/jit/nonmaxsuppression.cpp | Adds JIT compilers for gpu::nms_sort/filter/compact kernels. |
| src/targets/gpu/include/migraphx/gpu/compile_hip_code_object.hpp | Clarifies meaning of global/local fields in compile options. |
| src/targets/gpu/device/include/migraphx/gpu/device/scan.hpp | Adds/clarifies block scan documentation/overload. |
| src/targets/gpu/compile_hip_code_object.cpp | Documents compute_block_size intent. |
| src/targets/gpu/code_object_op.cpp | Uses flatten_shapes for shape compatibility checks with tuple inputs. |
| src/targets/gpu/CMakeLists.txt | Adds nms_ops.cpp to GPU target build. |
| src/shape.cpp | Renames shape flattening helper to flatten_shapes. |
| src/onnx/parse_nonmaxsuppression.cpp | Parses NMS as tuple-return op and optionally slices indices under MIGRAPHX_USE_DYNAMIC_NMS. |
| src/onnx/onnx.cpp | Removes propagation of removed use_dyn_output option into parser. |
| src/onnx/include/migraphx/onnx/onnx_parser.hpp | Removes onnx_parser::use_dyn_output member. |
| src/include/migraphx/shape.hpp | Renames public API flatten → flatten_shapes. |
| src/include/migraphx/op/nonmaxsuppression.hpp | Refactors NMS op to return tuple {indices, num_selected} and adds runtime shape validation. |
| src/include/migraphx/onnx.hpp | Removes public onnx_options::use_dyn_output. |
| docs/reference/MIGraphX-dev-env-vars.rst | Documents MIGRAPHX_USE_DYNAMIC_NMS env var behavior. |
| docs/dev/onnx_operators.rst | Updates operator matrix entry for NonMaxSuppression. |
| const auto iou_packed = num_boxes * (num_boxes - 1) / 2; | ||
|
|
| // num_batches/num_classes/num_boxes are kept as op attributes because the filter inputs | ||
| // is a scratch buffer from which these can't be recovered. | ||
| // inputs = {sorted_boxes, sorted_scores, sorted_box_indices, output_indices, output_bc_counts} |
| const auto& boxes_s = inputs[0]; | ||
| const auto& scores_s = inputs[1]; | ||
| const auto num_batches = boxes_s.lens()[0]; | ||
| const auto num_boxes = boxes_s.lens()[1]; | ||
| const auto num_classes = scores_s.lens()[1]; |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| const auto& boxes_s = inputs[0]; | |
| const auto& scores_s = inputs[1]; | |
| const auto num_batches = boxes_s.lens()[0]; | |
| const auto num_boxes = boxes_s.lens()[1]; | |
| const auto num_classes = scores_s.lens()[1]; | |
| const auto& boxes_s = inputs[0]; | |
| const auto& scores_s = inputs[1]; | |
| const auto num_batches = boxes_s.lens()[0]; | |
| const auto num_boxes = boxes_s.lens()[1]; | |
| const auto num_classes = scores_s.lens()[1]; |
| const auto num_batches = v.at("num_batches").to<std::size_t>(); | ||
| const auto num_classes = v.at("num_classes").to<std::size_t>(); | ||
| const auto num_boxes = v.at("num_boxes").to<std::size_t>(); |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| const auto num_batches = v.at("num_batches").to<std::size_t>(); | |
| const auto num_classes = v.at("num_classes").to<std::size_t>(); | |
| const auto num_boxes = v.at("num_boxes").to<std::size_t>(); | |
| const auto num_batches = v.at("num_batches").to<std::size_t>(); | |
| const auto num_classes = v.at("num_classes").to<std::size_t>(); | |
| const auto num_boxes = v.at("num_boxes").to<std::size_t>(); |
| __device__ void block_sort(index idx, T& buf) const | ||
| { | ||
| static_assert(is_power_of_2(N), "N must be a power of 2"); | ||
| //NOLINTNEXTLINE(hicpp-signed-bitwise) |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | |
| // NOLINTNEXTLINE(hicpp-signed-bitwise) |
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | ||
| for(index_int k = 2; k <= N; k <<= 1) | ||
| { | ||
| //NOLINTNEXTLINE(hicpp-signed-bitwise) |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | |
| // NOLINTNEXTLINE(hicpp-signed-bitwise) |
| __device__ void block_sort_indexed(index idx, CompareAt compare_at, SwapAt swap_at) const | ||
| { | ||
| static_assert(is_power_of_2(N), "N must be a power of 2"); | ||
| //NOLINTNEXTLINE(hicpp-signed-bitwise) |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | |
| // NOLINTNEXTLINE(hicpp-signed-bitwise) |
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | ||
| for(index_int k = 2; k <= N; k <<= 1) | ||
| { | ||
| //NOLINTNEXTLINE(hicpp-signed-bitwise) |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| //NOLINTNEXTLINE(hicpp-signed-bitwise) | |
| // NOLINTNEXTLINE(hicpp-signed-bitwise) |
TedThemistokleous
left a comment
There was a problem hiding this comment.
Few review comments
More some additional place to add asserts and questions about removal of some other legacy tests. Mainly to ensure num_boxes doesn't resolve to zero somewhere.
I think you can simplify one of the kernels to avoid always doing a bunch of extra max, and mul operations for area calculations by adding an early return if either height/width is zero.
| double iou_threshold, | ||
| double score_threshold) const | ||
| { | ||
| // NOTE: should not need to fill with 0 |
There was a problem hiding this comment.
Why don't we just remove this then?
There was a problem hiding this comment.
It's to preserve the previous behavior for now. Technically the operator after NMS should never be reading the values after num_selected_indices.
| // `init`: initializer | ||
| // `fs`: striding function for thread work distribution. | ||
| // `input`: input with input(index_int). | ||
| // `output`: output with output(index_int, inclusive_scan_value_at_index_int). |
There was a problem hiding this comment.
Appreciate the added comments here.
| const auto num_batches = boxes_s.lens()[0]; | ||
| const auto num_boxes = boxes_s.lens()[1]; | ||
| const auto num_classes = scores_s.lens()[1]; | ||
| const auto iou_packed = num_boxes * (num_boxes - 1) / 2; |
There was a problem hiding this comment.
Add an assert for num_boxes > 0 so we if there is a weird issue when num_boxes = 0 we'll catch in debug build
| mod->insert_instruction(ins, make_op("hip::copy_to_gpu"), cpu_sub, gpu_alloc)); | ||
| } | ||
|
|
||
| // TODO: this needs cleanup |
There was a problem hiding this comment.
Make this a separate issue task to handle after then
| const auto& cnt_s = inputs[0]; | ||
| const auto& indices_s = inputs[1]; | ||
| const auto num_batch_class = cnt_s.elements(); | ||
| const auto num_boxes = indices_s.elements() / (num_batch_class * std::size_t{3}); |
There was a problem hiding this comment.
Add an assert here to ensure that this doesn't become zero if this operation results in 0 with remainder since this is size_t. If inidices_s < (num_batch_class / 3) that could pose an issue.
| const auto area_a = max(a[2] - a[0], 0.f) * max(a[3] - a[1], 0.f); | ||
| const auto area_b = max(b[2] - b[0], 0.f) * max(b[3] - b[1], 0.f); | ||
| const auto un = area_a + area_b - inter; | ||
| if(area_a <= 0.f or area_b <= 0.f or un <= 0.f) |
There was a problem hiding this comment.
Why not do the check after each calculation here wouldn't that save compute/calculations since NMS you're typically looking over many items?
You don't need to do another calculation of un if either area_a or area_b fails,
Also if inter == 0 you can skip all the calculations meaning if w or h results in zero the rest of the calculations don't matter. There can be an early return you can do as well.
|
|
||
| idx.local_stride(half, [&](auto i) { | ||
| fill_row(i); | ||
| fill_row(NumBoxes - 1 - i); |
There was a problem hiding this comment.
Are we not concerned if this is negative on the fill? Shouldn't we add a max here?
max(NumBoxes - 1 -i, 0)
There was a problem hiding this comment.
Why remove this test?
There was a problem hiding this comment.
No more dyn_output flag in NMS
There was a problem hiding this comment.
same as above - shouldn't we have the same verify tests as before or was there an error here/some non relevant piece with this test now?
There was a problem hiding this comment.
NMS verify between ref and GPU doesn't make sense with the random data generated by verify.
Motivation
Technical Details
Changelog Category
Add a
CHANGELOG.mdentry for any option other thanNot Applicable