You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Fature Request: ResizeWithPadPlugin — Aspect-Ratio-Preserving Resize with Padding
Summary
Add a new first-class TensorRT plugin ResizeWithPadPlugin that performs resize-with-padding (letterbox) entirely on the GPU inside the TRT graph, preserving the input aspect ratio by filling the unused canvas with a configurable constant color.
Motivation
Aspect-ratio-preserving resize is a fundamental preprocessing step for a wide range of vision models (YOLO, DETR, RT-DETR, EfficientDet, and others). Today there is no official way to do this inside a TensorRT engine:
IResizeLayer resizes but has no padding concept.
IPaddingLayer pads but does not compute the scale factor needed to preserve the aspect ratio.
cropAndResizePlugin / cropAndResizeDynamicPlugin crop and resize but distort the aspect ratio (see issue Keeping aspect ratio with CropAndResize Plugin #4271, where a maintainer confirmed no trivial solution exists in the current API).
As a result, every deployment team rolls their own CUDA kernel outside the TRT graph, losing the opportunity for graph-level fusion, profiling, and engine portability.
Proposed Plugin: ResizeWithPadPlugin
Name follows the TensorRT OSS convention (operation-first, no model-specific branding).
Scale factor + pad offsets — [N, 4] as (scale, pad_top, pad_left, 0) for bbox remapping
Attributes
Attribute
Type
Default
Description
output_h
int
required
Target height
output_w
int
required
Target width
pad_value
float[3]
[114, 114, 114]
Fill color (YOLO default; user-configurable)
interpolation
enum
kLINEAR
kNEAREST or kLINEAR
layout
enum
kNCHW
kNCHW or kNHWC
pad_mode
enum
kCENTER
kCENTER or kTOP_LEFT
swap_rb
bool
false
Fused BGR↔RGB channel swap
normalize
bool
false
Fused division by 255.0
Key design decisions
Output 1 (scale + offsets) makes the plugin self-contained for detection pipelines — downstream code can remap bounding boxes back to original image coordinates without recomputing the scale externally.
Fused swap_rb and normalize are opt-in attributes that allow a single kernel launch to replace the resize + channel-swap + normalize sequence common in all YOLO-family deployments, with no intermediate buffers.
Dynamic shapes — H_in and W_in are dynamic; only H_out / W_out are fixed attributes, matching real deployment where the network input size is fixed but source images vary.
Implementation Sketch
A single CUDA kernel per batch item:
Compute scale = min(H_out / H_in, W_out / W_in)
Compute scaled dimensions and padding offsets
Fill output canvas with pad_value
Bilinear (or nearest) sample from input into the scaled region
Optionally swap channels and normalize in the same pass
Write scale + offsets to output 1 if requested
The kernel accesses input pixels with coalesced reads (NHWC input) or a transposed access pattern (NCHW input) and writes output in NCHW layout.
Fature Request:
ResizeWithPadPlugin— Aspect-Ratio-Preserving Resize with PaddingSummary
Add a new first-class TensorRT plugin
ResizeWithPadPluginthat performs resize-with-padding (letterbox) entirely on the GPU inside the TRT graph, preserving the input aspect ratio by filling the unused canvas with a configurable constant color.Motivation
Aspect-ratio-preserving resize is a fundamental preprocessing step for a wide range of vision models (YOLO, DETR, RT-DETR, EfficientDet, and others). Today there is no official way to do this inside a TensorRT engine:
IResizeLayerresizes but has no padding concept.IPaddingLayerpads but does not compute the scale factor needed to preserve the aspect ratio.cropAndResizePlugin/cropAndResizeDynamicPlugincrop and resize but distort the aspect ratio (see issue Keeping aspect ratio with CropAndResize Plugin #4271, where a maintainer confirmed no trivial solution exists in the current API).As a result, every deployment team rolls their own CUDA kernel outside the TRT graph, losing the opportunity for graph-level fusion, profiling, and engine portability.
Proposed Plugin:
ResizeWithPadPluginName follows the TensorRT OSS convention (operation-first, no model-specific branding).
Inputs
[N, C, H_in, W_in]or[N, H_in, W_in, C],uint8/fp16/fp32Outputs
[N, C, H_out, W_out][N, 4]as(scale, pad_top, pad_left, 0)for bbox remappingAttributes
output_houtput_wpad_value[114, 114, 114]interpolationkLINEARkNEARESTorkLINEARlayoutkNCHWkNCHWorkNHWCpad_modekCENTERkCENTERorkTOP_LEFTswap_rbfalsenormalizefalseKey design decisions
swap_rbandnormalizeare opt-in attributes that allow a single kernel launch to replace the resize + channel-swap + normalize sequence common in all YOLO-family deployments, with no intermediate buffers.H_inandW_inare dynamic; onlyH_out/W_outare fixed attributes, matching real deployment where the network input size is fixed but source images vary.Implementation Sketch
A single CUDA kernel per batch item:
scale = min(H_out / H_in, W_out / W_in)pad_valueThe kernel accesses input pixels with coalesced reads (NHWC input) or a transposed access pattern (NCHW input) and writes output in NCHW layout.