ResizeWithPadPlugin — Aspect-Ratio-Preserving Resize with Padding #4811

Description

@abdulrahman-hassanin

Feature Request: ResizeWithPadPlugin — Aspect-Ratio-Preserving Resize with Padding

Summary

Add a new first-class TensorRT plugin ResizeWithPadPlugin that performs resize-with-padding (letterbox) entirely on the GPU inside the TRT graph, preserving the input aspect ratio by filling the unused canvas with a configurable constant color.

Motivation

Aspect-ratio-preserving resize is a fundamental preprocessing step for a wide range of vision models (YOLO, DETR, RT-DETR, EfficientDet, and others). Today there is no official way to do this inside a TensorRT engine:

  • IResizeLayer resizes but has no padding concept.
  • IPaddingLayer pads but does not compute the scale factor needed to preserve the aspect ratio.
  • cropAndResizePlugin / cropAndResizeDynamicPlugin crop and resize but distort the aspect ratio (see issue #4271, "Keeping aspect ratio with CropAndResize Plugin", where a maintainer confirmed that no trivial solution exists in the current API).

As a result, deployment teams typically write their own CUDA kernel outside the TRT graph, losing the opportunity for graph-level fusion, profiling, and engine portability.

Proposed Plugin: ResizeWithPadPlugin

Name follows the TensorRT OSS convention (operation-first, no model-specific branding).

Inputs

| Index | Description |
| --- | --- |
| 0 | Input image tensor: [N, C, H_in, W_in] or [N, H_in, W_in, C], uint8 / fp16 / fp32 |

Outputs

| Index | Description |
| --- | --- |
| 0 | Padded output tensor: [N, C, H_out, W_out] |
| 1 (optional) | Scale factor + pad offsets: [N, 4] as (scale, pad_top, pad_left, 0) for bbox remapping |

Attributes

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| output_h | int | required | Target height |
| output_w | int | required | Target width |
| pad_value | float[3] | [114, 114, 114] | Fill color (YOLO default; user-configurable) |
| interpolation | enum | kLINEAR | kNEAREST or kLINEAR |
| layout | enum | kNCHW | kNCHW or kNHWC |
| pad_mode | enum | kCENTER | kCENTER or kTOP_LEFT |
| swap_rb | bool | false | Fused BGR↔RGB channel swap |
| normalize | bool | false | Fused division by 255.0 |

Key design decisions

  1. Output 1 (scale + offsets) makes the plugin self-contained for detection pipelines — downstream code can remap bounding boxes back to original image coordinates without recomputing the scale externally.
  2. Fused swap_rb and normalize are opt-in attributes that allow a single kernel launch to replace the resize + channel-swap + normalize sequence common in YOLO-family deployments, with no intermediate buffers.
  3. Dynamic shapes: H_in and W_in are dynamic; only H_out / W_out are fixed attributes, matching real deployments where the network input size is fixed but source images vary.
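To illustrate design decision 1, here is a minimal sketch of how downstream code might use one row of output 1 to map a detection back to source-image coordinates. The helper name `unletterbox_box` is hypothetical. It assumes the forward mapping is x_out = x_in * scale + pad_left (and likewise for y), which the proposal implies but does not state explicitly.

```python
def unletterbox_box(box, scale, pad_top, pad_left):
    """Map (x1, y1, x2, y2) from letterboxed output coordinates
    back to source-image coordinates.

    Assumes the forward mapping x_out = x_in * scale + pad_left
    (and y_out = y_in * scale + pad_top); this is an assumption,
    not something the proposal specifies.
    """
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_left) / scale,
        (y1 - pad_top) / scale,
        (x2 - pad_left) / scale,
        (y2 - pad_top) / scale,
    )


# Example: a 1280x720 source letterboxed into 640x640 with kCENTER padding.
# scale = min(640/720, 640/1280) = 0.5, scaled image is 640x360,
# so pad_top = (640 - 360) / 2 = 140 and pad_left = 0.
row = (0.5, 140.0, 0.0, 0.0)  # (scale, pad_top, pad_left, 0) as in output 1
scale, pad_top, pad_left, _ = row
print(unletterbox_box((100.0, 240.0, 300.0, 440.0), scale, pad_top, pad_left))
# (200.0, 200.0, 600.0, 600.0)
```

Because the row carries everything needed for the inverse mapping, the post-processing code does not need to know the source resolution or the pad mode.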

Implementation Sketch

A single CUDA kernel per batch item:

  1. Compute scale = min(H_out / H_in, W_out / W_in)
  2. Compute scaled dimensions and padding offsets
  3. Fill output canvas with pad_value
  4. Bilinear (or nearest) sample from input into the scaled region
  5. Optionally swap channels and normalize in the same pass
  6. Write scale + offsets to output 1 if requested
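The steps above can be sketched as a CPU reference in plain Python, which might also serve as a test oracle for the kernel. This is a sketch under stated assumptions, not the proposed implementation: the function names are hypothetical, only nearest sampling is shown, the rounding policy for the scaled size is a guess, and `pad_value` is assumed to be indexed by output channel and to be divided by 255 when `normalize` is set.

```python
def letterbox_geometry(h_in, w_in, h_out, w_out, pad_mode="center"):
    """Steps 1-2: scale, scaled size, and padding offsets."""
    scale = min(h_out / h_in, w_out / w_in)
    # Rounding policy is an assumption; the proposal does not specify one.
    new_h = min(h_out, max(1, round(h_in * scale)))
    new_w = min(w_out, max(1, round(w_in * scale)))
    if pad_mode == "center":
        pad_top, pad_left = (h_out - new_h) // 2, (w_out - new_w) // 2
    elif pad_mode == "top_left":
        pad_top, pad_left = 0, 0
    else:
        raise ValueError(f"unknown pad_mode: {pad_mode}")
    return scale, new_h, new_w, pad_top, pad_left


def letterbox_nearest(img, h_out, w_out, pad_value=(114, 114, 114),
                      pad_mode="center", swap_rb=False, normalize=False):
    """Steps 3-6 for one image.

    img is H x W x C nested lists (HWC). Returns a C x H_out x W_out
    nested list of floats (CHW) plus the (scale, pad_top, pad_left, 0) row.
    swap_rb assumes a 3-channel image.
    """
    h_in, w_in, c = len(img), len(img[0]), len(img[0][0])
    scale, new_h, new_w, pad_top, pad_left = letterbox_geometry(
        h_in, w_in, h_out, w_out, pad_mode)
    div = 255.0 if normalize else 1.0  # assumption: pad_value is normalized too
    # Step 3: fill the canvas with pad_value.
    out = [[[pad_value[ch] / div for _ in range(w_out)] for _ in range(h_out)]
           for ch in range(c)]
    for ch in range(c):
        # Step 5: fused channel swap (output channel ch reads source channel src_ch).
        src_ch = {0: 2, 2: 0}.get(ch, ch) if swap_rb else ch
        for y in range(new_h):
            # Step 4: nearest source row for the centre of scaled row y.
            sy = min(h_in - 1, int((y + 0.5) / scale))
            for x in range(new_w):
                sx = min(w_in - 1, int((x + 0.5) / scale))
                out[ch][pad_top + y][pad_left + x] = img[sy][sx][src_ch] / div
    # Step 6: the row written to output 1.
    return out, (scale, float(pad_top), float(pad_left), 0.0)


# A 2x4 image letterboxed into 4x4: scale is 1, one pad row above and below.
img = [[[r * 10 + col, 0, 255] for col in range(4)] for r in range(2)]
out, row = letterbox_nearest(img, 4, 4)
print(row)  # (1.0, 1.0, 0.0, 0.0)
```

A real kernel would do the fill and the sampling in one pass per output pixel; the two loops are kept separate here only to mirror the numbered steps.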

The kernel accesses input pixels with coalesced reads (NHWC input) or a transposed access pattern (NCHW input) and writes output in NCHW layout.
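The difference between the two access patterns comes down to standard row-major offset arithmetic, sketched below. The helper names are hypothetical and the sizes are arbitrary; the point is only which index is the fastest-varying one in each layout.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat element offset of (n, c, h, w) in a row-major NCHW tensor."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, h, w, c, C, H, W):
    """Flat element offset of (n, h, w, c) in a row-major NHWC tensor."""
    return ((n * H + h) * W + w) * C + c


C, H, W = 3, 2, 4

# NHWC: the channels of one pixel are adjacent; neighbouring pixels are C apart.
print(nhwc_offset(0, 0, 0, 1, C, H, W) - nhwc_offset(0, 0, 0, 0, C, H, W))  # 1
print(nhwc_offset(0, 0, 1, 0, C, H, W) - nhwc_offset(0, 0, 0, 0, C, H, W))  # 3

# NCHW: neighbouring pixels in a row are adjacent; channels are H * W apart.
print(nchw_offset(0, 0, 0, 1, C, H, W) - nchw_offset(0, 0, 0, 0, C, H, W))  # 1
print(nchw_offset(0, 1, 0, 0, C, H, W) - nchw_offset(0, 0, 0, 0, C, H, W))  # 8
```

With NHWC input, a thread that reads every channel of one pixel touches contiguous memory, while with NCHW input the same read is strided by H * W elements per channel.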
