sanitizeFilename is not pluggable — causes drift between storage adapters and the DB #16363
lcnogueira
started this conversation in
Feature Requests & Ideas
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Summary
sanitizeFilenameinpayload/sharedis invoked at several points in the upload pipeline (signed-URL handlers in storage adapters, server-side adapter upload paths, core document writes) and is not user-configurable. Projects that need custom filename normalization end up reimplementing it in collection hooks, which run at a different point in the pipeline than the storage adapter's call — and the two can drift, producing 404s when the DBfilenameno longer matches the storage key.A real bug this caused
We recently debugged a production incident where users could not download files whose names contained mixed casing or non-alphanumeric characters. Root cause was a drift between two sanitizers:
beforeOperationhook on each upload collection that lowercased filenames and stripped non-alphanumeric characters.clientUploads: trueon@payloadcms/storage-s3.With
clientUploads: true, the admin UI first POSTs to/api/storage-s3-generate-signed-url, and the plugin handler callssanitizeFilename(filename)to build the S3 object key. The document-create request to/api/<collection>comes second, and our hook ran only there.sanitizeFilenameis case-preserving; our hook was not. For any filename not already invariant under both, the S3 key and the DBfilenamediverged, and downloads 404'd — the DB lookup succeeded, but the S3 fetch missed (keys are case-sensitive).We "fixed" it by deleting our hook entirely. That works only because we didn't actually need the custom transformation. Any user who does need it is stuck in the same trap: reimplement consistently across every write path, or give up.
Proposed change
Make filename sanitization pluggable, per upload collection, with the invariant that the built-in sanitizer always runs after the user's formatter. The user can shape, but cannot produce, an unsafe storage key.
Pseudocode for every call site:
Two possible layers — asking for a steer
(a)
payload/shared+payloadcore. MakesanitizeFilenamepluggable at the upload-collection level in core. Every storage adapter (storage-s3,storage-gcs,storage-azure,storage-vercel-blob, filesystem) inherits the feature with zero changes. The drift-between-adapters-and-DB bug class is resolved once: there is only ever one sanitizer in the system, so the two sides literally cannot disagree.(b)
@payloadcms/storage-s3only. AddformatFilenameas a per-collection option. The plugin reads the formatter ingetGenerateSignedURLHandlerand injects abeforeOperationhook at config-transform time so the same formatter runs on the DB write path. Both paths route through a single internal helper (applyFormatFilename) to make drift structurally impossible within the plugin. Narrower, faster to ship, but doesn't help other adapters.Our preference is (a) for the bug-class reason — the same trap is sitting dormant in every other adapter today. We understand (b) may be more palatable depending on appetite for changes in
payload/shared.A concrete use case
The case we have in mind is URL-friendliness: replacing spaces with dashes, normalizing unicode, stripping characters that would otherwise need URL-encoding in the browser. Built-in
sanitizeFilenamecorrectly leaves these as-is — they're valid storage keys — so in the clientUploads flow we end up with object keys that work but produce brittle, ugly URLs."Why not just improve
sanitizeFilenameby default?"We considered this and don't think it's sufficient:
Safety invariant
The built-in sanitizer always runs after the user's formatter, at every call site. This option introduces no new way to corrupt storage keys.
What we'd like from you
A steer on which layer (a or b) you'd be willing to accept, before we open a PR. We're happy to draft the implementation once we know the layer.
Beta Was this translation helpful? Give feedback.
All reactions