Skip to content

fix: normalize percent-encoding in query and fragment (RFC 3986 §6.2.2)#183

Open
spokodev wants to merge 1 commit into
fastify:mainfrom
spokodev:fix/normalize-query-fragment-encoding
Open

fix: normalize percent-encoding in query and fragment (RFC 3986 §6.2.2)#183
spokodev wants to merge 1 commit into
fastify:mainfrom
spokodev:fix/normalize-query-fragment-encoding

Conversation

@spokodev

Copy link
Copy Markdown
Contributor

What

RFC 3986 §6.2.2 (case + percent-encoding normalization) is applied to the path but not to the query or fragment.

const fastUri = require('fast-uri')

// query: not normalized at all
fastUri.normalize('x://h/p?a=%2a')   // 'x://h/p?a=%2a'   (hex not uppercased)
fastUri.normalize('x://h/p?a b')     // 'x://h/p?a b'     (raw space not encoded)

// fragment: reserved characters are decoded, changing the value
fastUri.normalize('x://h/p#a%2Fb')   // 'x://h/p#a/b'     (%2F became /)

Why these are bugs

  • §6.2.2.1 (case): "all letters within a percent-encoding triplet ... should be normalized to use uppercase." The path does this; the query did not.
  • §6.2.2.2 (percent-encoding): decoding must be limited to unreserved characters. The fragment used encodeURI(decodeURIComponent(fragment)), which decodes reserved characters too, so %2F/ and %2A* — a value change.
  • uri-js (the iso-compat target) normalizes both: ?a=%2a?a=%2A, and keeps #a%2Fb as #a%2Fb.

Change

Add normalizeQueryFragmentEncoding — the existing normalizePathEncoding with the query/fragment character set (which additionally permits ? and decodes ., since there are no dot-segments outside a path) — and route the query and fragment through it. All three components are now normalized consistently:

input before after
?a=%2a ?a=%2a ?a=%2A
?a b ?a b ?a%20b
#a%2Fb%2A #a/b* #a%2Fb%2A
#f?x/y #f?x/y #f?x/y (unchanged)

One behavior change worth calling out

encodeURI(decodeURIComponent(...)) threw on a fragment whose decoded bytes are not valid UTF-8 (e.g. #%E0%A4A), which the old code caught and flagged as error: 'URI malformed'. Such a fragment is still valid percent-encoding (RFC 3986 §2.1 is byte-level and does not require UTF-8), so it is now preserved without an error. The 'tolerates malformed fragments' resolve/equal tests still pass; I updated the one parse assertion that checked for the error flag (the fragment value is unchanged).

Tests

Added test/query-fragment-normalization.test.js. Full npm run test:unit passes (884), eslint clean.

RFC 3986 §6.2.2 case/percent normalization was applied to the path but
not to the query or fragment:

- the query was passed through verbatim, so `?a=%2a` was not uppercased to
  `?a=%2A`, `%7e`/`%2e` were not decoded, and a raw space or non-ASCII
  byte was left unencoded;
- the fragment went through `encodeURI(decodeURIComponent(...))`, which
  decodes reserved characters too — `#a%2Fb` became `#a/b`, changing the
  value.

Add `normalizeQueryFragmentEncoding` (the path normalizer with the
query/fragment character set, which also allows `?` and decodes `.`) and
route both components through it, so all three are normalized the same way.

A fragment whose decoded bytes are not valid UTF-8 (e.g. `#%E0%A4A`) is
still valid percent-encoding (RFC 3986 §2.1 is byte-level) and is now
preserved instead of being flagged as malformed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant