Unfold continuation lines before UTF-8 decode by peter-boden · Pull Request #159 · fedora-java/javapackages

peter-boden · 2026-05-19T11:51:24Z

I ran into a manifest where a UTF-8 multibyte sequence was split across multiple lines, which made the parser crash.

How to reproduce:

podman run --rm quay.io/ovirt/buildcontainer:el10stream bash -lc 'set -e; VERSION=1.9.18; mvn -q dependency:get -Dartifact=org.apache.maven.resolver:maven-resolver-util:${VERSION} -Dmaven.repo.local=/tmp/m2; python3 -c "from javapackages.common.manifest import Manifest; Manifest(\"/tmp/m2/org/apache/maven/resolver/maven-resolver-util/${VERSION}/maven-resolver-util-${VERSION}.jar\")"'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.12/site-packages/javapackages/common/manifest.py", line 48, in __init__
    self._manifest = self._read_manifest()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/javapackages/common/manifest.py", line 70, in _read_manifest
    return content.decode("utf-8")
           ^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 1767: invalid continuation byte
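
The same failure can be reproduced without the container. The snippet below is a minimal sketch, not the code in manifest.py: the manifest fragment and the `decode_then_unfold` helper are made up for illustration, and the helper only imitates the "decode first, unfold later" order that the traceback points at.

```python
# Hypothetical manifest fragment: the two bytes of "é" (0xC3 0xA9) are
# separated by a line break plus the leading space of a continuation line.
raw = b"Bundle-Vendor: Bou\xc3\r\n \xa9e\r\n"


def decode_then_unfold(data: bytes) -> str:
    """Illustrative only: decode the raw bytes first, then join continuations."""
    return data.decode("utf-8").replace("\r\n ", "")


try:
    decode_then_unfold(raw)
    failed = False
    reason = None
except UnicodeDecodeError as exc:
    # 0xC3 announces a two-byte sequence, but the next byte is "\r".
    failed = True
    reason = exc.reason

print(failed, reason)
```

The decode step fails before any unfolding can happen, because 0xC3 is followed by a carriage return instead of a continuation byte.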

Changes introduced:

- process MANIFEST.MF as bytes and join continuation lines before decode
- simplify _normalize_manifest to line split/strip only
- add regression tests

Fix UnicodeDecodeError when manifest continuation folding splits UTF-8 multibyte sequences across line boundaries (e.g. "Bou\xc3\r\n \xa9").

- process MANIFEST.MF as bytes and join continuation lines before decode
- simplify _normalize_manifest to line split/strip only
- add regression tests
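
The approach described above can be sketched roughly as follows. This is a hedged illustration, not the actual diff: `unfold_and_decode` and `normalize` are hypothetical names, and the sketch assumes a continuation is a line break (CRLF, LF, or CR) followed by a single space.

```python
import re

# Assumption: a continuation is a line break followed by exactly one space.
_CONTINUATION = re.compile(rb"(?:\r\n|\n|\r) ")


def unfold_and_decode(raw: bytes) -> str:
    """Join folded lines at the byte level, then decode the result as UTF-8."""
    return _CONTINUATION.sub(b"", raw).decode("utf-8")


def normalize(text: str) -> list[str]:
    """Split into lines and strip them, dropping lines that end up empty."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical manifest bytes: "é" (0xC3 0xA9) is split by the fold.
    sample = b"Manifest-Version: 1.0\r\nBundle-Vendor: Bou\xc3\r\n \xa9e\r\n"
    print(normalize(unfold_and_decode(sample)))
```

Because the fold is removed while the data is still bytes, the two halves of the split character are adjacent again by the time `decode` runs.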

dmlloyd · 2026-06-09T15:11:42Z

Not sure how helpful this is, but be sure to check out https://docs.oracle.com/en/java/javase/25/docs/specs/jar/jar.html#notes-on-manifest-and-signature-files if you haven't already.

edwbuck · 2026-06-10T14:32:07Z

Characters must be complete prior to the newline, as detailed in https://docs.oracle.com/en/java/javase/25/docs/specs/jar/jar.html#manifest-specification

A looser interpretation similar to yours failed for the JDK team (https://bugs.java.com/bugdatabase/JDK-8202525). When the JDK team found that their own tooling was generating manifests that did not comply with the specification (for some versions prior to Java 9), they rewrote their tooling output.

Generally, I advocate for failures to be early and notable. More to the point, maven-resolver-util fixed the issue only a few point releases later (fixed in 1.9.23) and is now well into the 2.x release line. If you don't see an issue in your other tooling, it might be because it predates JDK-8202525, or because that MANIFEST.MF line is being dropped, as @dmlloyd pointed out (malformed manifest lines are dropped).

I'd say this needs to be fixed, but not here. You need to upgrade your maven-resolver-util to 1.9.23 or later and, if issues reappear, file an Apache bug, as it means they're not using Java tooling (or are implementing it with bugs).
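
For anyone who prefers the "fail early" route, a strict check could look something like the sketch below. It is only an illustration of the stricter reading described in this comment, not code from javapackages or the JDK; `find_split_characters` is a hypothetical helper that reports every physical line that is not valid UTF-8 on its own.

```python
def find_split_characters(raw: bytes) -> list[int]:
    """Return 1-based numbers of physical lines that fail to decode alone.

    Under the stricter reading, a character may not straddle a line break,
    so each physical line is expected to be valid UTF-8 by itself.
    """
    bad = []
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


if __name__ == "__main__":
    # Hypothetical manifest bytes: the fold lands inside "é" (0xC3 0xA9).
    broken = b"Manifest-Version: 1.0\r\nBundle-Vendor: Bou\xc3\r\n \xa9e\r\n"
    # Same value, folded after the complete character instead.
    well_formed = b"Manifest-Version: 1.0\r\nBundle-Vendor: Bou\xc3\xa9\r\n e\r\n"
    print(find_split_characters(broken))
    print(find_split_characters(well_formed))
```

A caller could raise an error that names the offending lines instead of silently repairing them, which keeps the failure visible to whoever produced the JAR.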

peter-boden wants to merge 1 commit into fedora-java:master from peter-boden:fix_continuation
