Unfold continuation lines before UTF-8 decode #159

Open
peter-boden wants to merge 1 commit into fedora-java:master from peter-boden:fix_continuation

@peter-boden

I ran into a manifest where a UTF-8 multibyte sequence was split across multiple lines, which made the parser crash.

How to reproduce:

podman run --rm quay.io/ovirt/buildcontainer:el10stream bash -lc 'set -e; VERSION=1.9.18; mvn -q dependency:get -Dartifact=org.apache.maven.resolver:maven-resolver-util:${VERSION} -Dmaven.repo.local=/tmp/m2; python3 -c "from javapackages.common.manifest import Manifest; Manifest(\"/tmp/m2/org/apache/maven/resolver/maven-resolver-util/${VERSION}/maven-resolver-util-${VERSION}.jar\")"'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.12/site-packages/javapackages/common/manifest.py", line 48, in __init__
    self._manifest = self._read_manifest()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/javapackages/common/manifest.py", line 70, in _read_manifest
    return content.decode("utf-8")
           ^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 1767: invalid continuation byte

Changes introduced:

  • process MANIFEST.MF as bytes and join continuation lines before decode
  • simplify _normalize_manifest to line split/strip only
  • add regression tests
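
As an illustration of the first bullet, here is a minimal sketch of unfolding continuation lines at the byte level before decoding. It is not the code from this PR: the function name `unfold_manifest` and the `Bundle-Vendor` header are hypothetical, chosen only for the example, and the sketch assumes a continuation is a line terminator followed by a single space.

```python
def unfold_manifest(raw: bytes) -> str:
    """Join continuation lines as bytes, then decode the result as UTF-8.

    Hypothetical helper for illustration; not the implementation in this PR.
    """
    # Normalize CRLF and bare CR terminators to LF.
    data = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # A newline followed by one space marks a continuation; dropping that
    # pair glues the folded bytes back together.
    data = data.replace(b"\n ", b"")
    return data.decode("utf-8")


# Hypothetical header whose value is folded in the middle of a two-byte
# UTF-8 sequence (0xc3 0xa9).
folded = b"Bundle-Vendor: Bou\xc3\r\n \xa9\r\n"

try:
    folded.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decoding before unfolding fails:", exc.reason)

print(unfold_manifest(folded))
```

Decoding first and unfolding afterwards cannot work for such input, because the bytes on either side of the fold are not valid UTF-8 until they are rejoined.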

Fix UnicodeDecodeError when manifest continuation folding splits UTF-8
multibyte sequences across line boundaries (e.g. "Bou\xc3\r\n \xa9").

@dmlloyd

dmlloyd commented Jun 9, 2026

Not sure how helpful this is, but be sure to check out https://docs.oracle.com/en/java/javase/25/docs/specs/jar/jar.html#notes-on-manifest-and-signature-files if you haven't already.

@edwbuck

edwbuck commented Jun 10, 2026

Characters must be complete before the newline, as detailed in https://docs.oracle.com/en/java/javase/25/docs/specs/jar/jar.html#manifest-specification

A looser interpretation similar to yours failed for the JDK team: https://bugs.java.com/bugdatabase/JDK-8202525. When the JDK team found that their own tooling was generating manifests that did not comply with the specification (for some versions prior to Java 9), they rewrote their tooling's output.

Generally, I advocate for failures to be early and notable. More to the point, maven-resolver-util fixed the issue only a few point releases later (in 1.9.23) and is now well into the 2.x releases. If you don't see an issue in your other tooling, it might be because that tooling predates JDK-8202525, or because the offending MANIFEST.MF line is being dropped (as @dmlloyd pointed out, malformed manifest lines are dropped).

I'd say this needs to be fixed, but not here. You need to upgrade your maven-resolver-util to 1.9.23 or later and, if issues reappear, file an Apache bug, as it would mean they're not using Java tooling (or are implementing it with bugs).
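
For readers who want to see what the stricter reading looks like in practice, here is a rough sketch of a check that flags physical lines which are not valid UTF-8 on their own, that is, lines where a character was split by folding. The function name `lines_with_split_characters` and the sample header are hypothetical and not part of javapackages or the JDK.

```python
def lines_with_split_characters(raw: bytes) -> list[int]:
    """Return 1-based numbers of physical lines that fail to decode alone.

    Hypothetical helper for illustration only.
    """
    bad = []
    # Split into physical lines without unfolding continuations.
    physical = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n").split(b"\n")
    for number, line in enumerate(physical, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            # The line starts or ends in the middle of a multibyte sequence
            # (or contains bytes that are not UTF-8 at all).
            bad.append(number)
    return bad


# Hypothetical manifest fragment folded inside a two-byte character.
print(lines_with_split_characters(b"Bundle-Vendor: Bou\xc3\r\n \xa9\r\n"))
```

A tool taking the fail-early position could report these line numbers instead of silently rejoining the bytes.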
