Skip to content

[python] Admit bytes in the Python memory value contract#846

Merged
wenjin272 merged 1 commit into
apache:mainfrom
weiqingy:845-admit-bytes
Jul 2, 2026
Merged

[python] Admit bytes in the Python memory value contract#846
wenjin272 merged 1 commit into
apache:mainfrom
weiqingy:845-admit-bytes

Conversation

@weiqingy

Copy link
Copy Markdown
Collaborator

Linked issue: #845

Purpose of change

Admits exact Python bytes into the set()-time memory value validator (validate_memory_value), which previously rejected it.

bytes was in the originally proposed contract (#723) but was intentionally excluded in #839 because the Python→Java bytes conversion through Pemja was unverified end-to-end — accepting an unsafe value would defeat the validator's purpose, turning an early set() rejection into a checkpoint-time JVM crash on restore. This change closes that gap.

Verification shows exact Python bytes materializes through Pemja into a native Java byte[], which is checkpoint-stable:

  • Pemja 0.5.5 C source (src/main/c/pemja/core/pyutils.c): the Python→Java dispatch JcpPyObject_AsJObject routes PyBytes_CheckExact to JcpPyBytes_AsJObject, whose body is NewByteArray + SetByteArrayRegion — a genuine JVM byte[] with no native back-pointer. bytearray has no branch and falls through to the generic JcpPyObject_AsJPyObject wrapper (which holds a process-local pointer and is unsafe on restore). The conversion is identical in 0.5.5 and 0.5.7.
  • A local embedded-Pemja probe driving a real interpreter confirmed it at runtime: b"hello" materializes on the Java side as byte[] ([B), while bytearray materializes as pemja.core.object.PyObject and str as java.lang.String.

Because byte[] is a first-class Flink-serializable primitive array, restore safety follows from the already-proven byte[] path. The change is a single addition to the validator's exact-type scalar set: the existing exact-type check (type(value) in _CHECKPOINT_STABLE_SCALARS, not isinstance) admits exact bytes while keeping bytearray and bytes subclasses rejected, since Pemja wraps those as non-checkpoint-stable objects.

A true checkpoint-restart round-trip test cannot run on the MiniCluster (in-place recovery does not recreate the JVM, so the Pemja conversion path is not crossed). A permanent end-to-end assertion is left for the checkpoint-recovery harness tracked in #836.

Tests

python/flink_agents/runtime/tests/test_memory_value_validation.py:

  • Accept: exact bytes as a scalar (b"", b"hello") and nested in list/dict.
  • Reject: a new test_rejects_bytearray_and_bytes_subclass (using a real bytes subclass and bytearray) that pins the exact-type boundary — an isinstance-based regression would wrongly accept the subclass and fail this test.

Verified with uv run pytest flink_agents/runtime/tests/test_memory_value_validation.py -v (all pass), the broader uv run pytest flink_agents -k "not e2e_tests" (no regression), and ./tools/lint.sh -c (0 violations).

API

No new or changed public API signatures. This widens the accepted value set of the existing Python MemoryObject.set() contract to include exact bytes; the TypeError message and docstring are updated to list bytes among the accepted types.

Documentation

  • doc-needed
  • doc-not-needed
  • doc-included

Exact Python bytes materializes through Pemja into a native Java byte[]
(checkpoint-stable), so it is admitted into the set()-time memory value
validator. The validator's exact-type check accepts exact bytes while
keeping bytearray and bytes-subclasses rejected, since Pemja wraps those
as non-checkpoint-stable objects rather than materializing a byte[].

Adds accept coverage (scalar and nested) and a reject test pinning the
exact-type boundary, and updates the Python memory value contract docs.
@github-actions github-actions Bot added doc-included Your PR already contains the necessary documentation updates. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue. labels Jun 14, 2026
@wenjin272 wenjin272 added fixVersion/0.4.0 and removed fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. labels Jun 15, 2026

@LuuOW LuuOW left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technical audit: code patterns and implementation verified for alignment with modern software engineering standards.

@wenjin272 wenjin272 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on @weiqingy. LGTM.

Besides, during the review process, I discovered an existing issue related to bytes that had been present before:

The durable-execution path (ActionStateStore, Kafka/Fluss) serializes
MemoryUpdate.value via a plain Jackson ObjectMapper with no type info
(value is Object). A byte[] is written as a base64 String and, on
crash/replay, deserialized back as String — then written into memory.
The consumer silently gets str/String instead of the original bytes.

This is a pre-existing bug: the Java side never restricted byte[] memory
values (a Java agent can set them directly). PR #846, which now lets the
Python side write bytes too, makes it more likely to be hit.

I created a issue for this #865.

@wenjin272 wenjin272 merged commit 7b716a3 into apache:main Jul 2, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-included Your PR already contains the necessary documentation updates. fixVersion/0.4.0 priority/major Default priority of the PR or issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants