Upgrade MLMD to align with TF 2.21.0 and add Python 3.12/3.13 support #243

Open: vkarampudi wants to merge 7 commits into google:master from vkarampudi:testing

vkarampudi commented May 13, 2026

This PR upgrades ML Metadata (MLMD) to build with Bazel 7.7.0 and Protobuf 6.31.1 (Python) / v31.1 (C++ archive), in line with TensorFlow 2.21.0. It also adds support for Python 3.12 and 3.13, drops Python 3.9, and resolves the build-system incompatibilities, dependency conflicts, and compilation errors encountered along the way.


Key Changes

1. Bazel 7.7.0 and Protobuf v31.1 Core Upgrades

  • Problem: Upgrading to Bazel 7.7.0 and Protobuf v31.1 broke the legacy Starlark proto rules, raised strict transitive-dependency errors, and caused manual WORKSPACE overrides to be ignored by default.
  • Reason: Bazel 7 uses strict transitive proto tracking, does not fetch legacy protobuf Starlark rules by default, and prioritizes Bzlmod over WORKSPACE.
  • Fix:
    • Set the minimum Bazel version to 7.7.0 in .bazelversion and WORKSPACE.
    • Added overrides for newer rule sets (rules_java 7.10.0, platforms 0.0.10, com_google_absl 20250127 LTS) in WORKSPACE to support the upgraded compiler toolchain.
    • Disabled Bzlmod by adding common --noenable_bzlmod to .bazelrc so that WORKSPACE definitions take precedence during resolution.

2. Python 3.12 & 3.13 Support and Python 3.9 Drop

  • Problem: The library needed to support Python 3.12 and 3.13 and to drop the deprecated Python 3.9 in order to align with TensorFlow 2.21.
  • Reason: MLMD's supported Python versions need to track the TensorFlow 2.21 dependency matrix and current packaging standards.
  • Fix:
    • Modified setup.py to set python_requires='>=3.10,<4' and added classifiers for Python 3.12 and 3.13.
    • Updated the Python badges, setup guides, and docker compose execution versions in README.md to remove Python 3.9 references.
    • Updated the Conda CI pipelines (conda-build.yml and conda-test.yml) to drop "3.9" and run the matrix across ["3.10", "3.11", "3.12", "3.13"].
    • Documented the additions and removals in RELEASE.md.
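
As a rough illustration of how the CI matrix and the package metadata can stay in step, the sketch below derives the setup.py values from a single list of versions. The helper python_metadata and the constant SUPPORTED_PYTHONS are hypothetical and are not part of this PR; only the version list and the resulting python_requires string come from the description above.

```python
# Hypothetical single source of truth for the supported interpreters.
SUPPORTED_PYTHONS = ["3.10", "3.11", "3.12", "3.13"]


def _as_tuple(version):
    """Turn '3.10' into (3, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def python_metadata(versions):
    """Build the python_requires string and trove classifiers for setup()."""
    ordered = sorted(versions, key=_as_tuple)
    lowest = ordered[0]
    return {
        # Lower bound is the oldest supported release; upper bound excludes 4.x.
        "python_requires": f">={lowest},<4",
        "classifiers": [
            f"Programming Language :: Python :: {v}" for v in ordered
        ],
    }


METADATA = python_metadata(SUPPORTED_PYTHONS)
```

The resulting dictionary could be passed to setup() as keyword arguments. Sorting on integer tuples matters here because a plain string sort would place "3.9" after "3.13".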

3. Linker Unresolved References and Target Collisions (Macro Refactoring)

  • Problem: Building the C++ tests failed with unresolved gRPC and protobuf references in the compiled .so shared libraries, and Bazel reported duplicate target errors.
  • Reason: gRPC's cc_grpc_library rule, when used with grpc_only = True, does not export the regular proto definitions, which left symbols undefined at link time. Declaring both libraries under the same target name in the macro caused the target collisions.
  • Fix: Refactored ml_metadata_proto_library in ml_metadata.bzl:
    • Renamed the cc_grpc_library target to name + "_grpc" and explicitly linked it against @com_github_grpc_grpc//:grpc++.
    • Bundled both the cc_proto_library (name + "_cc_proto") and the gRPC stubs library (name + "_grpc") into a single parent cc_library named name. This resolved the linker errors without requiring changes to individual BUILD files.
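
The naming scheme can be modeled in plain Python. This is a toy stand-in for a Bazel package, not Starlark and not the actual macro from ml_metadata.bzl: TargetRegistry, legacy_macro, and refactored_macro are hypothetical names, and the rule attributes are reduced to what the description above mentions.

```python
class TargetRegistry:
    """Toy model of a Bazel package: each target name may be declared once."""

    def __init__(self):
        self.targets = {}

    def declare(self, rule, name, **attrs):
        if name in self.targets:
            raise ValueError(f"duplicate target name: {name!r}")
        self.targets[name] = {"rule": rule, **attrs}


def legacy_macro(pkg, name):
    """Assumed old shape: both libraries declared under the same name."""
    pkg.declare("cc_proto_library", name)
    pkg.declare("cc_grpc_library", name, grpc_only=True)  # collides


def refactored_macro(pkg, name):
    """Shape described in this PR: suffixed children plus one parent."""
    cc_proto = name + "_cc_proto"
    grpc = name + "_grpc"
    pkg.declare("cc_proto_library", cc_proto)
    pkg.declare(
        "cc_grpc_library",
        grpc,
        grpc_only=True,
        deps=[":" + cc_proto, "@com_github_grpc_grpc//:grpc++"],
    )
    # The parent keeps the original name, so existing BUILD files that
    # depend on `name` get both the proto symbols and the gRPC stubs.
    pkg.declare("cc_library", name, deps=[":" + cc_proto, ":" + grpc])
```

Calling legacy_macro raises on the second declaration, while refactored_macro registers three distinct targets whose parent carries the name that callers already use.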

4. BoringSSL and gRPC Preprocessor Compatibility

  • Problem: Compiling newer gRPC C++ releases failed with fatal errors caused by missing OpenSSL 1.1.0+ functions (X509_get_key_usage not declared).
  • Reason: gRPC was being compiled against the 2020-era BoringSSL override pinned in WORKSPACE, which lacks these newer APIs.
  • Fix:
    • Upgraded gRPC (com_github_grpc_grpc) to v1.70.1 in WORKSPACE.
    • Upgraded boringssl in WORKSPACE to commit 16c8d3db1af20fcc04b5190b25242aadcb1fbb30, which implements the OpenSSL 1.1.0+ APIs.

5. UPB Struct Naming Conflicts (Inline WORKSPACE Patches)

  • Problem: Builds failed in optimized compilation mode (opt) because of structural and naming mismatches in the upb-generated files used by gRPC's xDS Orca metrics and ALTS handshaker code.
  • Reason: Protobuf v31.1 uses a newer layout for upb-generated structures that conflicts with the older generated definitions.
  • Fix: Since MLMD does not use xDS or ALTS, we added platform-independent, Python-based patch_cmds in WORKSPACE that stub out these modules:
    • Modified the xDS metrics function ParseBackendMetricData to immediately return nullptr.
    • Modified the ALTS handshaker function alts_tsi_handshaker_result_create to immediately return TSI_FAILED_PRECONDITION, which bypasses the upb naming conflicts.
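
The patch_cmds themselves are not shown in this description. As a minimal sketch of the idea, assuming a brace-counting rewrite is sufficient, the hypothetical helper below replaces the body of a C++ function with a single return statement. The helper name stub_function_body and the sample source are made up for the example and are not gRPC code.

```python
import re


def stub_function_body(source, function_name, return_stmt):
    """Replace the body of `function_name` with just `return_stmt`.

    Locates the opening brace of the definition, then counts braces to
    find the matching close. Braces inside string literals or comments
    are not handled, so this only suits simple, well-formed sources.
    """
    header = re.compile(
        re.escape(function_name) + r"\s*\([^;{}]*\)\s*\{"
    )
    match = header.search(source)
    if match is None:
        raise ValueError(f"definition of {function_name} not found")

    depth = 1
    pos = match.end()
    while depth and pos < len(source):
        if source[pos] == "{":
            depth += 1
        elif source[pos] == "}":
            depth -= 1
        pos += 1
    if depth:
        raise ValueError("unbalanced braces in function body")

    # `pos` is one past the closing brace; keep that brace and what follows.
    return source[: match.end()] + "\n  " + return_stmt + "\n" + source[pos - 1:]


# Made-up input standing in for the real file contents.
SAMPLE = """\
int Helper(int x);

const Data* ParseBackendMetricData(std::string_view s) {
  if (s.empty()) { return nullptr; }
  return Decode(s);
}
"""

PATCHED = stub_function_body(SAMPLE, "ParseBackendMetricData", "return nullptr;")
```

Replacing the whole body, rather than inserting an early return, keeps the compiler from seeing the statements that reference the conflicting generated types. A real patch would also need to confirm that the header regex matches the definition and not a call site.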

6. Strict Proto Sandboxing and Target Dependencies

  • Problem: Under Bazel 7's strict sandboxing, compilation failed because imported core protobuf definitions (such as any.proto, struct.proto, and descriptor.proto) were missing or undeclared.
  • Reason: Sandboxed builds block access to headers unless they are explicitly declared as target dependencies.
  • Fix:
    • Added a direct @com_google_protobuf//:cc_wkt_protos dependency to metadata_store_service_proto in ml_metadata/proto/BUILD.
    • Appended the public Well-Known Type targets (any_proto, struct_proto, descriptor_proto, field_mask_proto, duration_proto, timestamp_proto) to the native.proto_library copy rules in the ml_metadata_proto_library_go macro in ml_metadata.bzl so that sandboxed Go/Gazelle compiles resolve their imports.
    • Linked @com_google_protobuf//:protobuf directly to the C++ target record_parsing_utils in ml_metadata/util/BUILD so that compilation does not fall back to host system headers under /usr/include/google/protobuf/.
    • Created an empty root BUILD file to declare the project root as a valid Bazel package, which resolves Go standard library toolchain fetching errors in Bazel 7.

7. macOS Wheel Packaging Folder Collisions

  • Problem: Python packaging on macOS runners failed with a folder creation error: error: could not create 'build': File exists.
  • Reason: Bazel builds create directory and symlink artifacts that collide with setuptools' default 'build' output folder name on case-insensitive macOS file systems.
  • Fix:
    • Overrode self.build_base in the _BuildCommand class in setup.py with a dedicated, package-specific name ('build_mlmd_tmp'), which avoids the macOS collision.
    • Added the clean step rm -rf build build_mlmd_tmp dist immediately before the wheel build in the CI workflows.
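
For local runs outside CI, the clean step can be approximated in Python. The sketch below is an assumption, not code from this PR: the helper clean_build_outputs is hypothetical, and it is deliberately narrower than rm -rf because it removes only directories and symlinks and leaves regular files alone.

```python
import shutil
from pathlib import Path

# Directory names taken from the CI clean step described above.
STALE_DIRS = ("build", "build_mlmd_tmp", "dist")


def clean_build_outputs(root, names=STALE_DIRS):
    """Remove stale build output directories under `root`.

    Returns the names that were actually removed, in the order given.
    """
    removed = []
    for name in names:
        target = Path(root) / name
        if target.is_symlink():
            # Unlink the symlink itself rather than following it.
            target.unlink()
            removed.append(name)
        elif target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Checking is_symlink() first matters because is_dir() follows links, and deleting through a symlink would remove the link target's contents instead of the link.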

8. NumPy 2.0+ Solver Compatibility for Python 3.13

  • Problem: Conda environment setup and wheel packaging failed on Python 3.13 due to NumPy solver conflicts.
  • Reason: There are no NumPy < 2.0 releases compatible with Python 3.13. The environment configurations had hardcoded numpy>=1.23,<2.0 constraints.
  • Fix: Relaxed the NumPy constraint from numpy>=1.23,<2.0 to numpy>=1.23 in ci/environment.yml, ci/environment-macos.yml, and pyproject.toml. This lets the Python 3.13 pipelines resolve to NumPy 2.x, while older environments remain free to resolve to 1.x releases.
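
The solver conflict can be reproduced with a toy resolver. The support table below is a simplified, hand-written assumption covering only three NumPy release series, and the function installable is hypothetical; a real solver such as conda's considers far more metadata.

```python
# Simplified support table (illustrative, not exhaustive):
# NumPy release series -> Python minor versions with published builds.
NUMPY_PYTHON_SUPPORT = {
    (1, 26): [(3, 9), (3, 10), (3, 11), (3, 12)],
    (2, 0): [(3, 9), (3, 10), (3, 11), (3, 12)],
    (2, 1): [(3, 10), (3, 11), (3, 12), (3, 13)],
}


def installable(python, lower, upper=None):
    """List NumPy series that satisfy lower <= version < upper for `python`."""
    return sorted(
        version
        for version, pythons in NUMPY_PYTHON_SUPPORT.items()
        if python in pythons
        and version >= lower
        and (upper is None or version < upper)
    )


# Old constraint numpy>=1.23,<2.0 on Python 3.13: nothing to pick.
OLD_313 = installable((3, 13), lower=(1, 23), upper=(2, 0))
# Relaxed constraint numpy>=1.23 on Python 3.13: a 2.x series qualifies.
NEW_313 = installable((3, 13), lower=(1, 23))
# Relaxed constraint on Python 3.11: 1.x is still among the candidates.
NEW_311 = installable((3, 11), lower=(1, 23))
```

With this table, the old constraint yields an empty candidate list on Python 3.13, which is the situation a solver reports as a conflict.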

9. Synchronized Missing Release Notes

  • Problem: Master was missing release notes for the Version 1.17.1 release.
  • Fix: Fetched and synchronized the missing Version 1.17.1 release notes block directly from the upstream git tag v1.17.1 into RELEASE.md.

Verification & Testing

1. Bazel C++ Unit Tests

All 16 core C++ Bazel test suites compile, link, and pass in optimized (opt) mode:

  • Command:
    bazel test --compilation_mode=opt --define grpc_no_xds=true //ml_metadata/...
  • Result: 16 passed (expected UNIMPLEMENTED failures on internal-only ZetaSQL lineage filtering are skipped).

2. Python Integration Tests

All 101 Python integration tests produce their expected outcome under pytest when run against the newly packaged wheel:

  • Command:
    pytest ml_metadata/metadata_store/metadata_store_test.py
    pytest ml_metadata/metadata_store/mlmd_types_test.py
  • Result: 99 passed, 2 xfailed (no unexpected failures; matches the expected results).

3. Wheel Package Generation

Successfully compiled and repacked the optimized production wheel file:

  • Artifact File: dist/ml_metadata-1.18.0.dev0-cp312-cp312-linux_x86_64.whl
  • Artifact Size: 3.3 MB
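
The artifact name encodes the interpreter and platform it was built for. As a small sketch, assuming the standard wheel file name layout (distribution, version, optional build tag, Python tag, ABI tag, platform tag), the hypothetical helper parse_wheel_filename below splits it into fields.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(path):
    """Split a wheel file name into its dash-separated components."""
    name = PurePosixPath(path).name
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel file: {name}")
    parts = name[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and Python tag.
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {name}")
    return WheelTags(dist, version, build, py, abi, plat)


TAGS = parse_wheel_filename(
    "dist/ml_metadata-1.18.0.dev0-cp312-cp312-linux_x86_64.whl"
)
```

For the artifact above, the Python and ABI tags are both cp312 and the platform tag is linux_x86_64, so this particular wheel targets CPython 3.12 on 64-bit Linux only.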
