Skip to content

refactor(pypi): reuse the same whl_library instances#3856

Draft
aignas wants to merge 1 commit into
bazel-contrib:mainfrom
aignas:aignas.refactor.single_dep_whl_library
Draft

refactor(pypi): reuse the same whl_library instances#3856
aignas wants to merge 1 commit into
bazel-contrib:mainfrom
aignas:aignas.refactor.single_dep_whl_library

Conversation

@aignas

@aignas aignas commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

Summary:

  • plan.md
  • feat(pypi): add whl_library_optimized mode behind env var gate
  • feat(pypi): implement hub aliases for optimized whl_library extras
  • feat(pypi): forward optimized whl extras to unified hub repo
  • TODO: up-to-this done with free opencode, continue tomorrow.

Fixes #2948
Work towards #2530

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements an optimized mode for whl_library (gated by the RULES_PYTHON_WHL_LIBRARY_OPTIMIZED environment variable) to allow wheel reuse across different Python versions by omitting the Python version from spoke repository names, generating per-extra targets, and creating explicit aliases in the hub repository. The code review identified several critical issues with this implementation: static aliases in render_pkg_aliases.bzl break multi-version/multi-platform setups, and iterating over extras_info causes aliases to overwrite each other. Additionally, a naming mismatch in extension.bzl between the unified hub alias and the hub repository package name will result in broken aliases, and skipping pkg__extra target generation when there are no extra dependencies leads to build failures.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +90 to +106
def _render_extra_alias(*, name, repo, target):
return """\
package(default_visibility = ["//visibility:public"])

alias(
name = "pkg",
actual = "@{repo}//:{target}",
)

alias(
name = "whl",
actual = "@{repo}//:{target}",
)
""".format(
repo = repo,
target = target,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The _render_extra_alias function generates a static alias pointing to a single spoke repository. However, in a multi-version or multi-platform setup, there are multiple spoke repositories. We should replace this with a multiplatform alias generator that uses select to choose the correct spoke repository based on the active configuration.

def _render_extra_alias_multiplatform(*, name, repo_mapping, target_suffix):
    if type(repo_mapping) == type(""):
        actual_expr = repr("@{}//:{}".format(repo_mapping, target_suffix))
        load_statement = ""
    else:
        actual_dict = {}
        for config_setting, repo_name in repo_mapping.items():
            key = _repr_config_setting(config_setting)
            actual_dict[key] = repr("@{}//:{}".format(repo_name, target_suffix))

        if len(actual_dict) == 1 and list(actual_dict.keys())[0] == repr("//conditions:default"):
            actual_expr = list(actual_dict.values())[0]
        else:
            sorted_pairs = sorted(actual_dict.items())
            actual_expr = "select({\n"
            for k, v in sorted_pairs:
                actual_expr += "        {}: {},\n".format(k, v)
            actual_expr += "    })"

        needs_load = any(["whl_config_setting" in k for k in actual_dict.keys()])
        load_statement = ""
        if needs_load:
            load_statement = 'load("@rules_python//python/private/pypi:whl_config_setting.bzl", "whl_config_setting")\n'

    return """\
{load_statement}package(default_visibility = ["//visibility:public"])

alias(
    name = "pkg",
    actual = {actual_expr},
)

alias(
    name = "whl",
    actual = {actual_expr},
)
""".format(
        load_statement = load_statement,
        actual_expr = actual_expr,
    )

Comment on lines +158 to +176
# Generate extra alias directories for optimized mode extras.
for name, extras_info in whl_extras.items():
normalized = normalize_name(name)
for repo_name, extra_names in extras_info.items():
for extra in extra_names:
extra_pkg = "{}_{}".format(normalized, extra)
files["{}/BUILD.bazel".format(extra_pkg)] = _render_extra_alias(
name = extra_pkg,
repo = repo_name,
target = "{}__{}".format(normalized, extra),
)

# Also create pkg__ alias (no extras)
no_extras_pkg = "{}__".format(normalized)
files["{}/BUILD.bazel".format(no_extras_pkg)] = _render_extra_alias(
name = no_extras_pkg,
repo = repo_name,
target = normalized,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

In a multi-version or multi-platform setup, whl_extras contains multiple spoke repositories for the same wheel. Iterating over extras_info.items() and writing directly to files causes the alias for one spoke to overwrite the others, breaking multi-version/multi-platform support for extras. We should use the multiplatform alias generator to create a single select-based alias that correctly routes to the appropriate spoke repository.

    # Generate extra alias directories for optimized mode extras.
    for name, extras_info in whl_extras.items():
        normalized = normalize_name(name)
        pkg_aliases = aliases.get(normalized)
        if not pkg_aliases:
            continue

        # Find all unique extras across all repos for this wheel
        unique_extras = {}
        for extra_names in extras_info.values():
            for extra in extra_names:
                unique_extras[extra] = True

        for extra in unique_extras:
            extra_pkg = "{}_{}".format(normalized, extra)
            files["{}/BUILD.bazel".format(extra_pkg)] = _render_extra_alias_multiplatform(
                name = extra_pkg,
                repo_mapping = pkg_aliases,
                target_suffix = "{}__{}".format(normalized, extra),
            )

        # Also create pkg__ alias (no extras)
        no_extras_pkg = "{}__".format(normalized)
        files["{}/BUILD.bazel".format(no_extras_pkg)] = _render_extra_alias_multiplatform(
            name = no_extras_pkg,
            repo_mapping = pkg_aliases,
            target_suffix = normalized,
        )

Comment thread python/private/pypi/extension.bzl Outdated
norm_pkg = normalize_name(whl_name)
for extra_names in extras_info.values():
for extra in extra_names:
alias_name = "%s__%s" % (norm_pkg, extra)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a naming mismatch between the unified hub alias name and the hub repository package name. The unified hub alias is generated with double underscores __ (e.g., requests__security), whereas render_pkg_aliases.bzl generates the hub repository package with a single underscore _ (e.g., requests_security). This mismatch will cause the unified hub aliases to point to non-existent packages. Update this to use a single underscore to match the hub repository package naming.

Suggested change
alias_name = "%s__%s" % (norm_pkg, extra)
alias_name = "%s_%s" % (norm_pkg, extra)

Comment on lines +182 to +183
if not extra_only_deps and not extra_only_deps_select:
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Skipping the generation of the pkg__extra target when there are no extra dependencies (or when all extra dependencies are already present in the base dependencies) will cause the hub repository's alias to point to a non-existent target, leading to build failures. We should always generate the pkg__extra target, even if it only depends on ":pkg", to ensure that the hub aliases remain valid.

@rickeylev rickeylev left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saw this was draft, so didn't look too thoroughly, just for things that looked like the bot was obviously wrong

Since this is a refactor for users not using private APIs, the env variable will be switched/removed
later together with old code.

All of the code should be written in a TDD style where we add a failing test for a feature and then

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. It it working ok?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that the tests are sometimes better quality and it helps with the iterative process.

So the naming convention after the change is:
`<prefix>_<name>_<py_tag>_<abi_tag>_<platform_tag>_<sha256[:8]>`

Where `<prefix>` is the hub repository name and the rest of the segments come from the wheel name

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we could do away with the hub name in the whl_library name, this would allow different hubs to benefit from the optimization, too. (any progress on de-duplication would be good, though!)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this to work we would have to have the dep graph in the bu repo. Not not-doable, but requires the whl_library to not do any depgraph generation. The reasons are 2 fold:

  1. The hubs may declare cyclic dependency graphs. The handling of that is done at the hub level, not the spoke, but in the spoke we need to exclude them. We could potentially reuse whl_library nodes that do not have any cycles.
  2. We are not passing target_platforms to the whl_library and instead we are passing the packages that the hub contains, so that we can avoid bazel query errors. This was done in the hopes that whl_library may not need to know target_platforms to generate the deps. In the end, this may be not a valid assumption.

For us to be able to reuse the whl_library in all places we have to:

  • Write the target_platforms into the unified_hub as some sort of venv -> list[target_platform] map.
  • Load that via load("unified_hub") in the whl_library and for each value we would generate an appropriate deps field. This would happen in a macro, so would not trigger refetching of the repos.
  • Load cycles per venv from the unified hub repo. This allows us to exclude the deps in the cycle from the whl_library context. Again, the strategy should be done as above.

I think this is technically doable with the current layout and it could become way faster that way, but I'd like to first untangle the per-venv usage.

@@ -0,0 +1,131 @@
# Single dep for spoke repos per `whl_name`.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk how much of this plan is hand-written vs generated.

fwiw, i've had best results with having it generate the plan, then go through a few rounds of review and tweaking.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done the same - write some of the plan and then go a few rounds of review and tweaking, but I did the review via AI - i.e. does it understand what to implement? We'll see how it works. I'll move the file to the .agents dir.

def render_pkg_aliases(*, aliases, requirement_cycles = None, extra_hub_aliases = {}, **kwargs):
def _render_extra_alias(*, name, repo, target):
return """\
package(default_visibility = ["//visibility:public"])

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this package() line probably doesn't belong here and will cause weird errors

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let AI find this out via TDD :D Thanks for highlighting this.

@aignas

aignas commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator Author

Again, ran out of free tokens, so will postpone for another day.

remove incorrect code written by AI
@aignas aignas force-pushed the aignas.refactor.single_dep_whl_library branch from 453b0b3 to 23c95ab Compare June 28, 2026 14:06
pull Bot pushed a commit to garymm/rules_python that referenced this pull request Jun 29, 2026
This is no longer used starting when we enabled pipstar by default and
did a code cleanup where Python is no longer used to extract the wheels.

Split from bazel-contrib#3856
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Setup fewer wheel repos in pip.parse

2 participants