refactor(pypi): reuse the same whl_library instances#3856
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements an optimized mode for whl_library (gated by the RULES_PYTHON_WHL_LIBRARY_OPTIMIZED environment variable) to allow wheel reuse across different Python versions by omitting the Python version from spoke repository names, generating per-extra targets, and creating explicit aliases in the hub repository. The code review identified several critical issues with this implementation: static aliases in render_pkg_aliases.bzl break multi-version/multi-platform setups, and iterating over extras_info causes aliases to overwrite each other. Additionally, a naming mismatch in extension.bzl between the unified hub alias and the hub repository package name will result in broken aliases, and skipping pkg__extra target generation when there are no extra dependencies leads to build failures.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| def _render_extra_alias(*, name, repo, target): | ||
| return """\ | ||
| package(default_visibility = ["//visibility:public"]) | ||
|
|
||
| alias( | ||
| name = "pkg", | ||
| actual = "@{repo}//:{target}", | ||
| ) | ||
|
|
||
| alias( | ||
| name = "whl", | ||
| actual = "@{repo}//:{target}", | ||
| ) | ||
| """.format( | ||
| repo = repo, | ||
| target = target, | ||
| ) |
There was a problem hiding this comment.
The _render_extra_alias function generates a static alias pointing to a single spoke repository. However, in a multi-version or multi-platform setup, there are multiple spoke repositories. We should replace this with a multiplatform alias generator that uses select to choose the correct spoke repository based on the active configuration.
def _render_extra_alias_multiplatform(*, name, repo_mapping, target_suffix):
if type(repo_mapping) == type(""):
actual_expr = repr("@{}//:{}".format(repo_mapping, target_suffix))
load_statement = ""
else:
actual_dict = {}
for config_setting, repo_name in repo_mapping.items():
key = _repr_config_setting(config_setting)
actual_dict[key] = repr("@{}//:{}".format(repo_name, target_suffix))
if len(actual_dict) == 1 and list(actual_dict.keys())[0] == repr("//conditions:default"):
actual_expr = list(actual_dict.values())[0]
else:
sorted_pairs = sorted(actual_dict.items())
actual_expr = "select({\n"
for k, v in sorted_pairs:
actual_expr += " {}: {},\n".format(k, v)
actual_expr += " })"
needs_load = any(["whl_config_setting" in k for k in actual_dict.keys()])
load_statement = ""
if needs_load:
load_statement = 'load("@rules_python//python/private/pypi:whl_config_setting.bzl", "whl_config_setting")\n'
return """\
{load_statement}package(default_visibility = ["//visibility:public"])
alias(
name = "pkg",
actual = {actual_expr},
)
alias(
name = "whl",
actual = {actual_expr},
)
""".format(
load_statement = load_statement,
actual_expr = actual_expr,
)
| # Generate extra alias directories for optimized mode extras. | ||
| for name, extras_info in whl_extras.items(): | ||
| normalized = normalize_name(name) | ||
| for repo_name, extra_names in extras_info.items(): | ||
| for extra in extra_names: | ||
| extra_pkg = "{}_{}".format(normalized, extra) | ||
| files["{}/BUILD.bazel".format(extra_pkg)] = _render_extra_alias( | ||
| name = extra_pkg, | ||
| repo = repo_name, | ||
| target = "{}__{}".format(normalized, extra), | ||
| ) | ||
|
|
||
| # Also create pkg__ alias (no extras) | ||
| no_extras_pkg = "{}__".format(normalized) | ||
| files["{}/BUILD.bazel".format(no_extras_pkg)] = _render_extra_alias( | ||
| name = no_extras_pkg, | ||
| repo = repo_name, | ||
| target = normalized, | ||
| ) |
There was a problem hiding this comment.
In a multi-version or multi-platform setup, whl_extras contains multiple spoke repositories for the same wheel. Iterating over extras_info.items() and writing directly to files causes the alias for one spoke to overwrite the others, breaking multi-version/multi-platform support for extras. We should use the multiplatform alias generator to create a single select-based alias that correctly routes to the appropriate spoke repository.
# Generate extra alias directories for optimized mode extras.
for name, extras_info in whl_extras.items():
normalized = normalize_name(name)
pkg_aliases = aliases.get(normalized)
if not pkg_aliases:
continue
# Find all unique extras across all repos for this wheel
unique_extras = {}
for extra_names in extras_info.values():
for extra in extra_names:
unique_extras[extra] = True
for extra in unique_extras:
extra_pkg = "{}_{}".format(normalized, extra)
files["{}/BUILD.bazel".format(extra_pkg)] = _render_extra_alias_multiplatform(
name = extra_pkg,
repo_mapping = pkg_aliases,
target_suffix = "{}__{}".format(normalized, extra),
)
# Also create pkg__ alias (no extras)
no_extras_pkg = "{}__".format(normalized)
files["{}/BUILD.bazel".format(no_extras_pkg)] = _render_extra_alias_multiplatform(
name = no_extras_pkg,
repo_mapping = pkg_aliases,
target_suffix = normalized,
)
| norm_pkg = normalize_name(whl_name) | ||
| for extra_names in extras_info.values(): | ||
| for extra in extra_names: | ||
| alias_name = "%s__%s" % (norm_pkg, extra) |
There was a problem hiding this comment.
There is a naming mismatch between the unified hub alias name and the hub repository package name. The unified hub alias is generated with double underscores __ (e.g., requests__security), whereas render_pkg_aliases.bzl generates the hub repository package with a single underscore _ (e.g., requests_security). This mismatch will cause the unified hub aliases to point to non-existent packages. Update this to use a single underscore to match the hub repository package naming.
| alias_name = "%s__%s" % (norm_pkg, extra) | |
| alias_name = "%s_%s" % (norm_pkg, extra) |
| if not extra_only_deps and not extra_only_deps_select: | ||
| continue |
There was a problem hiding this comment.
Skipping the generation of the pkg__extra target when there are no extra dependencies (or when all extra dependencies are already present in the base dependencies) will cause the hub repository's alias to point to a non-existent target, leading to build failures. We should always generate the pkg__extra target, even if it only depends on ":pkg", to ensure that the hub aliases remain valid.
rickeylev
left a comment
There was a problem hiding this comment.
saw this was draft, so didn't look too thoroughly, just for things that looked like the bot was obviously wrong
| Since this is a refactor for users not using private APIs, the env variable will be switched/removed | ||
| later together with old code. | ||
|
|
||
| All of the code should be written in a TDD style where we add a failing test for a feature and then |
There was a problem hiding this comment.
Interesting idea. It it working ok?
There was a problem hiding this comment.
I find that the tests are sometimes better quality and it helps with the iterative process.
| So the naming convention after the change is: | ||
| `<prefix>_<name>_<py_tag>_<abi_tag>_<platform_tag>_<sha256[:8]>` | ||
|
|
||
| Where `<prefix>` is the hub repository name and the rest of the segments come from the wheel name |
There was a problem hiding this comment.
If we could do away with the hub name in the whl_library name, this would allow different hubs to benefit from the optimization, too. (any progress on de-duplication would be good, though!)
There was a problem hiding this comment.
For this to work we would have to have the dep graph in the bu repo. Not not-doable, but requires the whl_library to not do any depgraph generation. The reasons are 2 fold:
- The hubs may declare cyclic dependency graphs. The handling of that is done at the hub level, not the spoke, but in the spoke we need to exclude them. We could potentially reuse
whl_librarynodes that do not have any cycles. - We are not passing
target_platformsto thewhl_libraryand instead we are passing the packages that the hub contains, so that we can avoidbazel queryerrors. This was done in the hopes thatwhl_librarymay not need to knowtarget_platformsto generate thedeps. In the end, this may be not a valid assumption.
For us to be able to reuse the whl_library in all places we have to:
- Write the
target_platformsinto theunified_hubas some sort ofvenv -> list[target_platform]map. - Load that via
load("unified_hub")in thewhl_libraryand for each value we would generate an appropriatedepsfield. This would happen in a macro, so would not trigger refetching of the repos. - Load cycles per venv from the unified hub repo. This allows us to exclude the deps in the cycle from the
whl_librarycontext. Again, the strategy should be done as above.
I think this is technically doable with the current layout and it could become way faster that way, but I'd like to first untangle the per-venv usage.
| @@ -0,0 +1,131 @@ | |||
| # Single dep for spoke repos per `whl_name`. | |||
There was a problem hiding this comment.
idk how much of this plan is hand-written vs generated.
fwiw, i've had best results with having it generate the plan, then go through a few rounds of review and tweaking.
There was a problem hiding this comment.
I have done the same - write some of the plan and then go a few rounds of review and tweaking, but I did the review via AI - i.e. does it understand what to implement? We'll see how it works. I'll move the file to the .agents dir.
| def render_pkg_aliases(*, aliases, requirement_cycles = None, extra_hub_aliases = {}, **kwargs): | ||
| def _render_extra_alias(*, name, repo, target): | ||
| return """\ | ||
| package(default_visibility = ["//visibility:public"]) |
There was a problem hiding this comment.
this package() line probably doesn't belong here and will cause weird errors
There was a problem hiding this comment.
I'll let AI find this out via TDD :D Thanks for highlighting this.
|
Again, ran out of free tokens, so will postpone for another day. |
remove incorrect code written by AI
453b0b3 to
23c95ab
Compare
This is no longer used starting when we enabled pipstar by default and did a code cleanup where Python is no longer used to extract the wheels. Split from bazel-contrib#3856
Summary:
opencode, continue tomorrow.Fixes #2948
Work towards #2530