Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
137 changes: 137 additions & 0 deletions .agents/plans/single_dep_whl_library.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
# Single dep for spoke repos per `whl_name`.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk how much of this plan is hand-written vs generated.

fwiw, i've had best results with having it generate the plan, then go through a few rounds of review and tweaking.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done the same - write some of the plan and then go a few rounds of review and tweaking, but I did the review via AI - i.e. does it understand what to implement? We'll see how it works. I'll move the file to the .agents dir.


The goal is to have `whl_library` instances from wheels (where we pass `url` for a URL or we pass
a `whl_file` label which points to an actual wheel) to be reused across different python versions
and across different configurations where different extras are requested. This means that
if the user is using the same wheel in 2 different configurations, the wheel is reused, because
the extracted contents should not differ. This could be important for:
* platform-specific wheels for custom platforms defined by the user. Otherwise the
platform-specific wheel behaviour should remain unchanged.
* cross-platform wheels for are the main affected item here.

The code should be gated by a feature flag, which can be flipped to be enabled using an
environmental variable via

The file to modify `python/private/internal_config_repo.bzl`. Use the
`RULES_PYTHON_WHL_LIBRARY_OPTIMIZED=1` to enable the feature. Add it to the
`docs/readthedocs_build.sh` file so that we are exercising the code. And document this in
the `docs/environment-variables.md` file.
If the flag is off, then it is equivalent to the current
`main` branch behaviour, if it is on, then it is equivalent to turning all of the behaviour
implemented here on.

Since this is a refactor for users not using private APIs, the env variable will be switched/removed
later together with old code.

All of the code should be written in a TDD style where we add a failing test for a feature and then

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. It it working ok?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that the tests are sometimes better quality and it helps with the iterative process.

we implement the new code.

## Change the naming of the spoke repositories:

> The hub repositories should be named `<hub_name>_<wheel_name_suffix>.

The names should be changed:
* from `+pip+dev_pip_314_requests_py3_none_any_2a0d60c1_linux_x86_64_linux_x86_64_freethreaded`
to `+pip+dev_pip_requests_py3_none_any_2a0d60c1`
* from `+pip+dev_pip_314_roman_numerals_py3_none_any_647ba99c`
to `+pip+dev_pip_roman_numerals_py3_none_any_647ba99c`
* from `+pip+dev_pip_314_markupsafe_cp314_cp314t_manylinux_2_17_x86_64_fed51ac4`
to `+pip+dev_pip_markupsafe_cp314_cp314t_manylinux_2_17_x86_64_fed51ac4`

So the naming convention after the change is:
`<prefix>_<name>_<py_tag>_<abi_tag>_<platform_tag>_<sha256[:8]>`

Where `<prefix>` is the hub repository name and the rest of the segments come from the wheel name

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we could do away with the hub name in the whl_library name, this would allow different hubs to benefit from the optimization, too. (any progress on de-duplication would be good, though!)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this to work we would have to have the dep graph in the bu repo. Not not-doable, but requires the whl_library to not do any depgraph generation. The reasons are 2 fold:

  1. The hubs may declare cyclic dependency graphs. The handling of that is done at the hub level, not the spoke, but in the spoke we need to exclude them. We could potentially reuse whl_library nodes that do not have any cycles.
  2. We are not passing target_platforms to the whl_library and instead we are passing the packages that the hub contains, so that we can avoid bazel query errors. This was done in the hopes that whl_library may not need to know target_platforms to generate the deps. In the end, this may be not a valid assumption.

For us to be able to reuse the whl_library in all places we have to:

  • Write the target_platforms into the unified_hub as some sort of venv -> list[target_platform] map.
  • Load that via load("unified_hub") in the whl_library and for each value we would generate an appropriate deps field. This would happen in a macro, so would not trigger refetching of the repos.
  • Load cycles per venv from the unified hub repo. This allows us to exclude the deps in the cycle from the whl_library context. Again, the strategy should be done as above.

I think this is technically doable with the current layout and it could become way faster that way, but I'd like to first untangle the per-venv usage.

itself.

The sdist building `whl_library` instances should change the naming to:
`<prefix>_<name>_<sha256[:8]>_<rules_python_target_platform>`

The file to modify `python/private/pypi/whl_repo_name.bzl`

## Change the `whl_library_targets`

Using the `METADATA` file that is parsed from `whl_metadata` function where the `Provides-Extra` is
retrieved from the file. Right now we generate a single target which includes all of the extras that
are required. Instead do the following target generation:
* `pkg` (no extras) - target with no extras, this is the main target that includes the python sources,
the other targets just include extra dependencies.
* `pkg__extra` - target for each extra in the provides-extra list. If the target depends on
`self[another_extra]`, then explode the nodes so that the target only depends on `pkg` target +
extra dependencies. Use `py_library` for this. Also be smart about generating targets:
- If the only extra is in the env marker, but the dependency is not in the hub repo, then skip
generating the whole `pkg__extra` target.
- Propose any extra ideas here.

Related files:
* `python/private/pypi/whl_library.bzl`
* `python/private/pypi/whl_metadata.bzl`
* `python/private/pypi/whl_library_targets.bzl`

## Change `hub_builder` and `hub_repository`

Instead of passing the `requirement[extra1,extra2]` to the `whl_library`, pass it to the
`hub_repository` via `whl_config_setting` so that the `render_pkg_aliases` is using the information to alias to the right
extra target based on what is requested. This should retain the behaviour where different target
platforms are allowed to target `foo[baz]` and `foo[bar]` and we select the
`@dev_pip_spoke_repo//:pkg__baz` and `@dev_pip_spoke_repo//:pkg__bar` appropriately.

We should also generate extra targets here:
* `pkg` should point to the target with all specified extras as passed to the `hub_repository`. This is to keep backwards compatibility with how it used to be done
previously. This also keeps compatibility with `rules_pycross`. Add this as a comment
when doing this. The spoke repos should not be used directly, so this difference in
behaviour is OK.
This reuses the targets declared in the hub repository described below.
It is backed by a `py_library` target if there are multiple extras, or an `alias` if it
is only a single extra. It is the same target as described below.
* `pkg[]` should point to the target in the spoke without extras
* `pkg[extra]` should point to the target in the spoke with particular extras. Do this for each
provided extra. So for `requirements[extra1,extra2]` passed to the hub repo, 2 extra targets will
be created. If the extras have not been specified for certain platforms via the
`whl_config_setting`, then the `select` statement should raise an `error` for no match.
Document the mapping explicitly: hub `pkg[]` → spoke `pkg`, hub `pkg[extra]` → spoke `pkg__extra`.
The fact that some of the targets in the spoke repos are unreachable is intentional - this is
because the hub repository contents are created before the `whl_library` downloads the whl and
inspects the METADATA.
The reverse direction is not a concern because the requirements file locking results in
consistent file. If something like this happens, then we should provide a message to the
user that something went wrong and that they should create a ticket in rules_python bug
tracker.
* If there is `requirement[extra1,extra2]` passed, that means that we should create a special alias
that includes both of the targets at once. For this use `py_library` instead of `alias` and pass
`:pkg[extra1]` and `:pkg[extra2]` to the `deps` of the `py_library`. This won't cause
a failure because for a particular platform that is requested this should work by
construction, because the dependency targets will be also defined for the particular platform.

* If the `requirement` is passed without any extras, then hub `pkg` should alias to `pkg[]`.
* If the target is `py_library`, then we should name it without `[]` to avoid any aspects traversing
`py_library` targets changing behaviour. Only alias targets can have `[]`.

Related files:
* `python/private/pypi/render_pkg_aliases.bzl`
* `python/private/pypi/hub_builder.bzl`
* `python/private/pypi/hub_repository.bzl`
* `python/private/pypi/whl_config_setting.bzl` - consider extending this struct to specify which
extras are requested for which platform.

## Modify `pip_repository`

Do similar changes to the `WORKSPACE` code to ensure easier maintenance of `whl_library` so that all
code paths to `whl_library` remain consistent.

Implement this by using the information that we retrieve by parsing the `requirements.txt` files
using the `parse_requirements` function. Parse the requirement itself using the existing parser in
the referenced file `pep508_requirement.bzl`.

Related files:
* `python/private/pypi/pip_repository.bzl`
* `python/private/pypi/requirements.bzl.tmpl.workspace`
* `python/private/pypi/pep508_requirement.bzl`

## Modify `unified_hub_repo`

Pass the extras to the `unified_hub_repo` as well so that the extra targets are created. The target
topology in the unified hub repo should correspond to the hub_repository.

File:
* `python/private/pypi/unified_hub_repo.bzl`
11 changes: 11 additions & 0 deletions .agents/skills/accessing-sandbox/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# Accessing build failures

When looking at the test results and the build sandbox, do not ask permission to access the paths
via the absolute path, instead access them via the convenience symlinks in the root of the git
repository.

Patterns:
* Use `bazel-bin/tests/pypi/hub_builder` to access the test results for the `//tests/pypi/hub_builder`
* Use `bazel-rules_python/external/` to access the external repos fetch or materialized by the
extensions
* Use `bazel-testlogs/tests/pypi/hub_builder/test_simple` to access the logs for the `//tests/pypi/hub_builder:test_simple` test.
3 changes: 3 additions & 0 deletions docs/readthedocs_build.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
#!/usr/bin/env bash

# Exercising the whl_library optimized mode to ensure it is working
export RULES_PYTHON_WHL_LIBRARY_OPTIMIZED=1

set -eou pipefail

declare -a extra_env
Expand Down
5 changes: 5 additions & 0 deletions python/private/internal_config_repo.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -24,13 +24,17 @@ load(":repo_utils.bzl", "repo_utils")
_ENABLE_DEPRECATION_WARNINGS_ENVVAR_NAME = "RULES_PYTHON_DEPRECATION_WARNINGS"
_ENABLE_DEPRECATION_WARNINGS_DEFAULT = "0"

_WHL_LIBRARY_OPTIMIZED_ENVVAR_NAME = "RULES_PYTHON_WHL_LIBRARY_OPTIMIZED"
_WHL_LIBRARY_OPTIMIZED_DEFAULT = "0"

_CONFIG_TEMPLATE = """
config = struct(
build_python_zip_default = {build_python_zip_default},
supports_whl_extraction = {supports_whl_extraction},
enable_pystar = True,
enable_deprecation_warnings = {enable_deprecation_warnings},
extract_needs_chmod = {extract_needs_chmod},
whl_library_optimized = {whl_library_optimized},
bazel_8_or_later = {bazel_8_or_later},
bazel_9_or_later = {bazel_9_or_later},
bazel_10_or_later = {bazel_10_or_later},
Expand Down Expand Up @@ -104,6 +108,7 @@ def _internal_config_repo_impl(rctx):
rctx.file("rules_python_config.bzl", _CONFIG_TEMPLATE.format(
build_python_zip_default = repo_utils.get_platforms_os_name(rctx) == "windows",
enable_deprecation_warnings = _bool_from_environ(rctx, _ENABLE_DEPRECATION_WARNINGS_ENVVAR_NAME, _ENABLE_DEPRECATION_WARNINGS_DEFAULT),
whl_library_optimized = _bool_from_environ(rctx, _WHL_LIBRARY_OPTIMIZED_ENVVAR_NAME, _WHL_LIBRARY_OPTIMIZED_DEFAULT),
builtin_py_info_symbol = builtin_py_info_symbol,
builtin_py_runtime_info_symbol = builtin_py_runtime_info_symbol,
supports_whl_extraction = str(supports_whl_extraction),
Expand Down
1 change: 1 addition & 0 deletions python/private/pypi/generate_whl_library_build_bazel.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ _RENDER = {
"extras": render.list,
"group_deps": render.list,
"include": str,
"provides_extra": render.list,
"requires_dist": render.list,
"srcs_exclude": render.list,
"tags": render.list,
Expand Down
37 changes: 26 additions & 11 deletions python/private/pypi/hub_builder.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ load(":attrs.bzl", "use_isolated")
load(":parse_requirements.bzl", "parse_requirements")
load(":pep508_env.bzl", "env")
load(":pep508_evaluate.bzl", "evaluate")
load(":pep508_requirement.bzl", "requirement")
load(":python_tag.bzl", "python_tag")
load(":requirements_files_by_platform.bzl", "requirements_files_by_platform")
load(":whl_config_setting.bzl", "whl_config_setting")
Expand Down Expand Up @@ -308,13 +309,10 @@ def _add_whl_library(self, *, python_version, whl, repo):
# disallow building from sdist.
return

# TODO @aignas 2025-06-29: we should not need the version in the repo_name if
# we are using pipstar and we are downloading the wheel using the downloader
#
# However, for that we should first have a different way to reference closures with
# extras. For example, if some package depends on `foo[extra]` and another depends on
# `foo`, we should have 2 py_library targets.
repo_name = "{}_{}_{}".format(self.name, version_label(python_version), repo.repo_name)
if self._config.whl_library_optimized:
repo_name = "{}_{}".format(self.name, repo.repo_name)
else:
repo_name = "{}_{}_{}".format(self.name, version_label(python_version), repo.repo_name)

if repo_name in self._whl_libraries:
diff = _diff_dict(self._whl_libraries[repo_name], repo.args)
Expand Down Expand Up @@ -538,6 +536,7 @@ def _create_whl_repos(
)
for src in whl.srcs:
repo = _whl_repo(
self,
src = src,
index_url = whl.index_url,
whl_library_args = whl_library_args,
Expand All @@ -549,6 +548,7 @@ def _create_whl_repos(
is_multiple_versions = whl.is_multiple_versions,
interpreter = interpreter,
enable_pipstar_extract = enable_pipstar_extract,
optimized = self._config.whl_library_optimized,
)
_add_whl_library(
self,
Expand Down Expand Up @@ -612,6 +612,7 @@ def _whl_library_args(self, *, whl, whl_modifications):
return whl_library_args

def _whl_repo(
self,
*,
src,
whl_library_args,
Expand All @@ -623,9 +624,21 @@ def _whl_repo(
python_version,
use_downloader,
interpreter,
enable_pipstar_extract = False):
enable_pipstar_extract = False,
optimized = False):
args = dict(whl_library_args)
args["requirement"] = src.requirement_line
if self._config.whl_library_optimized:
line = requirement(src.requirement_line)
extras = line.extras

# Strip extras from the requirement for whls in optimized mode so that
# the wheel can be reused across different configurations requesting
# different extras.
args["requirement"] = line.name
else:
args["requirement"] = src.requirement_line
extras = []

is_whl = src.filename.endswith(".whl")

if src.extra_pip_args and not is_whl:
Expand Down Expand Up @@ -658,6 +671,7 @@ def _whl_repo(
config_setting = whl_config_setting(
version = python_version,
target_platforms = target_platforms or None,
extras = extras,
),
)

Expand All @@ -678,14 +692,15 @@ def _whl_repo(
# TODO @aignas 2025-11-02: once we have pipstar enabled we can add extra
# targets to each hub for each extra combination and solve this more cleanly as opposed to
# duplicating whl_library repositories.
target_platforms = src.target_platforms if is_multiple_versions else []
target_platforms = src.target_platforms if (is_multiple_versions and not optimized) else []

return struct(
repo_name = whl_repo_name(src.filename, src.sha256, *target_platforms),
repo_name = whl_repo_name(src.filename, src.sha256, target_platforms = target_platforms, optimized = optimized),
args = args,
config_setting = whl_config_setting(
version = python_version,
target_platforms = src.target_platforms,
extras = extras,
),
)

Expand Down
2 changes: 2 additions & 0 deletions python/private/pypi/hub_repository.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -146,4 +146,6 @@ def _whl_config_setting_dict(a):
ret["target_platforms"] = a.target_platforms
if a.version:
ret["version"] = a.version
if a.extras:
ret["extras"] = a.extras
return ret
5 changes: 4 additions & 1 deletion python/private/pypi/whl_config_setting.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@

"A small function to create an alias for a whl distribution"

def whl_config_setting(*, version = None, target_platforms = None):
def whl_config_setting(*, version = None, target_platforms = None, extras = None):
"""The bzl_packages value used by by the render_pkg_aliases function.

This contains the minimum amount of information required to generate correct
Expand All @@ -27,6 +27,8 @@ def whl_config_setting(*, version = None, target_platforms = None):
is no match found during a select.
target_platforms: {type}`list[str] | None` the list of target_platforms for this
distribution.
extras: {type}`list[str] | None` the list of extras for this particular target platform to
enable.

Returns:
a struct with the validated and parsed values.
Expand Down Expand Up @@ -55,4 +57,5 @@ def whl_config_setting(*, version = None, target_platforms = None):
# Make the struct hashable
target_platforms = tuple(target_platforms) if target_platforms else None,
version = version,
extras = extras or None,
)
Loading
Loading