-
-
Notifications
You must be signed in to change notification settings - Fork 699
refactor(pypi): reuse the same whl_library instances #3856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,137 @@ | ||
| # Single dep for spoke repos per `whl_name`. | ||
|
|
||
| The goal is to have `whl_library` instances from wheels (where we pass `url` for a URL or we pass | ||
| a `whl_file` label which points to an actual wheel) to be reused across different python versions | ||
| and across different configurations where different extras are requested. This means that | ||
| if the user is using the same wheel in 2 different configurations, the wheel is reused, because | ||
| the extracted contents should not differ. This could be important for: | ||
| * platform-specific wheels for custom platforms defined by the user. Otherwise the | ||
| platform-specific wheel behaviour should remain unchanged. | ||
| * cross-platform wheels for are the main affected item here. | ||
|
|
||
| The code should be gated by a feature flag, which can be flipped to be enabled using an | ||
| environmental variable via | ||
|
|
||
| The file to modify `python/private/internal_config_repo.bzl`. Use the | ||
| `RULES_PYTHON_WHL_LIBRARY_OPTIMIZED=1` to enable the feature. Add it to the | ||
| `docs/readthedocs_build.sh` file so that we are exercising the code. And document this in | ||
| the `docs/environment-variables.md` file. | ||
| If the flag is off, then it is equivalent to the current | ||
| `main` branch behaviour, if it is on, then it is equivalent to turning all of the behaviour | ||
| implemented here on. | ||
|
|
||
| Since this is a refactor for users not using private APIs, the env variable will be switched/removed | ||
| later together with old code. | ||
|
|
||
| All of the code should be written in a TDD style where we add a failing test for a feature and then | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting idea. It it working ok?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I find that the tests are sometimes better quality and it helps with the iterative process. |
||
| we implement the new code. | ||
|
|
||
| ## Change the naming of the spoke repositories: | ||
|
|
||
| > The hub repositories should be named `<hub_name>_<wheel_name_suffix>. | ||
|
|
||
| The names should be changed: | ||
| * from `+pip+dev_pip_314_requests_py3_none_any_2a0d60c1_linux_x86_64_linux_x86_64_freethreaded` | ||
| to `+pip+dev_pip_requests_py3_none_any_2a0d60c1` | ||
| * from `+pip+dev_pip_314_roman_numerals_py3_none_any_647ba99c` | ||
| to `+pip+dev_pip_roman_numerals_py3_none_any_647ba99c` | ||
| * from `+pip+dev_pip_314_markupsafe_cp314_cp314t_manylinux_2_17_x86_64_fed51ac4` | ||
| to `+pip+dev_pip_markupsafe_cp314_cp314t_manylinux_2_17_x86_64_fed51ac4` | ||
|
|
||
| So the naming convention after the change is: | ||
| `<prefix>_<name>_<py_tag>_<abi_tag>_<platform_tag>_<sha256[:8]>` | ||
|
|
||
| Where `<prefix>` is the hub repository name and the rest of the segments come from the wheel name | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we could do away with the hub name in the whl_library name, this would allow different hubs to benefit from the optimization, too. (any progress on de-duplication would be good, though!)
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For this to work we would have to have the dep graph in the bu repo. Not not-doable, but requires the
For us to be able to reuse the
I think this is technically doable with the current layout and it could become way faster that way, but I'd like to first untangle the per-venv usage. |
||
| itself. | ||
|
|
||
| The sdist building `whl_library` instances should change the naming to: | ||
| `<prefix>_<name>_<sha256[:8]>_<rules_python_target_platform>` | ||
|
|
||
| The file to modify `python/private/pypi/whl_repo_name.bzl` | ||
|
|
||
| ## Change the `whl_library_targets` | ||
|
|
||
| Using the `METADATA` file that is parsed from `whl_metadata` function where the `Provides-Extra` is | ||
| retrieved from the file. Right now we generate a single target which includes all of the extras that | ||
| are required. Instead do the following target generation: | ||
| * `pkg` (no extras) - target with no extras, this is the main target that includes the python sources, | ||
| the other targets just include extra dependencies. | ||
| * `pkg__extra` - target for each extra in the provides-extra list. If the target depends on | ||
| `self[another_extra]`, then explode the nodes so that the target only depends on `pkg` target + | ||
| extra dependencies. Use `py_library` for this. Also be smart about generating targets: | ||
| - If the only extra is in the env marker, but the dependency is not in the hub repo, then skip | ||
| generating the whole `pkg__extra` target. | ||
| - Propose any extra ideas here. | ||
|
|
||
| Related files: | ||
| * `python/private/pypi/whl_library.bzl` | ||
| * `python/private/pypi/whl_metadata.bzl` | ||
| * `python/private/pypi/whl_library_targets.bzl` | ||
|
|
||
| ## Change `hub_builder` and `hub_repository` | ||
|
|
||
| Instead of passing the `requirement[extra1,extra2]` to the `whl_library`, pass it to the | ||
| `hub_repository` via `whl_config_setting` so that the `render_pkg_aliases` is using the information to alias to the right | ||
| extra target based on what is requested. This should retain the behaviour where different target | ||
| platforms are allowed to target `foo[baz]` and `foo[bar]` and we select the | ||
| `@dev_pip_spoke_repo//:pkg__baz` and `@dev_pip_spoke_repo//:pkg__bar` appropriately. | ||
|
|
||
| We should also generate extra targets here: | ||
| * `pkg` should point to the target with all specified extras as passed to the `hub_repository`. This is to keep backwards compatibility with how it used to be done | ||
| previously. This also keeps compatibility with `rules_pycross`. Add this as a comment | ||
| when doing this. The spoke repos should not be used directly, so this difference in | ||
| behaviour is OK. | ||
| This reuses the targets declared in the hub repository described below. | ||
| It is backed by a `py_library` target if there are multiple extras, or an `alias` if it | ||
| is only a single extra. It is the same target as described below. | ||
| * `pkg[]` should point to the target in the spoke without extras | ||
| * `pkg[extra]` should point to the target in the spoke with particular extras. Do this for each | ||
| provided extra. So for `requirements[extra1,extra2]` passed to the hub repo, 2 extra targets will | ||
| be created. If the extras have not been specified for certain platforms via the | ||
| `whl_config_setting`, then the `select` statement should raise an `error` for no match. | ||
| Document the mapping explicitly: hub `pkg[]` → spoke `pkg`, hub `pkg[extra]` → spoke `pkg__extra`. | ||
| The fact that some of the targets in the spoke repos are unreachable is intentional - this is | ||
| because the hub repository contents are created before the `whl_library` downloads the whl and | ||
| inspects the METADATA. | ||
| The reverse direction is not a concern because the requirements file locking results in | ||
| consistent file. If something like this happens, then we should provide a message to the | ||
| user that something went wrong and that they should create a ticket in rules_python bug | ||
| tracker. | ||
| * If there is `requirement[extra1,extra2]` passed, that means that we should create a special alias | ||
| that includes both of the targets at once. For this use `py_library` instead of `alias` and pass | ||
| `:pkg[extra1]` and `:pkg[extra2]` to the `deps` of the `py_library`. This won't cause | ||
| a failure because for a particular platform that is requested this should work by | ||
| construction, because the dependency targets will be also defined for the particular platform. | ||
|
|
||
| * If the `requirement` is passed without any extras, then hub `pkg` should alias to `pkg[]`. | ||
| * If the target is `py_library`, then we should name it without `[]` to avoid any aspects traversing | ||
| `py_library` targets changing behaviour. Only alias targets can have `[]`. | ||
|
|
||
| Related files: | ||
| * `python/private/pypi/render_pkg_aliases.bzl` | ||
| * `python/private/pypi/hub_builder.bzl` | ||
| * `python/private/pypi/hub_repository.bzl` | ||
| * `python/private/pypi/whl_config_setting.bzl` - consider extending this struct to specify which | ||
| extras are requested for which platform. | ||
|
|
||
| ## Modify `pip_repository` | ||
|
|
||
| Do similar changes to the `WORKSPACE` code to ensure easier maintenance of `whl_library` so that all | ||
| code paths to `whl_library` remain consistent. | ||
|
|
||
| Implement this by using the information that we retrieve by parsing the `requirements.txt` files | ||
| using the `parse_requirements` function. Parse the requirement itself using the existing parser in | ||
| the referenced file `pep508_requirement.bzl`. | ||
|
|
||
| Related files: | ||
| * `python/private/pypi/pip_repository.bzl` | ||
| * `python/private/pypi/requirements.bzl.tmpl.workspace` | ||
| * `python/private/pypi/pep508_requirement.bzl` | ||
|
|
||
| ## Modify `unified_hub_repo` | ||
|
|
||
| Pass the extras to the `unified_hub_repo` as well so that the extra targets are created. The target | ||
| topology in the unified hub repo should correspond to the hub_repository. | ||
|
|
||
| File: | ||
| * `python/private/pypi/unified_hub_repo.bzl` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| # Accessing build failures | ||
|
|
||
| When looking at the test results and the build sandbox, do not ask permission to access the paths | ||
| via the absolute path, instead access them via the convenience symlinks in the root of the git | ||
| repository. | ||
|
|
||
| Patterns: | ||
| * Use `bazel-bin/tests/pypi/hub_builder` to access the test results for the `//tests/pypi/hub_builder` | ||
| * Use `bazel-rules_python/external/` to access the external repos fetch or materialized by the | ||
| extensions | ||
| * Use `bazel-testlogs/tests/pypi/hub_builder/test_simple` to access the logs for the `//tests/pypi/hub_builder:test_simple` test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idk how much of this plan is hand-written vs generated.
fwiw, i've had best results with having it generate the plan, then go through a few rounds of review and tweaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done the same - write some of the plan and then go a few rounds of review and tweaking, but I did the review via AI - i.e. does it understand what to implement? We'll see how it works. I'll move the file to the
.agentsdir.