
Commit 4253490

Speed up cache saving (#1902)
At the end of the build, the Python installation including the site-packages directory (which contains the app's dependencies) has to be copied from the build directory to the cache directory, so that it's cached for the next build. For apps with very large dependencies this can take a considerable amount of time (Honeycomb reports `python.cache_save_duration` as having a P99.9 of 28 seconds).

I'd previously benchmarked using `cp --reflink=auto` in the hopes of improving this in a way that would be compatible with all configurations (including Heroku CI and `build-in-app-dir`), but found it made performance worse for standard builds with the current production build system's filesystems and mount configurations.

This leaves `cp --link` (which uses hardlinks) as the last way we can improve performance - but using this means:

1. We need to check whether the cache and build directory are on the same filesystem mount (since otherwise the copy will fail with an `Invalid cross-device link` error).
2. Any modifications made to the files in one location affect the other (e.g. by later buildpacks).

However, given the Honeycomb data shows the time spent saving the cache really is significant for some apps, it's clear that using `--link` is worth the complexity/mutability trade-offs.

In addition, Python package managers typically fully uninstall a package before reinstalling it at a different version - so if any apps perform `pip install` type operations in later buildpacks (which isn't something we officially support anyway), the cache should still be left in a valid state.

Note: We use `df` to determine whether both locations are on the same filesystem mount, since `stat`'s device number reports the same value for separate filesystems that happen to be mounted on the same backing device.

See:
- https://manpages.ubuntu.com/manpages/noble/en/man1/cp.1.html
- https://manpages.ubuntu.com/manpages/noble/en/man1/df.1.html

GUS-W-19603153.
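To illustrate the mutability trade-off and the uninstall-then-reinstall reasoning in the commit message, here is a minimal sketch (not part of this commit). It assumes GNU `cp` and `stat`, and uses a scratch directory with hypothetical `build/` and `cache/` subdirectories standing in for the real locations.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical scratch layout standing in for the build and cache directories.
work_dir="$(mktemp -d)"
mkdir -p "${work_dir}/build/pkg" "${work_dir}/cache"
echo "v1" > "${work_dir}/build/pkg/module.py"

# Copy using hardlinks: both paths now refer to the same inode.
cp --recursive --link "${work_dir}/build/pkg" "${work_dir}/cache/"
build_inode="$(stat --format=%i "${work_dir}/build/pkg/module.py")"
cache_inode="$(stat --format=%i "${work_dir}/cache/pkg/module.py")"

# An in-place write (truncate + rewrite of the same inode) is visible via both paths.
echo "edited in place" > "${work_dir}/build/pkg/module.py"
cache_after_edit="$(cat "${work_dir}/cache/pkg/module.py")"

# Removing and recreating the file (roughly what an uninstall followed by a
# reinstall does) creates a new inode, so the cached copy is left untouched.
rm "${work_dir}/build/pkg/module.py"
echo "v2" > "${work_dir}/build/pkg/module.py"
build_after_replace="$(cat "${work_dir}/build/pkg/module.py")"
cache_after_replace="$(cat "${work_dir}/cache/pkg/module.py")"

echo "cache after in-place edit: ${cache_after_edit}"
echo "cache after remove + recreate: ${cache_after_replace}"

rm -rf "${work_dir}"
```

The in-place edit leaks into the cached copy, whereas the remove-and-recreate sequence does not, which is why a package manager that uninstalls before reinstalling should leave the cache consistent.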
1 parent 04d8c0d commit 4253490

2 files changed

Lines changed: 19 additions & 7 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@

 - Updated Poetry from 2.1.4 to 2.2.0. ([#1900](https://github.com/heroku/heroku-buildpack-python/pull/1900))
 - Updated uv from 0.8.15 to 0.8.18. ([#1899](https://github.com/heroku/heroku-buildpack-python/pull/1899) and [#1901](https://github.com/heroku/heroku-buildpack-python/pull/1901))
+- Improved performance of Python build cache saving. ([#1902](https://github.com/heroku/heroku-buildpack-python/pull/1902))

 ## [v307] - 2025-09-10


lib/cache.sh

Lines changed: 18 additions & 7 deletions
@@ -171,14 +171,25 @@ function cache::save() {
 	output::step "Saving cache"

 	mkdir -p "${cache_dir}/.heroku"
-
 	rm -rf "${cache_dir}/.heroku/python"
-	# In theory we should be able to use `--reflink=auto` here for improved performance, however,
-	# initial benchmarking showed it to be slower with the file system type / mounts used by the
-	# Heroku build system for some reason. (Copying was faster using `--link`, however, that fails
-	# when copying cross-mount such as for Heroku CI and build-in-app-dir, plus hardlinks could
-	# result in unintended cache mutation if later buildpacks add/remove packages etc.)
-	cp --recursive "${build_dir}/.heroku/python" "${cache_dir}/.heroku/"
+
+	local build_dir_filesystem cache_dir_filesystem
+	build_dir_filesystem="$(df --output=target "${build_dir}")"
+	cache_dir_filesystem="$(df --output=target "${cache_dir}")"
+
+	# For improved performance, we copy using hardlinks if possible. This requires that the build
+	# and cache directory are on the same filesystem mount - which is the case for standard builds
+	# but not Heroku CI or build-in-app-dir. Ideally we would be able to use `--reflink=auto` here
+	# (which would avoid the need for a conditional and also mean accidental edits by users in later
+	# buildpacks to one location don't affect the other), however, with the current filesystems
+	# used in production, benchmarking showed `--reflink=auto` was much slower than hardlinks.
+	if [[ "${build_dir_filesystem}" == "${cache_dir_filesystem}" ]]; then
+		local additional_copy_args=(--link)
+	else
+		local additional_copy_args=()
+	fi
+
+	cp --recursive "${additional_copy_args[@]}" "${build_dir}/.heroku/python" "${cache_dir}/.heroku/"

 	# Metadata used by subsequent builds to determine whether the cache can be reused.
 	# These are written/consumed via separate files and not the build data store for compatibility
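
The conditional-hardlink copy in this change can be tried outside the buildpack with a self-contained sketch like the one below. It is an illustration under stated assumptions, not the buildpack's code: the helper names `mount_point_of` and `copy_with_hardlinks_if_possible` are hypothetical, the directories are scratch paths, and GNU `df`, `cp` and `stat` are assumed. Unlike the diff, it drops the `df` header line with `tail`, although comparing the raw output works equally well since both sides carry the same header.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the mount point containing the given path. GNU df prints a header
# line ("Mounted on") first, so keep only the last line of its output.
function mount_point_of() {
    df --output=target "${1}" | tail -n 1
}

# Copy src_dir into dest_parent, using hardlinks only when both locations
# are on the same filesystem mount (hypothetical helper, for illustration).
function copy_with_hardlinks_if_possible() {
    local src_dir="${1}" dest_parent="${2}"
    local copy_args=(--recursive)
    if [[ "$(mount_point_of "${src_dir}")" == "$(mount_point_of "${dest_parent}")" ]]; then
        copy_args+=(--link)
    fi
    cp "${copy_args[@]}" "${src_dir}" "${dest_parent}/"
}

# Scratch directories standing in for the build and cache locations.
work_dir="$(mktemp -d)"
mkdir -p "${work_dir}/build/python/lib" "${work_dir}/cache"
echo "print('hello')" > "${work_dir}/build/python/lib/example.py"

copy_with_hardlinks_if_possible "${work_dir}/build/python" "${work_dir}/cache"

# A hard link count above 1 indicates the file was linked rather than copied.
copied_content="$(cat "${work_dir}/cache/python/lib/example.py")"
link_count="$(stat --format=%h "${work_dir}/cache/python/lib/example.py")"
echo "hard link count of copied file: ${link_count}"

rm -rf "${work_dir}"
```

When the two directories are on different mounts, the same helper falls back to a plain recursive copy, which avoids the `Invalid cross-device link` failure described in the commit message.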
