perf!: add explicit layout of externally provided ids#224
Conversation
|
I found this to be beneficial to performance, since it cuts out the hashing / rehashing overhead. The code I'm working with creates a new solvable for every package in an RPM repository, so solvables are assigned sequentially, and the RPM ecosystem has a handful of use cases that would necessarily touch a large fraction of the solvables in the pool, e.g. full-distro upgrade or repository dependency-closed-ness testing. So in those cases I would imagine this would reduce churn even more significantly than the few simple test cases I've done where it was ~5% overhead. It's hard for me to judge just how beneficial or what a representative test case looks like at the moment though, too much is in flux. |
|
I can run a benchmark tomorrow. I prefer to use a marker type on the dependency provider though. features are transitively enabled which means that if your implementation doesnt enable the feature but a transitive dependency does, the feature is also enabled for your implementation. That would cause serious issues. I can also give that a go if its unclear. |
|
Alright. I hope to get patches posted for resolvo-rpm soon, I'll submit PRs upstream for review. |
|
I benchmarked your code, and it shows a clear, substantial win of ~12%, and no regressions:
Would you be OK if I pushed some changes to make this a property of the Something along the lines of: impl DependencyProvider for MyProvider {
type SolvableStorage = resolvo::solvable_ids::Sparse; // or ::Dense
// ...
}Or do you want to tackle this yourself? |
|
@baszalmstra I just pushed, let me know if this is what you intended. I can squash the commits once you're confident in the change. Claude helped with a first pass, which looks mostly OK.. I tweaked a few things, and a couple of types that were previously declared |
|
I had started before your comment, so here's another commit that uses your suggested naming
That's good to hear, it would be interesting to figure out exactly where the boundary is w/r/t the conditions in which one or the other strategy is better. With a sufficiently small pool, it's likely that It's a memory / compute tradeoff as well I guess, so TBH it could be faster in nearly all scenarios while still not being the appropriate solution in some. But I don't really have an intuition for where that line would be since I just started working on this stuff. |
|
What's your benchmarking harness? And do you think it would be worthwhile to have micro benchmarks that could be added for e.g. all the various kinds of interning, all the various kinds of resolve-ing, etc.? Or do you have a preference for holistic test cases? |
Yeah, this is almost what I had in mind! Except that I dont want to expose the internals to the public API in any way. Do you mind if I amend you commit with what I had in mind?
The repository contains a benchmark tool in Using the snapshot the benchmark tool creates a (deterministic) random combination of inputs, solves them, and outputs the result to a CSV file. There is a jupyter notebook in the solve-snapshot directory that then produces some statistics and charts. Usually, I also ask claude to give me more insights about why certain cases regressed (if they did). Depending on your machine, it takes a little while to run the entire benchmark. I'm aware that this is not a perfect solution because it's not actually based on real-world cases, but I have found that it comes close enough. That being said, I do not yet have a proper snapshot that also includes loads of conditional dependencies (support for that only just landed in the conda world), so there might still be rough edges there.
This is what we previously did, and it was not great. The reason is that we optimized for a relatively small subset of specific use cases (20ish). Even when we did optimize those usecases, that often came at a cost of some other scenario for which we didnt have a benchmark. Using this random approach has worked much better. It flagged many optimizations that on the surface looked great, but then seriously regressed 2% of other cases. But probably having both would be better. I think the best solution would be to have a large set of real-world use cases that also contains a large variety. And maybe combine that with targeted and random cases. But accompanying niche optimizations with micro benchmarks is always appreciated. |
Go for it. Which naming pattern did you prefer? I can squash or delete the 3rd commit
Cool. Well, when I'm a bit further along with resolvo-rpm I'll try that out. I'm currently redesigning |
|
Should I go ahead and rebase or is that going to cause problems (if you've already started) |
|
Im almost done with the changes :) |
|
(rebased) edit: shit, comment didn't load whatever lol, force-push if you need to |
|
Would you like me to squash this? |
|
So it turns out that we were quietly assuming the dense layout in a few places. I have now replaced all those instances. Please let me know what you think of bf396ec ! Ill now run a new benchmark and toggling both options.
We can squash when the PR is merged. |
| impl ArenaId for DependenciesId { | ||
| fn from_usize(x: usize) -> Self { | ||
| Self(x as u32) | ||
| let raw: u32 = (x + 1).try_into().expect("dependencies id too big"); |
There was a problem hiding this comment.
For clarity's sake it might be better to assign a const ROOT_ID: u32 = 1, and structure this as ROOT_ID + x
Then do the same below
i.e. VariableId::root() would be defined as Self( unsafe { NonZeroU32::new_unchecked(ROOT_ID) })
The documentation makes it clear enough already, but this would be better IMO
| /// Returns `true` if this variable represents the root. | ||
| pub fn is_root(self) -> bool { | ||
| self.0 == 0 | ||
| self.0.get() == 1 |
| /// A unique identifier for a variable in the solver. | ||
| /// | ||
| /// Uses a non-zero representation so that `Option<VariableId>` is the same | ||
| /// size as `VariableId`. |
There was a problem hiding this comment.
I presume this makes the "dense" vec even denser? How was the performance?
| /// this hint purely as an optimization; correctness is unaffected. | ||
| /// | ||
| /// Use [`solvable_id::Dense`] when `SolvableId`s are allocated | ||
| /// contiguously from 0 (the common case with a single pool). If you are |
There was a problem hiding this comment.
I suppose "from 0" is not strictly true any longer.
edit: well, for variables, but this is solvables, so not the same
| // `SolverState<{ D::DENSE_SOLVABLE_IDS }>` internally. | ||
| type SolvableStorage: SolvableStorage; | ||
| /// Describes how this provider allocates [`SolvableId`]s. The solver uses | ||
| /// this hint purely as an optimization; correctness is unaffected. |
There was a problem hiding this comment.
"hint" is maybe not quite accurate, since it authoritatively selects the strategy. This could imply that the solver could choose otherwise at runtime, which is not the case.
|
|
||
| impl DependencyProvider for SnapshotProvider<'_> { | ||
| type SolvableStorage = SparseSolvableStorage; | ||
| type SolvableIdLayout = solvable_id::Sparse; |
There was a problem hiding this comment.
solvable_id::Sparse / solvable_id::Dense is a bit awkward since the import path does not connote any sense that the type is a layout option, it seems more like a type of SolvableId. On the other hand, resolvo::solvable_id::layout::Dense is pretty verbose.
I don't particularly object to this naming, just pointing it out.
There was a problem hiding this comment.
It would be nice to just have Sparse and Dense as enum variants provided as a const param, but that doesn't look like it will be stable any time soon unfortunately
| } | ||
| } | ||
|
|
||
| pub(crate) struct SparseSet<K>(HashSet<K>); |
There was a problem hiding this comment.
Are SparseSet and DenseSet meant to actually be used somewhere, or is it more future-planning?
There was a problem hiding this comment.
There was a problem hiding this comment.
The type proliferation does make the code more difficult to wrap the head around unfortunately
| } | ||
| } | ||
|
|
||
| pub(crate) struct DenseSet<K>(IndexedSet<K>); |
There was a problem hiding this comment.
Are these type wrappers actually necessary? Could this be acheived with IndexedSet directly? (which could be renamed)
|
Yeah I think this could be better. Looking back this approach doesnt feel right either. Maybe we should make SolvableId a generic type instead. 🤔 |
bf396ec to
4b561e5
Compare
When enabled, stores solvable_to_variable as a Vec in VariableMap. Both SolvableId and VariableId are sequential arena IDs (u32), so HashMap lookups can be replaced with direct Vec indexing. This eliminates hashing overhead in intern_solvable and origin() lookups, and removes the HashMap insert/rehash costs. This shouldn't necessarily be the default, because solvable ID assignment is controlled by the user and is not guaranteed to be dense. Assisted-By: Claude Opus 4.6
Each DependencyProvider now declares its solvable ID density via an
associated type, controlling the data structure used for
solvable-to-variable mapping in the solver:
type SolvableMap: SolvableMap;
Two implementations are provided:
- DenseSolvableMap: Vec-based O(1) lookup, best when SolvableIds are
sequential from 0 (common with a single pool)
- SparseSolvableMap: HashMap-based, efficient when the pool is large
but only a fraction of solvables are visited during solving
Solvable ID assignment is controlled by the user and is not guaranteed
to be dense, so this cannot be a one-size-fits-all default. The
associated type lets each provider choose the right tradeoff.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SolvableStorage better communicates that this is a storage strategy choice, not a map API. The Dense/Sparse variants read naturally as SolvableStorage::Dense and SolvableStorage::Sparse. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces the public SolvableStorage trait + DenseSolvableStorage /
SparseSolvableStorage impls with a sealed-marker design that doesn't
leak internal types.
Public surface:
- solvable_id::{Layout, Sparse, Dense} markers
- DependencyProvider::SolvableIdLayout: solvable_id::Layout
Layout is sealed via a pub(crate) Sealed supertrait in the same
module, so external crates cannot implement it.
Internal layer (GAT-based):
- Sealed::Map<V: Copy>: SolvableMap with Sparse/Dense impls
- Sealed::Set<K: ArenaId + Eq + Hash>: SolvableSet with Sparse/Dense impls
- Frozen<T> wrapper exposes &self interior-mut access to a SolvableMap,
used in place of FrozenCopyMap for solvable-keyed caches
Niche-optimised ids:
- VariableId is now NonZeroU32 with index+1 encoding
- DependenciesId is now NonZeroU32 with index+1 encoding
- Halves Option<...> storage in DenseMap<VariableId/DependenciesId>
Layout-aware fields:
- VariableMap::solvable_to_variable
- SolverCache::solvable_to_dependencies
- SolverCache::hint_dependencies_available
- SolverState::clauses_added_for_solvable
Test provider switched to Dense so the integration suite exercises
the dense codepath. Added size_of and roundtrip tests to lock in the
niche optimisations.
4b561e5 to
969a3b6
Compare
969a3b6 to
0ca6a0b
Compare
|
Oke, gave it another go with another design. It all became a little bigger than I hoped, but I think this makes a lot more sense. Let me know what you think! Also rerunning benchmarks as we speak. |
| /// The synthetic solver root. | ||
| Root, | ||
| /// A provider-owned solvable ID. | ||
| Solvable(S), |
There was a problem hiding this comment.
Do you happen to know if this can take advantage of the same optimization as Option<T> when it comes to NonZeroU32? Or is that compiler-specific to Option?
There was a problem hiding this comment.
I had the same question, according to my clanker the niche optimization also applies here if S allows it.
| /// Interior-mutable wrapper around an [`IdMap`]. | ||
| /// | ||
| /// This keeps the same single-threaded, no-reference-returning aliasing model | ||
| /// as the solver's previous specialized frozen storage. |
There was a problem hiding this comment.
Is this a claude-ism? What the solver previously did isn't relevant, should just be a description of what it currently does
| /// Returns the variable id representing the root of the decision tree. | ||
| pub fn root() -> Self { | ||
| // SAFETY: 1 is non-zero. | ||
| Self(unsafe { NonZeroU32::new_unchecked(1) }) |
There was a problem hiding this comment.
Same suggestion as before: 1 should be a named constant ROOT_ID instead of just a free constant sprinkled across multiple functions
| //! These traits are intentionally narrower than general-purpose maps and sets. | ||
| //! They model small pieces of solver state keyed by provider-owned IDs. The | ||
| //! storage implementation is selected by the ID type, so dense IDs can keep | ||
| //! using vector/bitset-backed storage while sparse IDs use hash-backed storage. |
There was a problem hiding this comment.
You would know better than I would what the benefits are, but this particular bit is not particularly simpler at least
The genericization of `Clause<N>` and surrounding types in PR prefix-dev#224 made rustc more conservative about cross-crate inlining of these small leaf methods. Restore inlining explicitly on: - `Literal::new`/`negate`/`satisfying_value`/`from_index`/`to_index`, `VariableId::positive`/`negative`, `Clause::is_binary` (clause.rs) - `DecisionMap::reset`/`set`/`level` (decision_map.rs) - `DecisionTracker::map`/`try_add_decision`/`next_unpropagated` (decision_tracker.rs) - `VariableMap::origin` (variable_map.rs) - `IndexedSet::insert`/`contains` (indexed_set.rs) - `WatchMap::cursor` and 6 `WatchMapCursor` accessors (watch_map.rs) All are leaf functions on the BCP loop, propagation entry, and conflict analysis. No behavior change.
The inner BCP loop walks a linked list of clauses watching a literal, following an arbitrary pointer chain that is essentially random with respect to cache locality. Each step waits ~hundreds of cycles for the next `WatchedLiterals` line to land in L1. `WatchMapCursor::prefetch_next` issues a `_MM_HINT_T0` software prefetch on x86_64 (no-op fallback / volatile read on aarch64). Called once per iteration of the propagate loop after taking the current watched literals, it overlaps memory latency with the compute on the current clause. Pure CPU hint, no observable side-effects.
The `requirement_to_sorted_candidates` lookup is on the BCP hot path, called for every Requires clause as part of `try_fold_literals` / `visit_literals` / `next_unwatched_literal`. `FrozenMap` adds a hash + RefCell + interior allocation per lookup. `RequirementMap` splits the keyspace on the `Requirement` discriminant and stores each variant in its own `Mapping`. Lookup becomes a branch on the enum tag + a direct chunked-vec index — no hashing, no RefCell. The inner ID types (`VersionSetId`, `VersionSetUnionId`) are dense `u32` arena IDs, so the `Mapping` representation is a good fit.
|
Nice! Note that these other optimizations should probably go into another PR so that this one can be squashed more easily (unless you plan on doing that manually). W/ these other ones we probably do not want to just squash them in with the rest. A few other ideas that had at least some improvement locally, but which I have not fully evaluated for their tradeoffs yet: commit c1677a26c11378e9e67bbdcf4444a092d5a5f661
Author: Daniel Alley <dalley@redhat.com>
Date: Wed May 20 23:00:14 2026 -0400
perf: Increase Arena chunk size from 128 to 1024
Reduces the number of heap allocations for large pools. A pool with
70k entries previously allocated ~547 separate Vec chunks; now it
allocates ~68. Minimal memory impact for small pools since only the
first chunk is pre-allocated.
diff --git a/src/internal/arena.rs b/src/internal/arena.rs
index 24ed1e4..d5267ad 100644
--- a/src/internal/arena.rs
+++ b/src/internal/arena.rs
@@ -5,7 +5,7 @@ use std::cmp;
use std::marker::PhantomData;
use std::ops::{Index, IndexMut};
-const CHUNK_SIZE: usize = 128;
+const CHUNK_SIZE: usize = 1024;
/// An `Arena<TValue>` holds a collection of `TValue`s but allocates persistent `TId`s that are used
/// to refer to an element in the arena. When adding an item to an `Arena` it returns a `TId` thatcommit a8a720e8a059ec4e0ba409bc5ab1bd249cfc9ec9
Author: Daniel Alley <dalley@redhat.com>
Date: Wed May 20 23:06:38 2026 -0400
perf: Avoid double hashing on miss in intern_version_set and intern_condition
Add FrozenCopyMap::get_copy_or_insert_with() which uses HashMap::entry()
to hash the key once instead of twice (once for failed get, once for
insert). Applied to intern_version_set and intern_condition which
already have owned keys.
intern_string and intern_package_name still use the two-step pattern
because they benefit from looking up by &str on the hit path to avoid
allocating an owned String.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/src/internal/frozen_copy_map.rs b/src/internal/frozen_copy_map.rs
index ffbefdd..2585b60 100644
--- a/src/internal/frozen_copy_map.rs
+++ b/src/internal/frozen_copy_map.rs
@@ -1,5 +1,6 @@
use std::borrow::Borrow;
use std::cell::UnsafeCell;
+use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};
@@ -27,6 +28,23 @@ impl<K: Eq + Hash, V: Clone, S: BuildHasher> FrozenCopyMap<K, V, S> {
(*map).get(k).cloned()
}
}
+
index ffbefdd..2585b60 100644
--- a/src/internal/frozen_copy_map.rs
+++ b/src/internal/frozen_copy_map.rs
@@ -1,5 +1,6 @@
use std::borrow::Borrow;
use std::cell::UnsafeCell;
+use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};
@@ -27,6 +28,23 @@ impl<K: Eq + Hash, V: Clone, S: BuildHasher> FrozenCopyMap<K, V, S> {
(*map).get(k).cloned()
}
}
+
+ /// Looks up `key` in the map. On hit, returns the existing value. On miss,
+ /// calls `make_val` with a reference to the key (already stored in the
+ /// entry), inserts the result, and returns it. Hashes the key only once.
+ pub fn get_copy_or_insert_with(&self, key: K, make_val: impl FnOnce(&K) -> V) -> V {
+ unsafe {
+ let map = self.map.get();
+ match (*map).entry(key) {
+ Entry::Occupied(e) => e.get().clone(),
+ Entry::Vacant(e) => {
+ let v = make_val(e.key());
+ e.insert(v.clone());
+ v
+ }
+ }
+ }
+ }
}
impl<K: Eq + Hash, V, S: Default> Default for FrozenCopyMap<K, V, S> {
diff --git a/src/utils/pool.rs b/src/utils/pool.rs
index 02e2cae..747e5e0 100644
--- a/src/utils/pool.rs
+++ b/src/utils/pool.rs
@@ -181,13 +181,8 @@ impl<VS: VersionSet, N: PackageName> Pool<VS, N> {
/// is inserted twice they will share the same [`VersionSetId`].
pub fn intern_version_set(&self, package_name: NameId, version_set: VS) -> VersionSetId {
let key = (package_name, version_set);
- if let Some(entry) = self.version_set_to_id.get_copy(&key) {
- entry
- } else {
- let id = self.version_sets.alloc(key.clone());
- self.version_set_to_id.insert_copy(key, id);
- id
- }
+ self.version_set_to_id
+ .get_copy_or_insert_with(key, |k| self.version_sets.alloc(k.clone()))
}
/// Returns the version set associated with the provided id
@@ -240,12 +235,8 @@ impl<VS: VersionSet, N: PackageName> Pool<VS, N> {
/// Conditions are deduplicated, so if the same condition is inserted
/// twice the same `ConditionId` will be returned.
pub fn intern_condition(&self, condition: Condition) -> ConditionId {
- if let Some(id) = self.condition_to_id.get_copy(&condition) {
- return id;
- }
- let id = self.conditions.alloc(condition.clone());
- self.condition_to_id.insert_copy(condition, id);
- id
+ self.condition_to_id
+ .get_copy_or_insert_with(condition, |c| self.conditions.alloc(*c))
}
} |
I was planning on just squash merging this entire PR. Unless you have a strong opinion against this? If you make PRs of those additions, I'll happily run the benchmarks. |
|
It's up to you, I just figured that keeping the context is good, since this goes beyond simply the dense layout change itself. I can make a separate PR with those two. Maybe once this is merged |
|
The new changes look reasonable (to the extent I am able to judge these things, being very new to the codebase), but the previous comments still apply re: changes |
|
@baszalmstra Would you mind documenting this in a little mini README in the
|


When enabled, stores solvable_to_variable as a Vec in VariableMap.
Both SolvableId and VariableId are sequential arena IDs (u32), so HashMap lookups can be replaced with direct Vec indexing. This eliminates hashing overhead in intern_solvable and origin() lookups, and removes the HashMap insert/rehash costs.
This shouldn't necessarily be the default, because solvable ID assignment is controlled by the user and is not guaranteed to be dense.