Skip to content

Computing crate_hash from metadata encoding instead of HIR (implements #94878) #154724

Open
Daniel-B-Smith wants to merge 2 commits into
rust-lang:mainfrom
Daniel-B-Smith:smithdb3/fix-94878
Open

Computing crate_hash from metadata encoding instead of HIR (implements #94878) #154724
Daniel-B-Smith wants to merge 2 commits into
rust-lang:mainfrom
Daniel-B-Smith:smithdb3/fix-94878

Conversation

@Daniel-B-Smith
Copy link
Copy Markdown
Contributor

@Daniel-B-Smith Daniel-B-Smith commented Apr 2, 2026

View all comments

This PR converts the crate_hash/SVH to depend on metadata instead of HIR whenever metadata is needed. A trimmed down crate_hash is kept for dylib/binary cases where metadata is not present. It is believed that the metadata will be a more sound way to track dependency changes.

Related to #94878

The metadata hash is calculated on the raw bytes before flush because hashing the elements incrementally turned out to be too expensive.

The change to the HIR hash is potentially safe even without the metadata crate_hash change. Without that change, this PR is a performance regression. I am bundling them because I believe the HIR hash change is more likely to be safe in light of the HIR hash being less load bearing on the SVH.

The dylib/binary crate_hash removes components that I believe are not relevant to those cases. Analysis:

  • resolutions.visibilities_for_hashing:

The comment says: "Hash visibility information since it does not appear in HIR." Visibilities only matter to external users of the crate (i.e. consumers of metadata). Without metadata, visibility differences are not observable, so this contributes nothing for incremental correctness.

  • debugger_visualizers:

The comment says: "that content is exported into crate metadata, so any changes to it need to be reflected in the crate hash." Without metadata, there is nothing to export, so this is purely metadata-motivated. The visualizer file path is already covered by HIR (the attribute) and the content is loaded later only if metadata is being written.

  • source_file_names:

These exist solely so that the crate-hash reflects remapped source paths that get embedded into metadata (the comment explicitly says "If we included the full mapping in the SVH, we could only have reproducible builds…"). The remapping itself is already captured via dep_tracking_hash, so for incremental this is redundant.

This PR currently forks FileEncoder instead of changing the API of a widely used type. It would not be hard to add a FnMut strategy on flush instead of the fork. Since that would be an unpleasant merge conflict risk, I would request help to get it in quickly. If we stick with the fork, I'll make sure to port the tests. The first commit is solely the file copy, so that the diff is clean in the second commit.

DO NOT SUBMIT: Incremental compilation testing is ongoing and a way to revert the hash is necessary.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 2, 2026
@rust-log-analyzer

This comment has been minimized.

@Daniel-B-Smith Daniel-B-Smith force-pushed the smithdb3/fix-94878 branch 5 times, most recently from 24c04d5 to c826667 Compare April 7, 2026 15:59
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@Daniel-B-Smith Daniel-B-Smith force-pushed the smithdb3/fix-94878 branch 2 times, most recently from 43a7704 to 338aba3 Compare April 8, 2026 19:34
@rust-log-analyzer

This comment has been minimized.

@Daniel-B-Smith Daniel-B-Smith force-pushed the smithdb3/fix-94878 branch 2 times, most recently from 0ddcd96 to d27cca5 Compare April 8, 2026 21:34
@rust-log-analyzer

This comment has been minimized.

@Daniel-B-Smith Daniel-B-Smith force-pushed the smithdb3/fix-94878 branch 3 times, most recently from 1e5f269 to 4171895 Compare April 9, 2026 14:56
@rust-log-analyzer

This comment has been minimized.

@rust-bors

This comment has been minimized.

@Daniel-B-Smith Daniel-B-Smith changed the title #94878 Computing crate_hash from metadata encoding instead of HIR (implements #94878) (very draft) Apr 9, 2026
@nnethercote
Copy link
Copy Markdown
Contributor

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rust-bors

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 9, 2026
@rust-log-analyzer

This comment has been minimized.

@mejrs
Copy link
Copy Markdown
Contributor

mejrs commented May 6, 2026

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 6, 2026
@rust-bors

This comment has been minimized.

rust-bors Bot pushed a commit that referenced this pull request May 6, 2026
Computing crate_hash from metadata encoding instead of HIR (implements #94878) (very draft)
@rust-bors
Copy link
Copy Markdown
Contributor

rust-bors Bot commented May 6, 2026

☀️ Try build successful (CI)
Build commit: d83f021 (d83f0213d3b2172cfc04784f39a42736048ff7fd, parent: 365c0e1d7a614ca94cb48431dcd2bc6d3b645db1)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (d83f021): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
0.3% [0.2%, 0.6%] 7
Regressions ❌
(secondary)
0.1% [0.1%, 0.1%] 6
Improvements ✅
(primary)
-0.5% [-5.5%, -0.1%] 82
Improvements ✅
(secondary)
-1.4% [-5.1%, -0.1%] 47
All ❌✅ (primary) -0.4% [-5.5%, 0.6%] 89

Max RSS (memory usage)

Results (primary -0.5%, secondary 0.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
1.8% [1.7%, 1.8%] 2
Regressions ❌
(secondary)
0.9% [0.4%, 2.1%] 6
Improvements ✅
(primary)
-2.0% [-3.6%, -0.7%] 3
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.5% [-3.6%, 1.8%] 5

Cycles

Results (primary -4.1%, secondary -2.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.8% [0.5%, 1.1%] 3
Improvements ✅
(primary)
-4.1% [-7.8%, -2.1%] 3
Improvements ✅
(secondary)
-3.3% [-5.3%, -0.7%] 17
All ❌✅ (primary) -4.1% [-7.8%, -2.1%] 3

Binary size

This perf run didn't have relevant results for this metric.

Bootstrap: 501.88s -> 497.481s (-0.88%)
Artifact size: 395.21 MiB -> 395.12 MiB (-0.02%)

@rustbot rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels May 7, 2026
@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-meta Area: Issues & PRs about the rust-lang/rust repository itself A-tidy Area: The tidy tool T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output. labels May 7, 2026
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-bors

This comment has been minimized.

@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented May 12, 2026

r? @mu001999

rustbot has assigned @mu001999.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: compiler
  • compiler expanded to 73 candidates
  • Random selection from 16 candidates

@mu001999
Copy link
Copy Markdown
Member

r? @cjgillot

Comment on lines 737 to 783
extra_filename: tcx.sess.opts.cg.extra_filename.clone(),
stable_crate_id: tcx.stable_crate_id(LOCAL_CRATE),
required_panic_strategy: tcx.required_panic_strategy(LOCAL_CRATE),
panic_in_drop_strategy: tcx.sess.opts.unstable_opts.panic_in_drop,
edition: tcx.sess.edition(),
has_global_allocator: tcx.has_global_allocator(LOCAL_CRATE),
has_alloc_error_handler: tcx.has_alloc_error_handler(LOCAL_CRATE),
has_panic_handler: tcx.has_panic_handler(LOCAL_CRATE),
has_default_lib_allocator: find_attr!(attrs, DefaultLibAllocator),
externally_implementable_items,
proc_macro_data,
debugger_visualizers,
compiler_builtins: find_attr!(attrs, CompilerBuiltins),
needs_allocator: find_attr!(attrs, NeedsAllocator),
needs_panic_runtime: find_attr!(attrs, NeedsPanicRuntime),
no_builtins: find_attr!(attrs, NoBuiltins),
panic_runtime: find_attr!(attrs, PanicRuntime),
profiler_runtime: find_attr!(attrs, ProfilerRuntime),
symbol_mangling_version: tcx.sess.opts.get_symbol_mangling_version(),

crate_deps,
dylib_dependency_formats,
lib_features,
stability_implications,
lang_items,
diagnostic_items,
lang_items_missing,
stripped_cfg_items,
native_libraries,
foreign_modules,
source_map,
target_modifiers,
denied_partial_mitigations,
traits,
impls,
incoherent_impls,
exportable_items,
stable_order_of_exportable_impls,
exported_non_generic_symbols,
exported_generic_symbols,
interpret_alloc_index,
tables,
syntax_contexts,
expn_data,
expn_hashes,
def_path_hash_map,
specialization_enabled_in: tcx.specialization_enabled_in(LOCAL_CRATE),
Copy link
Copy Markdown
Contributor

@cjgillot cjgillot May 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we include all these in the hash?

View changes since the review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe almost all of these fields are covered either directly (all of the Lazy* fields) or indirectly (via the HIR hash or the dep tracking hash). is_stub isn't covered, but that's not related to the contents of the crate.

It would be annoyingly verbose, but it's completely straightforward to just add hashes for all of these fields. That might allow for removing the dependency on the HIR hash and/or the dep tracking hash. The change will probably be a small performance hit either way.

.encode_enabled_denied_partial_mitigations());

let hash = Svh::new(self.opaque.hash());
tcx.untracked().local_crate_hash.set(hash).expect("local_crate_hash set twice");
Copy link
Copy Markdown
Contributor

@cjgillot cjgillot May 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make crate_hash feedable instead?

View changes since the review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make it feedable if we get rid of the crate_hash dependency in the !tcx.needs_metadata() case. See my response on map.rs for more detailed discussion of that.

name: tcx.crate_name(LOCAL_CRATE),
triple: tcx.sess.opts.target_triple.clone(),
hash: tcx.crate_hash(LOCAL_CRATE),
hash,
Copy link
Copy Markdown
Contributor

@cjgillot cjgillot May 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the best position to put the hash? Should we put it close to the header, after the start position? This would let us finish encoding everything and make partial decoding easier.

View changes since the review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion. Everything encoded is already in the hash as finish() is called immediately after this function:

if let Err((path, err)) = ecx.opaque.finish() {

let root = ecx.encode_crate_root();

The partial decoding win would be nice, but my preference would be to punt the metadata format change to a follow up PR regardless.

Copy link
Copy Markdown
Contributor

@cjgillot cjgillot May 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably deserves a better file name.

View changes since the review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about file_encoder.rs?

I also want to flag that we could alternatively change the shared FileEncoder interface instead of forking.

@@ -1,5 +1,6 @@
// tidy-alphabetical-start
#![allow(internal_features)]
#![feature(core_intrinsics)]
Copy link
Copy Markdown
Contributor

@cjgillot cjgillot May 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opaque.rs line 164

.local_crate_hash
.get()
.expect("crate_hash(LOCAL_CRATE) called before metadata encoding")
} else {
Copy link
Copy Markdown
Contributor

@cjgillot cjgillot May 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does this happen?

View changes since the review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's needed in the case where incremental compilation is enabled but metadata is not (

CrateType::Executable
). It's used in naming the session directory name when finalizing the directory (
pub fn finalize_session_directory(sess: &Session, svh: Option<Svh>) {
set here:
crate_hash: if tcx.sess.opts.incremental.is_some() {
). Incremental compilation still uses it to identify the local crate cache even though metadata isn't generated.

I tried forcing metadata generation whenever incremental compilation is enabled. It was both a large performance regression and exposed some edge cases where the library could be emitted but metadata encoding was not able to succeed.

@@ -1122,78 +1122,30 @@ impl<'tcx> pprust_hir::PpAnn for TyCtxt<'tcx> {
}

pub(super) fn crate_hash(tcx: TyCtxt<'_>, _: LocalCrate) -> Svh {
Copy link
Copy Markdown
Contributor

@cjgillot cjgillot May 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this implementation called? Could we get rid of it?

View changes since the review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the implementation of the crate_hash query which is used for the stub metadata header and for finalizing a session (

crate_hash: if tcx.sess.opts.incremental.is_some() {
).

@petrochenkov
Copy link
Copy Markdown
Contributor

petrochenkov commented May 26, 2026

If this needs an extra pair of eyes, feel free to assign me after cjgillot reviews.
I'm generally interested in this topic and RDR, just don't have time for doing full initial reviews at the moment.

@petrochenkov
Copy link
Copy Markdown
Contributor

Is there any ordering dependency between this PR and #155871?
Should this PR be merged first, or it doesn't matter?

@susitsm
Copy link
Copy Markdown
Contributor

susitsm commented May 26, 2026

These are independent. The one getting in later will have to do a bit of merge conflict resolution, but nothing too bad.

@rust-bors

This comment has been minimized.

@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented May 28, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-meta Area: Issues & PRs about the rust-lang/rust repository itself A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) A-run-make Area: port run-make Makefiles to rmake.rs A-tidy Area: The tidy tool S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc-frontend Relevant to the rustdoc-frontend team, which will review and decide on the web UI/UX output. WG-trait-system-refactor The Rustc Trait System Refactor Initiative (-Znext-solver)

Projects

None yet

Development

Successfully merging this pull request may close these issues.