Skip to content

Don't canonicalise file keys#1250

Open
lionel- wants to merge 1 commit into
oak/salsa-13-scan-asyncfrom
oak/salsa-14-canonicalisation
Open

Don't canonicalise file keys#1250
lionel- wants to merge 1 commit into
oak/salsa-13-scan-asyncfrom
oak/salsa-14-canonicalisation

Conversation

@lionel-
Copy link
Copy Markdown
Contributor

@lionel- lionel- commented Jun 3, 2026

Branched from #1245
Progress towards #1212

While I was working on breakpoints I ran into difficulties matching files from different locations: frontend URIs via DAP requests and source references from R whose paths went through normalizePath(). The latter does file canonicalisation, which means that in addition to resolving . and .., it also follows symlinks. The frontend URIs are not canonicalised, which caused CI failures when running tests in projects storedc in temporary folders, with mismatches between /tmp and /private/tmp. The fix I implemented was to canonicalise everywhere and encode that canonicalisation happened with the UrlId file, which we stored as file keys in our maps tracking state (e.g. breakpoints) by file.

The UrlId normalisation also dealt with other issues that surfaced over time, such as windows letter drive casing because to fix issues with lowercase on one end and uppercase on the other.

The same sort of issue came up with the LSP: we get URIs from the frontend, but we also encounter source() calls that may be relative and we need to resolve them. So I reached for the same UrlId approach to make the different file paths agree with each other.

It turns out that doing any sort of normalisation that requires I/O is a red flag. If a file gets deleted on disk, and we get an event about that file, there is literally no way to match it back to what we know. For the LSP that's a real issue because we get file deletion events after the file is deleted.

The other thing to watch out for: Always report back the unormalised URI/path to the frontend, because that's what they use to map files on their end. Any changes might cause the frontend to treat the file as different.

With this PR:

  • We no longer canonicalise files that we store as keys. We only do lexical normalisation: resolve . and ... That matches r-a and ty.
  • We still do canonicalisation at comparison time, in a secondary lookup. This way we can still match file paths coming from different sources.
  • We keep track of original file URI/paths for communication with the frontend. For now only in the DAP, this will be done for the LSP in another PR.

Only canonicalise when comparing against paths from external sources
(e.g. normalised by R). Canonicalisation may corrupt the identity of
paths sent by frontends, and may prevent matching files at all when they
are deleted (at which point a non-canonicalised path sent by the
frontend can't be matched against canonicalised paths we've stored as
keys)
@lionel- lionel- force-pushed the oak/salsa-14-canonicalisation branch from 0e6a188 to eb2c844 Compare June 4, 2026 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant