
Use git ls-files -s instead of ls-tree for full-tree enumeration #2013

Open
tyrielv wants to merge 1 commit into microsoft:master from tyrielv:tyrielv/ls-files-optimization

tyrielv commented Jun 9, 2026

Summary

When no previous commit exists to diff against (sourceTreeSha == null), DiffHelper.PerformDiff runs git ls-tree -r -t HEAD to enumerate all blobs and trees. This walks every tree object in the commit, which is very slow on large repos.
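For reference, git documents each line of ls-tree output as <mode> SP <type> SP <object> TAB <file>, and -t adds the tree entries themselves alongside the blobs. A minimal sketch of splitting such a line, assuming paths that git prints unquoted (no -z, no special characters); ParseLsTreeLine is a hypothetical helper, not the existing GVFS parser.

```csharp
using System;

// Hypothetical helper (not the GVFS parser): splits one line of
// `git ls-tree -r -t` output, "<mode> SP <type> SP <object> TAB <path>".
// Assumes the path is printed unquoted.
static (string Mode, string Type, string Sha, string Path) ParseLsTreeLine(string line)
{
    // Split on the tab first so that spaces inside the path are preserved.
    int tab = line.IndexOf('\t');
    if (tab < 0)
    {
        throw new FormatException($"Missing tab separator: '{line}'");
    }

    // The part before the tab holds mode, object type, and object name.
    string[] meta = line.Substring(0, tab).Split(' ');
    if (meta.Length != 3)
    {
        throw new FormatException($"Expected 3 metadata fields: '{line}'");
    }

    return (meta[0], meta[1], meta[2], line.Substring(tab + 1));
}

// With -t, tree entries (type "tree") are listed alongside blobs, which is how
// a caller can tell directories apart from files.
var entry = ParseLsTreeLine("040000 tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\tsrc/lib");
Console.WriteLine($"{entry.Type} {entry.Path}");
```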

Replace with git ls-files -s, which reads the git index instead of walking tree objects. The index is already materialized in GVFS-mounted repos, making this significantly faster.

The optimization only applies when the target tree matches HEAD (i.e., the index reflects the tree we need). This is always the case for gvfs prefetch, which resolves HEAD as its target (PrefetchVerb.LoadBlobPrefetchArgsRevParse(HEAD)). For other callers like FastFetch force-checkout (which can target a non-HEAD commit), the code falls back to ls-tree to preserve correctness.

Benchmark (repo with ~2.5M files)

| Approach | Time | Speedup |
|---|---|---|
| git ls-tree -r -t HEAD (before) | ~24s | baseline |
| git ls-files -s (after) | ~6.5s | 3.7× |
| libgit2 in-process tree walk | ~8.2s | 2.9× |
| libgit2 in-process index read | ~12.9s | 1.9× |

Also benchmarked two libgit2 alternatives: an in-process recursive tree walk (2.9× faster than ls-tree) and an in-process index read (1.9×, held back by marshaling overhead). git ls-files -s was the fastest and simplest option.
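Each speedup is the baseline time divided by that approach's time; recomputing it from the rounded timings in the table gives consistent figures:

```math
\frac{24\,\text{s}}{6.5\,\text{s}} \approx 3.69 \approx 3.7\times,\qquad
\frac{24\,\text{s}}{8.2\,\text{s}} \approx 2.93 \approx 2.9\times,\qquad
\frac{24\,\text{s}}{12.9\,\text{s}} \approx 1.86 \approx 1.9\times
```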

Changes

  • GitProcess.cs: new LsFilesStaging() method that runs git ls-files -s
  • DiffTreeResult.cs: new ParseFromLsFilesStagingLine() parser for the <mode> <sha> <stage>\t<path> format
  • DiffHelper.cs: PerformDiff now uses ls-files -s when sourceTreeSha == null and targetTreeSha matches HEAD's tree (verified via libgit2), and falls back to ls-tree otherwise.
  • DiffTreeResultTests.cs: 7 new unit tests for the parser
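The <mode> <sha> <stage>\t<path> format can be parsed by splitting on the tab first, so paths containing spaces survive intact. A minimal sketch, assuming SHA-1 object names and unquoted paths; ParseStagedEntry is a hypothetical stand-in, not the PR's ParseFromLsFilesStagingLine.

```csharp
using System;

// Hypothetical sketch (not the PR's parser): parses one line of
// `git ls-files -s` output, "<mode> SP <sha> SP <stage> TAB <path>".
// Assumes SHA-1 (40 hex character) object names and unquoted paths.
static (ushort Mode, string Sha, int Stage, string Path) ParseStagedEntry(string line)
{
    // Split on the tab first so that spaces inside the path are preserved.
    int tab = line.IndexOf('\t');
    if (tab < 0)
    {
        throw new FormatException($"Missing tab separator: '{line}'");
    }

    string[] meta = line.Substring(0, tab).Split(' ');
    if (meta.Length != 3 || meta[1].Length != 40)
    {
        throw new FormatException($"Unexpected metadata: '{line}'");
    }

    // Git prints the mode in octal (100644, 100755, 120000, ...).
    ushort mode = Convert.ToUInt16(meta[0], 8);

    // Stage 0 is a normal entry; 1-3 mark the sides of an unmerged path.
    int stage = int.Parse(meta[2]);

    return (mode, meta[1], stage, line.Substring(tab + 1));
}

var parsed = ParseStagedEntry("100755 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tscripts/build.sh");
Console.WriteLine($"{Convert.ToString(parsed.Mode, 8)} stage={parsed.Stage} {parsed.Path}");
```

Splitting on separators rather than reading fixed offsets is a design choice; fixed offsets also work as long as every mode is six octal digits and every object name is 40 hex characters.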

Safety

  • TargetMatchesHeadTree() resolves HEAD's tree SHA via libgit2 and compares to the requested targetTreeSha. Only uses the index-based path when they match.
  • Falls back to ls-tree if the index is unavailable, HEAD can't be resolved, or the target differs from HEAD.
  • ls-files -s only returns file entries (not tree entries). Tree entries from ls-tree were only used for directory creation, which FlushStagedQueues handles from file paths anyway.
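The fallback rules above reduce to a single predicate. A minimal sketch with illustrative parameter names (none of them are GVFS APIs); it assumes the caller has already looked up HEAD's tree SHA and whether an index file exists.

```csharp
using System;

// Hypothetical sketch of the fallback rules; parameter names are illustrative.
// headTreeSha is null when HEAD could not be resolved.
static bool CanUseIndexFastPath(string? sourceTreeSha, string targetTreeSha, string? headTreeSha, bool indexExists)
{
    // A previous tree exists, so the normal diff path is used instead.
    if (sourceTreeSha != null)
    {
        return false;
    }

    // No index yet (e.g. a fresh `git init`) or HEAD could not be resolved.
    if (!indexExists || headTreeSha == null)
    {
        return false;
    }

    // Hex SHA strings may differ only in case, so compare case-insensitively.
    return string.Equals(headTreeSha, targetTreeSha, StringComparison.OrdinalIgnoreCase);
}

string head = "4b825dc642cb6eb9a060e54bf8d69288fbee4904";
Console.WriteLine(CanUseIndexFastPath(null, head, head, indexExists: true));
```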

tyrielv force-pushed the tyrielv/ls-files-optimization branch from 6976987 to b10a10b on June 9, 2026 20:33
When no previous commit exists to diff against (sourceTreeSha == null),
DiffHelper.PerformDiff previously ran 'git ls-tree -r -t HEAD' which walks
all tree objects. On a large repo with ~2.5M files, this takes ~24s.

Replace with 'git ls-files -s' which reads the index instead of walking
tree objects. Benchmarked at ~6.5s on the same repo — a 3.7x speedup.

The optimization is only applied when targetTreeSha matches HEAD's tree,
since ls-files reads the index (which reflects HEAD). When they differ
(e.g., FastFetch checking out a non-HEAD commit), the code falls back to
ls-tree to preserve correctness.

Also falls back to ls-tree if ls-files fails (e.g., the index does not
exist after a fresh git init, before the first checkout).

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
tyrielv force-pushed the tyrielv/ls-files-optimization branch from b10a10b to d4988aa on June 9, 2026 20:37
tyrielv marked this pull request as ready for review June 10, 2026 18:33
using (LibGit2Repo repo = new LibGit2Repo(this.tracer, this.enlistment.WorkingDirectoryBackingRoot))
{
string headTreeSha = repo.GetTreeSha("HEAD");
if (headTreeSha != null && string.Equals(headTreeSha, targetTreeSha, StringComparison.OrdinalIgnoreCase))


The guard that decides whether to use the fast ls-files path compares HEAD's tree SHA against a value the real callers pass as a commit SHA, so the comparison may never be true and the optimization may never actually run.
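One way to avoid this mismatch, sketched under assumptions: peel both sides to tree SHAs before comparing, so a commit SHA passed in as the target is resolved to its tree first. On the command line, git rev-parse <rev>^{tree} performs the same peeling. The resolveTree delegate and TreesMatch are hypothetical stand-ins, not GVFS or libgit2 APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: compare a tree with a tree, never a tree with a commit.
// resolveTree peels any commit-ish or tree-ish to its tree SHA (null if it
// cannot be resolved), much like `git rev-parse <rev>^{tree}` does.
static bool TreesMatch(string target, Func<string, string?> resolveTree)
{
    string? headTree = resolveTree("HEAD");
    string? targetTree = resolveTree(target);
    return headTree != null
        && targetTree != null
        && string.Equals(headTree, targetTree, StringComparison.OrdinalIgnoreCase);
}

// Fake object lookup for illustration: one commit and the tree it points to.
string commitSha = new string('1', 40);
string treeSha = new string('2', 40);
var peeled = new Dictionary<string, string>
{
    ["HEAD"] = treeSha,
    [commitSha] = treeSha, // peeling a commit yields its root tree
    [treeSha] = treeSha,   // peeling a tree yields the tree itself
};

string? Resolve(string rev) => peeled.TryGetValue(rev, out var found) ? found : null;

// A caller passing the commit SHA still matches, because both sides are peeled.
Console.WriteLine(TreesMatch(commitSha, Resolve));
```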

}

// ls-files -s only returns file entries, never trees
this.EnqueueFileAddOperation(activity, result);


The new ls-files path only enqueues file adds and never directory entries. FastFetch's checkout relies on directory operations to create folders before writing files, and the file writer does not create missing parent directories.
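Deriving the directory operations from the file paths themselves is one way this gap could be closed. A minimal sketch, assuming relative '/'-separated paths and ordinal (case-sensitive) comparison; RequiredDirectories is a hypothetical helper and does not show how FastFetch would enqueue the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: compute every directory that must exist before the
// given files can be written. Paths are relative and '/'-separated, as git
// prints them; comparison is ordinal (case-sensitive).
static List<string> RequiredDirectories(IEnumerable<string> filePaths)
{
    var dirs = new HashSet<string>(StringComparer.Ordinal);
    foreach (string path in filePaths)
    {
        // Walk up from the file's parent, adding each ancestor directory.
        int slash = path.LastIndexOf('/');
        while (slash > 0)
        {
            string dir = path.Substring(0, slash);
            if (!dirs.Add(dir))
            {
                // Already recorded, so all of its ancestors are recorded too.
                break;
            }

            slash = dir.LastIndexOf('/');
        }
    }

    // Ordering by depth guarantees that parents come before their children.
    return dirs
        .OrderBy(d => d.Count(c => c == '/'))
        .ThenBy(d => d, StringComparer.Ordinal)
        .ToList();
}

var needed = RequiredDirectories(new[] { "src/app/main.cs", "src/lib/util.cs", "README.md" });
Console.WriteLine(string.Join(", ", needed));
```

On a case-insensitive file system, StringComparer.OrdinalIgnoreCase may be the more appropriate comparer for the set.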

DiffTreeResult blobAdd = new DiffTreeResult();
blobAdd.TargetMode = Convert.ToUInt16(line.Substring(0, 6), 8);
blobAdd.TargetIsSymLink = blobAdd.TargetMode == SymLinkFileIndexEntry;
blobAdd.TargetSha = line.Substring(7, GVFSConstants.ShaStringLength);


ls-files -s reads the index, which can differ from HEAD's tree (staged or unmerged entries). The parser ignores the stage column, so unmerged paths could produce duplicate adds with the wrong blob SHA.
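Reading the stage column instead of discarding it lets the code detect unmerged paths and fall back to ls-tree. A minimal sketch, assuming the <mode> SP <sha> SP <stage> TAB <path> layout; FindUnmergedPaths is hypothetical. It only catches unmerged entries: a stage-0 entry that was staged but not committed looks the same in this output and would still differ from HEAD's tree.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: return the distinct paths that have unmerged entries
// (stage 1, 2, or 3) in `git ls-files -s` output, in first-seen order.
// Assumes lines of the form "<mode> SP <sha> SP <stage> TAB <path>".
static List<string> FindUnmergedPaths(IEnumerable<string> lsFilesLines)
{
    var seen = new HashSet<string>(StringComparer.Ordinal);
    var unmerged = new List<string>();
    foreach (string line in lsFilesLines)
    {
        int tab = line.IndexOf('\t');
        string[] meta = tab < 0 ? Array.Empty<string>() : line.Substring(0, tab).Split(' ');
        if (meta.Length != 3)
        {
            throw new FormatException($"Unexpected ls-files -s line: '{line}'");
        }

        // Stage 0 is a normal entry; a conflicted path appears once per stage.
        string path = line.Substring(tab + 1);
        if (meta[2] != "0" && seen.Add(path))
        {
            unmerged.Add(path);
        }
    }

    return unmerged;
}

string sha = new string('a', 40);
var lines = new[]
{
    $"100644 {sha} 0\tclean.txt",
    $"100644 {sha} 1\tconflict.txt",
    $"100644 {sha} 2\tconflict.txt",
    $"100644 {sha} 3\tconflict.txt",
};
Console.WriteLine(string.Join(", ", FindUnmergedPaths(lines)));
```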
