feat(indexing): respect .gitignore when indexing #29

Open
torresmateo wants to merge 1 commit into main from 28-respect-gitignore-when-indexing

@torresmateo torresmateo commented May 22, 2026

Closes #28.

Summary

  • Aggregate patterns from every .gitignore under the indexed root and skip matching files during indexing.
  • Hardcoded baseline (.git, node_modules, __pycache__, caches, binaries, fonts, media) still applies — .gitignore layers on top.
  • New --include-ignored flag on libr add (and include_ignored arg on index_directory_to_library) opts out. The flag is persisted on the source entry and honored by libr index build on rebuild.
  • The previously-duplicated _should_skip_file in cli.py and server.py now delegates to a shared librarian/sources/ignore.py. This file is the home for the new GitignoreMatcher (which uses pathspec's GitIgnoreSpec under the hood and rewrites nested patterns to be anchored under their containing directory).
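The nested-pattern rewriting mentioned in the last bullet can be illustrated with a small standalone sketch. This is not the code from librarian/sources/ignore.py; the function name `anchor_pattern` and its signature are hypothetical, and it simplifies gitignore parsing (for example, it strips trailing whitespace unconditionally). It shows one way a line from a nested .gitignore could be turned into a root-relative pattern so that all patterns can be aggregated into a single list, which could then be handed to something like pathspec's GitIgnoreSpec.

```python
from typing import Optional


def anchor_pattern(line: str, rel_dir: str) -> Optional[str]:
    """Rewrite one .gitignore line as a pattern relative to the indexed root.

    `rel_dir` is the POSIX path of the directory holding the .gitignore,
    relative to the root ("" for the root itself). Hypothetical helper,
    not the PR's actual implementation.
    """
    pattern = line.strip()
    # Blank lines and comments carry no pattern.
    if not pattern or pattern.startswith("#"):
        return None

    # Keep the negation marker aside so it ends up in front of the new prefix.
    negated = pattern.startswith("!")
    if negated:
        pattern = pattern[1:]
    bang = "!" if negated else ""

    # Patterns from the root .gitignore already have the right scope.
    if not rel_dir:
        return bang + pattern

    # A slash anywhere except at the very end ties the pattern to the
    # directory of its .gitignore; otherwise it floats at any depth below it.
    anchored = "/" in pattern.rstrip("/")
    prefix = "/" + rel_dir.strip("/") + "/"
    if not anchored:
        prefix += "**/"
    return bang + prefix + pattern.lstrip("/")


if __name__ == "__main__":
    for line, rel_dir in [("*.log", "docs"), ("/build", "pkg/a"), ("!keep.md", "docs")]:
        print(repr(line), "in", repr(rel_dir), "->", anchor_pattern(line, rel_dir))
```

Under these assumptions, a floating pattern such as `*.log` in docs/.gitignore only affects files under docs/, while an anchored pattern such as `/build` in pkg/a/.gitignore only affects pkg/a/build.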

Test plan

  • make check clean (lint, format, mypy)
  • make test-fast — 96 passed / 8 skipped
  • 13 new unit tests in tests/test_gitignore.py covering: no-gitignore, root patterns, floating patterns at any depth, anchored patterns, nested-gitignore scoping, negation, out-of-root paths, the always-skip baseline, and the include-ignored path
  • CLI smoke test: created a tree with build/ in .gitignore; libr add --dry-run shows 1 file, libr add --dry-run --include-ignored shows 2 files
  • E2E smoke test on synthetic node_modules/ tree: SKIP node_modules/pkg/skip.md, KEEP src/readme.md

Aggregates patterns from every .gitignore under the indexed root and
skips matching files. Hardcoded baseline (.git, node_modules, caches,
binaries) still applies. New --include-ignored flag on libr add (and
include_ignored arg on index_directory_to_library) opts out.

The previously-duplicated _should_skip_file in cli.py and server.py
now delegates to a shared helper in librarian/sources/ignore.py.
".git",
".svn",
".hg",
"node_modules",

I sometimes look into node_modules to figure out a library's types, or to track down a bug and see whether it's in my code or in a dependency. Could we add an explicit flag to include a specific file or directory, even if it's on this list or in .gitignore, instead of --include-ignored?

Collaborator Author


This is a good point. I think keeping --include-ignored makes sense as a convenience shorthand. For manually allowlisting specific files or directories, I'm thinking of these options:

  • Have .librariantrack which overrides .gitignore on the specific directory
  • Have --force-include <list of files/directories> to include.

In both cases, pointing to a directory will recursively include the files inside it, as if --include-ignored had been used. To support this, we'll also remove the hardcoded default in these lines.
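One possible reading of the allowlist semantics described above, as a hypothetical sketch: neither `is_force_included` nor `should_index` exists in the PR, and the precedence shown (an explicit allowlist entry wins over any ignore rule) is an assumption rather than a settled design.

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_force_included(rel_path: str, force_include: Iterable[str]) -> bool:
    """True if rel_path is an allowlisted entry or lies inside one."""
    path = PurePosixPath(rel_path)
    for entry in force_include:
        target = PurePosixPath(entry)
        # A directory entry covers everything beneath it, recursively.
        if path == target or target in path.parents:
            return True
    return False


def should_index(rel_path: str, *, ignored: bool, force_include: Iterable[str] = ()) -> bool:
    """Decide whether to index a file, given whether an ignore rule matched it."""
    # Assumed precedence: the explicit allowlist overrides ignore rules.
    if is_force_included(rel_path, force_include):
        return True
    return not ignored
```

With this reading, allowlisting node_modules/pkg would index node_modules/pkg/index.d.ts while leaving the rest of node_modules skipped.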

How does this sound?


That sounds great!


Development

Successfully merging this pull request may close these issues.

Respect .gitignore when indexing (skip node_modules etc.)