Reduce scancode ignore patterns with coarse exclude globs by soimkim · Pull Request #276 · fosslight/fosslight_source_scanner

soimkim · 2026-06-05T04:40:48Z

Description

Refactor
- Enhanced scanning process with improved pattern handling and binary file path management during directory scans
- Optimized configuration application timing within the scanning workflow

coderabbitai · 2026-06-05T04:40:58Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4cae143d-8125-43a4-9d8f-1a5807921f44

📥 Commits

Reviewing files that changed from the base of the PR and between c822f92 and 30ad6e9.

📒 Files selected for processing (1)

src/fosslight_source/run_scancode.py

📝 Walkthrough

run_scancode.py now generates scancode ignore patterns by assembling helper functions that build coarse glob patterns from exclusion rules and augment them with normalized excluded paths plus discovered binary file paths. During directory scans, binary files are collected and passed to ignore-pattern construction, with the UNSET workaround applied immediately before scan execution.

Changes

Scancode Ignore Pattern Generation

| Layer / File(s) | Summary |
| --- | --- |
| Ignore pattern helpers and imports `src/fosslight_source/run_scancode.py` | Module imports updated to pull in exclusion constants and `is_included` utility; typing expanded to include `Iterable`. Four new helper functions added: `_default_scancode_coarse_ignore_patterns()` builds coarse globs from exclusion defaults, `_is_covered_by_coarse_ignore()` detects whether a path matches existing patterns, `_add_path_to_exclude_pattern()` expands excluded paths into appropriate glob rules, and `_build_scancode_ignore_patterns()` assembles and sorts the final ignore tuple from user exclusions and binary paths. |
| run_scan integration and execution flow `src/fosslight_source/run_scancode.py` | During directory traversal, `abs_path_to_scan` is initialized and `binary_paths` accumulator collects discovered binary file relative paths. The ignore-pattern tuple is built using the new helper via `_build_scancode_ignore_patterns()` and pattern count is logged. The `_apply_scancode_unset_workaround()` call is repositioned to occur immediately before `cli.run_scan()` invocation so corrected defaults are applied at scan time. |
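
The helper flow summarized above can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: the constants `EXCLUDED_DIR_NAMES` and `EXCLUDED_EXTENSIONS`, the function names, and the component-wise matching rule are all assumptions made for the example, and the real helpers in `run_scancode.py` may expand paths differently.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

# Hypothetical stand-ins for the exclusion defaults the PR imports.
EXCLUDED_DIR_NAMES = ("node_modules", ".git")
EXCLUDED_EXTENSIONS = (".png", ".jar")


def coarse_patterns() -> List[str]:
    """One coarse pattern per excluded directory name or file extension."""
    return list(EXCLUDED_DIR_NAMES) + [f"*{ext}" for ext in EXCLUDED_EXTENSIONS]


def is_covered(rel_path: str, patterns: List[str]) -> bool:
    """True if any component of rel_path matches any coarse pattern.

    Assumption: patterns are matched per path component; the scanner's
    own matching rules may differ.
    """
    parts = PurePosixPath(rel_path).parts
    return any(fnmatchcase(part, pat) for part in parts for pat in patterns)


def build_ignore_patterns(
    excluded: Iterable[str], binaries: Iterable[str]
) -> Tuple[str, ...]:
    """Combine coarse patterns with only the paths they do not already cover."""
    coarse = coarse_patterns()
    extra = set()
    for rel_path in list(excluded) + list(binaries):
        normalized = rel_path.replace("\\", "/").strip("/")
        if normalized and not is_covered(normalized, coarse):
            extra.add(normalized)
    # Sorting keeps the resulting tuple deterministic between runs.
    return tuple(sorted(set(coarse) | extra))


ignore_tuple = build_ignore_patterns(
    excluded=["src/vendor/lib.c", "web/node_modules/x/index.js"],
    binaries=["assets/logo.png", "bin/tool"],
)
```

In this sketch, `web/node_modules/x/index.js` and `assets/logo.png` add no pattern of their own because a coarse pattern already covers them. That is the sense in which coarse globs reduce the pattern count.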

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

fosslight/fosslight_source_scanner#265: Both PRs modify src/fosslight_source/run_scancode.py to detect binaries during run_scan and incorporate those discovered binary paths into ScanCode ignore/exclude pattern generation and logic.

Suggested labels

enhancement

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: refactoring scancode ignore patterns to use coarse exclude globs for performance improvement. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fosslight_source/run_scancode.py`:
- Line 225: The log that reports the number of Scancode ignore patterns is
routine information and should not use WARNING; change the call to use
logger.info (or logger.debug if more appropriate) instead of logger.warning for
the message that references ignore_tuple (e.g., replace
logger.warning(f"Scancode ignore patterns: {len(ignore_tuple)}") with
logger.info(...)) in the run_scancode logic so operators won’t be alerted for
normal state.
- Around line 142-143: The loop that builds glob patterns for binary_paths (in
run_scancode.py where variables binary_paths, rel_path, and patterns are used)
wrongly prefixes each relative path with "**/" which causes suffix matching
across the tree; instead add the rel_path itself (normalized to posix form) to
patterns without the "**/" prefix (or prefix with "./" if you need explicit
scan-root anchoring) so the exclusion matches the exact relative path from the
scan root.
- Around line 120-122: The current code adds a glob
"**/{exclude_path_normalized}" for files which can over-match suffixes; in the
block that handles os.path.isfile(full_exclude_path) (referencing
full_exclude_path, exclude_path_normalized, patterns and
_is_covered_by_coarse_ignore), replace the "**/"-prefixed pattern with the exact
relative file path (i.e. add exclude_path_normalized directly) so only that
specific file is excluded and keep the existing coarse-ignore check intact.
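
The over-matching concern in the two `**/` comments above can be illustrated with a small stand-alone matcher. This is a sketch under stated assumptions: `glob_to_regex` is a hypothetical helper that treats `**/` as "zero or more leading directories" (gitignore-style) and `*` as staying within one path segment. ScanCode's own ignore matching may behave differently, so the example shows the general risk, not the scanner's exact behavior.

```python
import re
from typing import List


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a small glob subset to a regex (hypothetical helper).

    Assumed semantics: "**/" matches zero or more directories,
    "*" matches within a single path segment.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches(pattern: str, candidates: List[str]) -> List[str]:
    """Return the candidates that the whole pattern matches."""
    regex = glob_to_regex(pattern)
    return [path for path in candidates if regex.match(path)]


# Relative paths from the scan root; only the first one is meant to be excluded.
paths = ["tools/app.bin", "vendor/tools/app.bin"]

prefixed = matches("**/tools/app.bin", paths)  # also hits the vendor copy
exact = matches("tools/app.bin", paths)        # hits only the intended file
```

Under these assumed semantics, the prefixed pattern excludes both files while the exact relative path excludes only the one at the scan root, which is the behavior the review comments ask for.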

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 12607825-dfb2-4f3a-8d36-71eada5c8d03

📥 Commits

Reviewing files that changed from the base of the PR and between 9e79c18 and c822f92.

📒 Files selected for processing (1)

src/fosslight_source/run_scancode.py

soimkim changed the title ~~perf: reduce scancode ignore patterns with coarse exclude globs~~ Reduce scancode ignore patterns with coarse exclude globs Jun 5, 2026

soimkim self-assigned this Jun 5, 2026

soimkim added the chore [PR/Issue] Refactoring, maintenance the code label Jun 5, 2026

perf: reduce scancode ignore patterns with coarse exclude globs

30ad6e9

soimkim force-pushed the scancode branch from c822f92 to 30ad6e9 June 5, 2026 04:42

coderabbitai Bot reviewed Jun 5, 2026

soimkim marked this pull request as draft June 5, 2026 04:57

soimkim marked this pull request as ready for review June 5, 2026 05:16

soimkim merged commit bb61f18 into main Jun 5, 2026
8 checks passed

soimkim deleted the scancode branch June 5, 2026 05:16
