Reduce scancode ignore patterns with coarse exclude globs #276

Merged: soimkim merged 1 commit into main from scancode on Jun 5, 2026

Conversation

@soimkim soimkim commented Jun 5, 2026

Contributor

Description

  • Refactor
    • Enhanced scanning process with improved pattern handling and binary file path management during directory scans
    • Optimized configuration application timing within the scanning workflow

@soimkim soimkim changed the title from "perf: reduce scancode ignore patterns with coarse exclude globs" to "Reduce scancode ignore patterns with coarse exclude globs" Jun 5, 2026
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4cae143d-8125-43a4-9d8f-1a5807921f44

📥 Commits

Reviewing files that changed from the base of the PR and between c822f92 and 30ad6e9.

📒 Files selected for processing (1)
  • src/fosslight_source/run_scancode.py
📝 Walkthrough

run_scancode.py now generates ScanCode ignore patterns through helper functions that build coarse glob patterns from the exclusion rules, then augment them with normalized excluded paths and discovered binary file paths. During directory scans, binary files are collected and passed to ignore-pattern construction, and the UNSET workaround is applied immediately before scan execution.

Changes

Scancode Ignore Pattern Generation

| Layer / File(s) | Summary |
|---|---|
| Ignore pattern helpers and imports (src/fosslight_source/run_scancode.py) | Module imports updated to pull in exclusion constants and is_included utility; typing expanded to include Iterable. Four new helper functions added: _default_scancode_coarse_ignore_patterns() builds coarse globs from exclusion defaults, _is_covered_by_coarse_ignore() detects whether a path matches existing patterns, _add_path_to_exclude_pattern() expands excluded paths into appropriate glob rules, and _build_scancode_ignore_patterns() assembles and sorts the final ignore tuple from user exclusions and binary paths. |
| run_scan integration and execution flow (src/fosslight_source/run_scancode.py) | During directory traversal, abs_path_to_scan is initialized and binary_paths accumulator collects discovered binary file relative paths. The ignore-pattern tuple is built using the new helper via _build_scancode_ignore_patterns() and pattern count is logged. The _apply_scancode_unset_workaround() call is repositioned to occur immediately before cli.run_scan() invocation so corrected defaults are applied at scan time. |
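
The pattern-assembly flow summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `is_covered` and `build_ignore_patterns` are hypothetical stand-ins for the helpers named in the summary, the coarse globs in the usage lines are invented examples, and Python's `fnmatch` is assumed as the matcher, which may not behave exactly like ScanCode's own ignore matching.

```python
import fnmatch
from typing import Iterable, Tuple


def is_covered(rel_path: str, coarse: Iterable[str]) -> bool:
    """Return True if rel_path already matches one of the coarse globs."""
    return any(fnmatch.fnmatchcase(rel_path, pattern) for pattern in coarse)


def build_ignore_patterns(
    coarse: Tuple[str, ...],
    excluded_paths: Iterable[str],
    binary_paths: Iterable[str],
) -> Tuple[str, ...]:
    """Merge coarse globs with per-path entries, skipping paths already covered."""
    patterns = set(coarse)
    for path in list(excluded_paths) + list(binary_paths):
        # Normalize Windows separators and drop leading/trailing slashes.
        normalized = path.replace("\\", "/").strip("/")
        if normalized and not is_covered(normalized, coarse):
            patterns.add(normalized)
    # Sort so the resulting tuple is deterministic between runs.
    return tuple(sorted(patterns))


# Hypothetical inputs: two coarse globs, two user exclusions, two binaries.
ignore_tuple = build_ignore_patterns(
    ("*.png", "node_modules/*"),
    ["docs\\old", "node_modules/a/b.js"],
    ["bin/tool.exe", "img/logo.png"],
)
print(len(ignore_tuple), ignore_tuple)
```

In this sketch, the two paths already matched by a coarse glob add no entries of their own, which is how coarse globs keep the total pattern count down.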

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • fosslight/fosslight_source_scanner#265: Both PRs modify src/fosslight_source/run_scancode.py to detect binaries during run_scan and incorporate those discovered binary paths into ScanCode ignore/exclude pattern generation and logic.

Suggested labels

enhancement

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: refactoring scancode ignore patterns to use coarse exclude globs for performance improvement. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@soimkim soimkim self-assigned this Jun 5, 2026
@soimkim soimkim added the chore [PR/Issue] Refactoring, maintenance the code label Jun 5, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fosslight_source/run_scancode.py`:
- Line 225: The log that reports the number of Scancode ignore patterns is
routine information and should not use WARNING; change the call to use
logger.info (or logger.debug if more appropriate) instead of logger.warning for
the message that references ignore_tuple (e.g., replace
logger.warning(f"Scancode ignore patterns: {len(ignore_tuple)}") with
logger.info(...)) in the run_scancode logic so operators won’t be alerted for
normal state.
- Around line 142-143: The loop that builds glob patterns for binary_paths (in
run_scancode.py where variables binary_paths, rel_path, and patterns are used)
wrongly prefixes each relative path with "**/" which causes suffix matching
across the tree; instead add the rel_path itself (normalized to posix form) to
patterns without the "**/" prefix (or prefix with "./" if you need explicit
scan-root anchoring) so the exclusion matches the exact relative path from the
scan root.
- Around line 120-122: The current code adds a glob
"**/{exclude_path_normalized}" for files which can over-match suffixes; in the
block that handles os.path.isfile(full_exclude_path) (referencing
full_exclude_path, exclude_path_normalized, patterns and
_is_covered_by_coarse_ignore), replace the "**/"-prefixed pattern with the exact
relative file path (i.e. add exclude_path_normalized directly) so only that
specific file is excluded and keep the existing coarse-ignore check intact.
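
The over-matching concern raised in the second and third findings can be illustrated with a small sketch. It assumes Python's `fnmatch` as a stand-in matcher (ScanCode's own glob handling may differ), and the paths as well as the `to_posix` and `matching` helpers are hypothetical.

```python
import fnmatch
from pathlib import PureWindowsPath
from typing import List


def to_posix(rel_path: str) -> str:
    """Normalize a relative path to forward slashes (handles Windows separators)."""
    return PureWindowsPath(rel_path).as_posix()


def matching(pattern: str, paths: List[str]) -> List[str]:
    """Return the paths matched by a glob pattern under fnmatch semantics."""
    return [p for p in paths if fnmatch.fnmatchcase(p, pattern)]


# Hypothetical scan tree: the same relative suffix appears in several places.
tree = ["bin/tool.exe", "vendor/bin/tool.exe", "third_party/x/bin/tool.exe"]
target = to_posix("bin\\tool.exe")  # normalized to "bin/tool.exe"

prefixed = matching(f"**/{target}", tree)  # "**/"-prefixed pattern
exact = matching(target, tree)             # exact path relative to the scan root

print("prefixed:", prefixed)
print("exact:   ", exact)
```

Under these `fnmatch` semantics, the prefixed pattern matches the two unrelated copies deeper in the tree (and not the root-level file itself), whereas the exact pattern matches only the intended file. Whether ScanCode behaves the same way should be verified against its own matcher before relying on either form.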
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 12607825-dfb2-4f3a-8d36-71eada5c8d03

📥 Commits

Reviewing files that changed from the base of the PR and between 9e79c18 and c822f92.

📒 Files selected for processing (1)
  • src/fosslight_source/run_scancode.py

Comment thread src/fosslight_source/run_scancode.py
Comment thread src/fosslight_source/run_scancode.py
Comment thread src/fosslight_source/run_scancode.py Outdated
@soimkim soimkim marked this pull request as draft June 5, 2026 04:57
@soimkim soimkim marked this pull request as ready for review June 5, 2026 05:16
@soimkim soimkim merged commit bb61f18 into main Jun 5, 2026
8 checks passed
@soimkim soimkim deleted the scancode branch June 5, 2026 05:16
Labels

chore [PR/Issue] Refactoring, maintenance the code

1 participant