Skip to content

feat: implement Delta Analysis Engine for auditing dependency version bumps (#106)#114

Open
prasiddhi-105 wants to merge 2 commits into
ionfwsrijan:mainfrom
prasiddhi-105:feature/delta-analysis-engine
Open

feat: implement Delta Analysis Engine for auditing dependency version bumps (#106)#114
prasiddhi-105 wants to merge 2 commits into
ionfwsrijan:mainfrom
prasiddhi-105:feature/delta-analysis-engine

Conversation

@prasiddhi-105

@prasiddhi-105 prasiddhi-105 commented Jun 15, 2026

Copy link
Copy Markdown

Before opening: make sure there is an issue tracking this work, and link it below. PRs without a linked issue may be closed without review.

Linked issue

Closes #106

What this PR does

This PR implements the core Delta Analysis Engine to evaluate third-party package modifications and dependency version bumps. It dynamically fetches source distributions directly from public registries (such as PyPI) and extracts them into temporary file structures. It then analyzes the differences to track completely new files and runs an AST-based static analyzer over the source code to flag high-risk patterns like eval(), preventing hidden security regressions.

Type of change

  • Bug fix
  • New feature
  • ML model / training pipeline
  • Refactor (no behaviour change)
  • Documentation
  • Tests only

ML tier (if applicable)

N/A

How did you test this?

I tested this implementation locally using Python's sandbox runtime environment by passing version bump sequences through the parsing and auditing pipeline.

  • Executed end-to-end simulation tracking requests: 2.28.1 -> 2.31.0.
  • Verified live network communication fetching source distributions (.tar.gz) from the PyPI JSON API.
  • Confirmed the system successfully unpacks archives, extracts directory footprints, executes Abstract Syntax Tree (ast) node inspections, and returns the strictly formatted output schema payload without errors.

🛠️ Implementation Details

  • Version Manifest Diffing: Added a modular parser inside delta_engine.py using regex mapping to extract package metadata (dependency_name, old_version, and new_version) from incoming strings.
  • Upstream Registry Fetching: Integrated a native, lightweight downloader that queries the public PyPI registry JSON API to isolate and download source distributions (.tar.gz).
  • Code Changes Audit: Developed a directory-walking comparator to pinpoint added_files, paired with a built-in Abstract Syntax Tree (ast) walker to check python files for high-risk language signatures.

Verified Output Payload Schema:

{
  "supply_chain_diff": {
    "dependency": "requests",
    "upgrade_path": "2.28.1 -> 2.31.0",
    "risk_assessment": {
      "added_files": [],
      "suspicious_patterns_detected": [],
      "overall_risk_score": "low"
    }
  }
}

@github-actions github-actions Bot added feature New feature SSoC26 needs-work Work needed labels Jun 15, 2026
@github-actions

Copy link
Copy Markdown

⚠️ Automated Check: This PR does not strictly follow the required template. Please ensure you have not deleted any checkboxes or mandatory headings, and that you have written explanations under What this PR does and How did you test this?.

@prasiddhi-105

Copy link
Copy Markdown
Author

hii would love to work under you and contribute in your project, please assign me.

@ionfwsrijan

Copy link
Copy Markdown
Owner

@prasiddhi-105 Please fix failing checks

@arpit2006 arpit2006 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prasiddhi-105 !
Thanks for the contribution. The overall concept is valuable, and the goal of analyzing dependency upgrades for supply-chain risk aligns well with PatchPilot's mission. However, after reviewing the implementation description, there are several concerns that should be addressed before this can be merged.

🔴 Critical Issues

1. Unbounded Network Access During Analysis

The implementation performs live requests to the PyPI API and downloads source distributions at runtime.

This introduces several concerns:

  • Analysis results become dependent on external network availability.
  • Repeated scans of the same dependency may trigger unnecessary downloads.
  • Registry outages or rate limits can cause scan failures.
  • Results are no longer deterministic or reproducible.

Please consider:

  • Adding caching for downloaded artifacts.
  • Explicit request timeouts.
  • Graceful handling of registry failures.
  • Clear documentation regarding network requirements.

2. Missing Download Validation

The PR description mentions downloading .tar.gz archives directly from public registries, but does not mention any validation of:

  • Download size
  • Archive size after extraction
  • Archive member paths
  • Compression ratio

Without safeguards, malicious or malformed archives could introduce:

  • Zip/Tar Slip vulnerabilities
  • Resource exhaustion attacks
  • Excessive disk usage

Please ensure archive extraction includes path validation and extraction limits.


3. Temporary Extraction Cleanup Not Described

The implementation extracts archives into temporary directories but does not describe cleanup behavior.

Questions that should be addressed:

  • Are temporary directories always removed?
  • Does cleanup occur on exceptions?
  • What happens when analysis fails midway?

Please ensure cleanup is handled via context managers or equivalent mechanisms.


4. Risk Scoring Is Not Clearly Defined

The example output returns:

"overall_risk_score": "low"

However, the scoring methodology is not documented.

For example:

  • How many suspicious patterns trigger medium/high risk?
  • Are added files weighted?
  • Do different AST findings carry different severity levels?

Please document the scoring model so results are predictable and explainable.


🟡 Functional Gaps

5. AST Detection Appears Limited

The description currently mentions detecting:

eval()

This is a good start, but supply-chain attacks commonly involve additional risky behaviors such as:

  • exec()
  • compile()
  • subprocess
  • os.system
  • dynamic imports
  • network calls during installation
  • credential exfiltration patterns

Consider expanding the detection set or documenting current limitations.


6. Added Files Detection Alone May Miss Risky Changes

The implementation focuses on newly added files.

However, a dependency update may introduce malicious behavior through modifications to existing files without adding any new files.

Examples:

  • Backdoored functions
  • Modified install hooks
  • Credential collection logic

Please clarify whether modified files are intentionally out of scope for v1.


7. Lack of Automated Tests

The PR description references manual testing using:

requests 2.28.1 -> 2.31.0

but does not mention automated test coverage.

This feature should include tests for:

  • Version parsing
  • Registry response handling
  • Archive extraction
  • AST pattern detection
  • Risk score generation
  • Failure scenarios

Manual validation alone is not sufficient for a feature of this complexity.


🟡 Documentation

8. Dependency Requirements Not Specified

The implementation introduces:

  • HTTP downloads
  • Archive processing
  • AST analysis

Please ensure:

  • New dependencies are documented.
  • Installation instructions are updated if required.
  • Any network assumptions are documented.

Summary

The feature has a strong foundation and addresses a meaningful supply-chain security use case. However, before merge I'd like to see:

  • Download and extraction safeguards.
  • Explicit cleanup handling.
  • Documented risk-scoring logic.
  • Improved suspicious-pattern coverage.
  • Automated test coverage.
  • Clarification of scope regarding modified files.

Once these concerns are addressed, I'd be happy to review the updated implementation.

@arpit2006

Copy link
Copy Markdown
Collaborator

@prasiddhi-105 , Any updates?

@prasiddhi-105

Copy link
Copy Markdown
Author

Yes, @arpit2006, I am actively working on it. You will receive an update today, or by tomorrow morning at the latest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Delta analysis engine for auditing dependency version bumps

3 participants