feat: implement Delta Analysis Engine for auditing dependency version bumps (#106) by prasiddhi-105 · Pull Request #114 · ionfwsrijan/PatchPilot

prasiddhi-105 · 2026-06-15T12:55:19Z

Before opening: make sure there is an issue tracking this work, and link it below. PRs without a linked issue may be closed without review.

Linked issue

Closes #106

What this PR does

This PR implements the core Delta Analysis Engine to evaluate third-party package modifications and dependency version bumps. It dynamically fetches source distributions directly from public registries (such as PyPI) and extracts them into temporary file structures. It then analyzes the differences to track completely new files and runs an AST-based static analyzer over the source code to flag high-risk patterns like eval(), preventing hidden security regressions.

Type of change

ML tier (if applicable)

N/A

How did you test this?

I tested this implementation locally using Python's sandbox runtime environment by passing version bump sequences through the parsing and auditing pipeline.

Executed end-to-end simulation tracking requests: 2.28.1 -> 2.31.0.
Verified live network communication fetching source distributions (.tar.gz) from the PyPI JSON API.
Confirmed the system successfully unpacks archives, extracts directory footprints, executes Abstract Syntax Tree (ast) node inspections, and returns the strictly formatted output schema payload without errors.

🛠️ Implementation Details

Version Manifest Diffing: Added a modular parser inside delta_engine.py using regex mapping to extract package metadata (dependency_name, old_version, and new_version) from incoming strings.
Upstream Registry Fetching: Integrated a native, lightweight downloader that queries the public PyPI registry JSON API to isolate and download source distributions (.tar.gz).
Code Changes Audit: Developed a directory-walking comparator to pinpoint added_files, paired with a built-in Abstract Syntax Tree (ast) walker to check python files for high-risk language signatures.

Verified Output Payload Schema:

{
  "supply_chain_diff": {
    "dependency": "requests",
    "upgrade_path": "2.28.1 -> 2.31.0",
    "risk_assessment": {
      "added_files": [],
      "suspicious_patterns_detected": [],
      "overall_risk_score": "low"
    }
  }
}

…nfwsrijan#106

github-actions · 2026-06-15T12:55:35Z

⚠️ Automated Check: This PR does not strictly follow the required template. Please ensure you have not deleted any checkboxes or mandatory headings, and that you have written explanations under What this PR does and How did you test this?.

prasiddhi-105 · 2026-06-15T12:58:07Z

hii would love to work under you and contribute in your project, please assign me.

ionfwsrijan · 2026-06-18T18:22:15Z

@prasiddhi-105 Please fix failing checks

arpit2006

@prasiddhi-105 !
Thanks for the contribution. The overall concept is valuable, and the goal of analyzing dependency upgrades for supply-chain risk aligns well with PatchPilot's mission. However, after reviewing the implementation description, there are several concerns that should be addressed before this can be merged.

🔴 Critical Issues

1. Unbounded Network Access During Analysis

The implementation performs live requests to the PyPI API and downloads source distributions at runtime.

This introduces several concerns:

Analysis results become dependent on external network availability.
Repeated scans of the same dependency may trigger unnecessary downloads.
Registry outages or rate limits can cause scan failures.
Results are no longer deterministic or reproducible.

Please consider:

Adding caching for downloaded artifacts.
Explicit request timeouts.
Graceful handling of registry failures.
Clear documentation regarding network requirements.

2. Missing Download Validation

The PR description mentions downloading .tar.gz archives directly from public registries, but does not mention any validation of:

Download size
Archive size after extraction
Archive member paths
Compression ratio

Without safeguards, malicious or malformed archives could introduce:

Zip/Tar Slip vulnerabilities
Resource exhaustion attacks
Excessive disk usage

Please ensure archive extraction includes path validation and extraction limits.

3. Temporary Extraction Cleanup Not Described

The implementation extracts archives into temporary directories but does not describe cleanup behavior.

Questions that should be addressed:

Are temporary directories always removed?
Does cleanup occur on exceptions?
What happens when analysis fails midway?

Please ensure cleanup is handled via context managers or equivalent mechanisms.

4. Risk Scoring Is Not Clearly Defined

The example output returns:

"overall_risk_score": "low"

However, the scoring methodology is not documented.

For example:

How many suspicious patterns trigger medium/high risk?
Are added files weighted?
Do different AST findings carry different severity levels?

Please document the scoring model so results are predictable and explainable.

🟡 Functional Gaps

5. AST Detection Appears Limited

The description currently mentions detecting:

eval()

This is a good start, but supply-chain attacks commonly involve additional risky behaviors such as:

exec()
compile()
subprocess
os.system
dynamic imports
network calls during installation
credential exfiltration patterns

Consider expanding the detection set or documenting current limitations.

6. Added Files Detection Alone May Miss Risky Changes

The implementation focuses on newly added files.

However, a dependency update may introduce malicious behavior through modifications to existing files without adding any new files.

Examples:

Backdoored functions
Modified install hooks
Credential collection logic

Please clarify whether modified files are intentionally out of scope for v1.

7. Lack of Automated Tests

The PR description references manual testing using:

requests 2.28.1 -> 2.31.0

but does not mention automated test coverage.

This feature should include tests for:

Version parsing
Registry response handling
Archive extraction
AST pattern detection
Risk score generation
Failure scenarios

Manual validation alone is not sufficient for a feature of this complexity.

🟡 Documentation

8. Dependency Requirements Not Specified

The implementation introduces:

HTTP downloads
Archive processing
AST analysis

Please ensure:

New dependencies are documented.
Installation instructions are updated if required.
Any network assumptions are documented.

Summary

The feature has a strong foundation and addresses a meaningful supply-chain security use case. However, before merge I'd like to see:

Download and extraction safeguards.
Explicit cleanup handling.
Documented risk-scoring logic.
Improved suspicious-pattern coverage.
Automated test coverage.
Clarification of scope regarding modified files.

Once these concerns are addressed, I'd be happy to review the updated implementation.

arpit2006 · 2026-06-26T11:41:07Z

@prasiddhi-105 , Any updates?

prasiddhi-105 · 2026-06-26T11:56:51Z

Yes, @arpit2006, I am actively working on it. You will receive an update today, or by tomorrow morning at the latest.

feat: implement core delta analysis engine for dependency auditing io…

873ed33

…nfwsrijan#106

github-actions Bot added feature New feature SSoC26 needs-work Work needed labels Jun 15, 2026

Implement SARIF telemetry export

0feca34

arpit2006 self-requested a review June 24, 2026 13:41

arpit2006 assigned prasiddhi-105 Jun 24, 2026

arpit2006 added Medium Medium difficulty under-review checks-failing labels Jun 24, 2026

arpit2006 requested changes Jun 24, 2026

View reviewed changes

arpit2006 added changes-required inactive needs-contributor CI-Fail labels Jun 24, 2026

arpit2006 added assigned Template-passed labels Jun 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: implement Delta Analysis Engine for auditing dependency version bumps (#106)#114

feat: implement Delta Analysis Engine for auditing dependency version bumps (#106)#114
prasiddhi-105 wants to merge 2 commits into
ionfwsrijan:mainfrom
prasiddhi-105:feature/delta-analysis-engine

prasiddhi-105 commented Jun 15, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 15, 2026

Uh oh!

prasiddhi-105 commented Jun 15, 2026

Uh oh!

ionfwsrijan commented Jun 18, 2026

Uh oh!

arpit2006 left a comment

Uh oh!

arpit2006 commented Jun 26, 2026

Uh oh!

prasiddhi-105 commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

prasiddhi-105 commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Linked issue

What this PR does

Type of change

ML tier (if applicable)

How did you test this?

🛠️ Implementation Details

Verified Output Payload Schema:

Uh oh!

github-actions Bot commented Jun 15, 2026

Uh oh!

prasiddhi-105 commented Jun 15, 2026

Uh oh!

ionfwsrijan commented Jun 18, 2026

Uh oh!

arpit2006 left a comment

Choose a reason for hiding this comment

🔴 Critical Issues

1. Unbounded Network Access During Analysis

2. Missing Download Validation

3. Temporary Extraction Cleanup Not Described

4. Risk Scoring Is Not Clearly Defined

🟡 Functional Gaps

5. AST Detection Appears Limited

6. Added Files Detection Alone May Miss Risky Changes

7. Lack of Automated Tests

🟡 Documentation

8. Dependency Requirements Not Specified

Summary

Uh oh!

arpit2006 commented Jun 26, 2026

Uh oh!

prasiddhi-105 commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

prasiddhi-105 commented Jun 15, 2026 •

edited

Loading