Microck · Microck · Apr 27, 2026 · Apr 3, 2026 · Apr 27, 2026
diff --git a/README.md b/README.md
@@ -316,7 +316,7 @@ bun scripts/render-benchmark-figures.ts \
 
 ## Quickstart
 
-**Prerequisites:** Rust stable toolchain ([rustup.rs](https://rustup.rs))
+**Prerequisites:** Rust 1.85+ toolchain (2024 edition) ([rustup.rs](https://rustup.rs))
 
 ```bash
 git clone https://github.com/Microck/jarspect.git
@@ -403,10 +403,12 @@ curl http://localhost:18000/scans/<scan_id>
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `JARSPECT_BIND` | `127.0.0.1:18000` | Host and port the HTTP server binds to |
+| `JARSPECT_WEB_DIR` | `web` | Web assets directory served at `/static` and used for `/` |
 | `JARSPECT_RULEPACKS` | `demo` | Which YARA/signature rulepacks to load: `demo`, `prod`, or `demo,prod` |
 | `JARSPECT_AI_ENABLED` | `1` | Enable/disable AI verdict even if Azure OpenAI env vars are set (`0`/`false` to disable) |
 | `JARSPECT_UPLOAD_MAX_BYTES` | `52428800` | Maximum accepted upload size in bytes (default 50 MiB) |
 | `JARSPECT_MB_HASH_MATCH_ENABLED` | `1` | Enable/disable MalwareBazaar hash matching (`0`/`false` to disable; useful for benchmarking static/AI detectors) |
+| `JARSPECT_MB_MATCH_CONTINUE_ANALYSIS` | `0` | Continue static analysis after a MalwareBazaar hash match (`1`/`true` to keep profile artifacts; the verdict method stays `malwarebazaar_hash`) |
 | `RUST_LOG` | `jarspect=info,tower_http=info` | Log verbosity (uses `tracing-subscriber` env-filter syntax) |
 
 ### AI Verdict (required for production)
@@ -501,10 +503,11 @@ Liveness check. Reports AI status, loaded rulepacks, signature/YARA rule counts,
   "service": "jarspect",
   "version": "0.1.0",
   "ai_enabled": true,
-  "rulepacks": "prod",
-  "signature_count": 12,
-  "yara_rule_count": 6,
-  "mb_hash_match_enabled": true,
+  "rulepacks": ["prod"],
+  "signatures_loaded": 12,
+  "yara_rulepacks_loaded": 1,
+  "malwarebazaar_hash_match_enabled": true,
+  "malwarebazaar_match_continue_analysis": false,
   "upload_max_bytes": 52428800
 }
 ```
@@ -642,7 +645,7 @@ Verdict object:
 ## Safety and Limitations
 
 - **No sandbox.** Jarspect does not execute or load any `.class` files. All analysis is purely static (bytecode-level constant-pool and instruction parsing).
-- **AI-dependent.** Production verdicts require a working Azure OpenAI endpoint. Without AI configuration, scans will fail with an error. The AI model's judgment is the final authority on ambiguous cases.
+- **AI-preferred with fallback.** Production verdicts use Azure OpenAI when configured. Without AI configuration, scans still succeed via `heuristic_fallback`; the AI path is preferred for ambiguous cases.
 - **Rate limiting.** Azure OpenAI endpoints may be rate-limited (429 responses). Jarspect retries with exponential backoff but will fail if rate-limited for too long.
 - **Synthetic demo fixtures.** The bundled demo rulepack matches strings from `demo/suspicious_sample.jar` -- a synthetic artifact built by `demo/build_sample.sh`. No real malware samples are included in the repository.
 - **Static analysis only.** The bytecode layer extracts capabilities and artifacts deterministically from bytecode evidence, but does not execute code.

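The boolean switches in the README configuration table (`JARSPECT_AI_ENABLED`, `JARSPECT_MB_HASH_MATCH_ENABLED`, `JARSPECT_MB_MATCH_CONTINUE_ANALYSIS`) share one convention: `1`/`true` enables a switch and `0`/`false` disables it. Below is a minimal Rust sketch of how such a flag could be read. `parse_flag` and `env_flag` are hypothetical helpers, not Jarspect's actual parser. Trimming, case-insensitive matching, and falling back to the default on unrecognized values are assumptions.

```rust
use std::env;

/// Hypothetical helper: interpret a boolean flag using the convention from the
/// configuration table (`1`/`true` enable, `0`/`false` disable).
/// Unset or unrecognized values fall back to `default` (an assumption).
fn parse_flag(raw: Option<&str>, default: bool) -> bool {
    match raw.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) if v == "1" || v == "true" => true,
        Some(v) if v == "0" || v == "false" => false,
        _ => default,
    }
}

/// Hypothetical wrapper that reads the flag from the process environment.
#[allow(dead_code)]
fn env_flag(name: &str, default: bool) -> bool {
    let raw = env::var(name).ok();
    parse_flag(raw.as_deref(), default)
}
```

With the table's defaults, the calls would be `env_flag("JARSPECT_AI_ENABLED", true)` and `env_flag("JARSPECT_MB_MATCH_CONTINUE_ANALYSIS", false)`. Keeping the parsing in a pure function makes it testable without modifying the real environment.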
diff --git a/docs/corpus-calibration.md b/docs/corpus-calibration.md
@@ -46,7 +46,7 @@ Date: 2026-03-05 (updated from 2026-03-03 initial calibration)
 
 ### Multi-tag expansion calibration (2026-03-05)
 
-6. **Static override layer**: high-confidence static signals (production YARA high/critical, `DETC-03.DYNAMIC_LOAD` high/critical, and others) override the AI verdict to MALICIOUS via `static_override(ai_verdict)`.
+6. **Static override layer**: high-confidence static signals (production YARA high/critical, `DETC-03.BASE64_STAGER` high/critical, `DETC-02.DISCORD_WEBHOOK` high/critical, and others) override the AI verdict to MALICIOUS via `static_override(ai_verdict)`.
 7. **6 production YARA rules** (`data/signatures/prod/rules.yar`): family-specific, multi-string rules for Krypton, MaxCoffe, MaksLibraries, PussyRAT, Loader/Stager, and ETH RPC loader families.
 8. **Exec detector filter** (`is_command_like_string()`): stops error message strings and class names from being misclassified as shell commands (fixed FancyMenu, tr7zw false positives).
 9. **AI prompt tuning**: deserialization is vulnerability-risk not malware; private URLs are low-signal; don't infer shell usage from class names.
@@ -78,7 +78,7 @@ Scans run with `JARSPECT_RULEPACKS=prod JARSPECT_MB_HASH_MATCH_ENABLED=0` (hash
 
 ### Verdict method breakdown (malicious corpus)
 
-Most samples hit via `static_override(ai_verdict)` (production YARA or `DETC-03.DYNAMIC_LOAD` at high severity). Remaining samples caught by the AI itself assigning MALICIOUS.
+Most samples hit via `static_override(ai_verdict)` (production YARA or malware-specific detector signals such as `DETC-03.BASE64_STAGER` / `DETC-02.DISCORD_WEBHOOK` at high severity). Remaining samples caught by the AI itself assigning MALICIOUS.
 
 ### Run artifacts
 

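The static override rule in step 6 can be sketched in Rust as follows. Every type and function name here (`Severity`, `Verdict`, `Signal`, `override_reason`, `final_verdict`) is hypothetical. The real check is `high_confidence_static_reason()` in `src/scan.rs`, and its signature is not reproduced. The sketch covers only the signals named in the calibration notes, not the unnamed "others". It also assumes that an override reports `static_override(ai_verdict)` even when the AI already returned MALICIOUS.

```rust
/// Hypothetical severity scale; derived ordering makes `High < Critical`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity { Low, Medium, High, Critical }

/// Hypothetical verdict labels.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict { Clean, Suspicious, Malicious }

/// Hypothetical static signal produced by the analysis layer.
#[derive(Debug)]
enum Signal {
    /// A match from the production YARA rulepack.
    ProdYara { rule: String, severity: Severity },
    /// A detector hit such as `DETC-03.BASE64_STAGER`.
    Detector { id: String, severity: Severity },
}

/// Detector IDs named in the calibration notes; the real list may be longer.
const OVERRIDE_DETECTORS: &[&str] = &["DETC-03.BASE64_STAGER", "DETC-02.DISCORD_WEBHOOK"];

/// Return a reason if any signal is high/critical and eligible to override.
fn override_reason(signals: &[Signal]) -> Option<String> {
    signals.iter().find_map(|s| match s {
        Signal::ProdYara { rule, severity } if *severity >= Severity::High => {
            Some(format!("prod YARA rule {rule}"))
        }
        Signal::Detector { id, severity }
            if *severity >= Severity::High && OVERRIDE_DETECTORS.contains(&id.as_str()) =>
        {
            Some(format!("detector {id}"))
        }
        _ => None,
    })
}

/// Combine the AI verdict with static signals, returning (verdict, method).
fn final_verdict(ai: Verdict, signals: &[Signal]) -> (Verdict, &'static str) {
    match override_reason(signals) {
        Some(_) => (Verdict::Malicious, "static_override(ai_verdict)"),
        None => (ai, "ai_verdict"),
    }
}
```

Because `DETC-03.DYNAMIC_LOAD` is absent from `OVERRIDE_DETECTORS`, a critical hit from that detector leaves the AI verdict untouched, which matches the calibration update above.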
diff --git a/nightshift-doc-drift.md b/nightshift-doc-drift.md
@@ -1,36 +1,189 @@
 # Nightshift: Doc Drift Analysis — Jarspect
 
-**Date:** 2026-04-05
-**Task:** doc-drift
-**Verdict:** Docs are in excellent shape. No actionable drift found.
+**Task:** doc-drift (Documentation Drift Detector)
+**Category:** analysis
+**Date:** 2026-04-03
+**Agent:** Nightshift v3 (GLM 5.1)
+
+---
+
+## Executive Summary
+
+Identified 15 documentation drift findings across severity levels P0–P3 in the jarspect codebase (Finding 15 is informational and needs no change). Two P0 findings describe actively misleading documentation: a removed static-override trigger (`DETC-03.DYNAMIC_LOAD`) that is still referenced, and a false AI-dependency claim.
+
+| Severity | Count |
+|----------|-------|
+| P0 (Critical — Misleading) | 2 |
+| P1 (High — Incorrect) | 3 |
+| P2 (Medium — Outdated) | 4 |
+| P3 (Low — Minor) | 6 |
+
+---
 
 ## Findings
 
-### P3 — Minor: "8 capability detectors" claim is underselling
+### Finding 1: P0 — `DETC-03.DYNAMIC_LOAD` Static Override Removal Not Reflected in Docs
+
+- **File:** `docs/corpus-calibration.md`, lines 49, 81
+- **What docs say:** Static override layer includes `DETC-03.DYNAMIC_LOAD` at high/critical severity.
+- **What code does:** `src/scan.rs:360-394` (`high_confidence_static_reason()`) only checks `DETC-03.BASE64_STAGER` and `DETC-02.DISCORD_WEBHOOK`. A test at line 502 explicitly confirms that `DETC-03.DYNAMIC_LOAD` was removed.
+- **Recommended fix:** Replace `DETC-03.DYNAMIC_LOAD` with `DETC-03.BASE64_STAGER` and `DETC-02.DISCORD_WEBHOOK` in docs/corpus-calibration.md lines 49 and 81.
+
+---
+
+### Finding 2: P0 — AI-Dependency Claim is False
+
+- **File:** `README.md`, line 608
+- **What docs say:** "Production verdicts require a working Azure OpenAI endpoint. Without AI configuration, scans will fail with an error."
+- **What code does:** `src/scan.rs:275-300` — when `ai_config` is `None`, the code calls `verdict::heuristic_verdict()` with method `"heuristic_fallback"` and returns a successful response.
+- **Recommended fix:** Change to: "Production verdicts prefer a working Azure OpenAI endpoint. Without AI configuration, scans fall back to a heuristic verdict with method `heuristic_fallback`."
+
+---
+
+### Finding 3: P1 — `/health` Response Example Doesn't Match API
+
+- **File:** `README.md`, lines 461-472
+- **What docs say:**
+  ```json
+  { "rulepacks": "prod", "signature_count": 12, "yara_rule_count": 6, "mb_hash_match_enabled": true }
+  ```
+- **What code does** (`src/main.rs:177-188`):
+  ```json
+  { "rulepacks": ["demo"], "signatures_loaded": 6, "yara_rulepacks_loaded": 1, "malwarebazaar_hash_match_enabled": true, "malwarebazaar_match_continue_analysis": false }
+  ```
+- Key differences: `signature_count` → `signatures_loaded`, `yara_rule_count` → `yara_rulepacks_loaded`, `rulepacks` is array not string, `mb_hash_match_enabled` → `malwarebazaar_hash_match_enabled`, two additional fields.
+- **Recommended fix:** Update README example JSON to match actual field names and types.
+
+---
+
+### Finding 4: P1 — demo_run.sh Uses Wrong Default Port
+
+- **File:** `scripts/demo_run.sh`, line 5
+- **What code says:** `API_BASE_URL="${JARSPECT_API_URL:-http://localhost:8000}"`
+- **What it should be:** Actual server default (`src/main.rs:130`) is `127.0.0.1:18000`.
+- **Recommended fix:** Change default to `http://localhost:18000`.
+
+---
+
+### Finding 5: P1 — Verdict Method Values Incomplete
+
+- **File:** `README.md`, line 600
+- **What docs say:** method values are `ai_verdict`, `malwarebazaar_hash`, `static_override(ai_verdict)`, or `heuristic_fallback`
+- **What code does:** `src/scan.rs` also produces `archive_fallback_static_override` and `archive_validation_failure` methods.
+- **Recommended fix:** Add `archive_fallback_static_override` and `archive_validation_failure` to documented method values.
+
+---
+
+### Finding 6: P2 — Missing Configuration Documentation
+
+- **File:** `README.md`, Configuration section (lines 364-401)
+- **Missing items:**
+  - `JARSPECT_WEB_DIR` env var (exists in `.env.example` and `src/main.rs:88-90`)
+  - `JARSPECT_MB_MATCH_CONTINUE_ANALYSIS` env var (exists in `.env.example` and `src/scan.rs:20`)
+  - `/health` endpoint missing `malwarebazaar_match_continue_analysis` and `yara_rulepacks_loaded` response fields
+- **Recommended fix:** Add missing env vars to config table. Add missing `/health` response fields.
+
+---
+
+### Finding 7: P2 — Azure OpenAI `deployment` Described as Required
+
+- **File:** `README.md`, lines 375-382
+- **What docs say:** `AZURE_OPENAI_DEPLOYMENT` is listed under "AI Verdict (required for production)", but the README does not say what happens when it is unset.
+- **What code does:** `src/verdict.rs:18` declares `deployment: Option<String>`. If it is `None`, the entire AI config resolves to `None`, which disables AI.
+- **Recommended fix:** Add a note that `AZURE_OPENAI_DEPLOYMENT` is required for AI verdicts; without it, AI is disabled even if the endpoint and key are set.
+
+---
+
+### Finding 8: P2 — Data Model Missing `sha256` Field
+
+- **File:** `README.md`, lines 580-591
+- **What docs say:** `ScanRunResponse` data model table does not include `sha256`.
+- **What code does:** `src/lib.rs:51` — `sha256: Option<String>` is a top-level field on `ScanRunResponse`. README text at line 70 mentions it, but the formal Data Model table omits it.
+- **Recommended fix:** Add `sha256 | string or null | SHA-256 hash of the uploaded JAR` to the Data Model table.
+
+---
+
+### Finding 9: P2 — Rust Edition Requirement Misleading
+
+- **File:** `README.md`, line 282
+- **What docs say:** "Prerequisites: Rust stable toolchain"
+- **What code does:** `Cargo.toml:4` sets `edition = "2024"`, which requires Rust 1.85 or newer (1.85.0 is the release that stabilized the 2024 edition).
+- **Recommended fix:** Change to "Rust 1.85+ toolchain (2024 edition)".
+
+---
+
+### Finding 10: P3 — Incorrect Function Name in Pipeline Pseudocode
+
+- **File:** `README.md`, line 557
+- **What docs say:** `analysis::run_yara_scan()`
+- **What code does:** `src/analysis/mod.rs:12` — the function is `scan_yara_rulepacks()`
+- **Recommended fix:** Update to `analysis::scan_yara_rulepacks()`.
+
+---
+
+### Finding 11: P3 — Dead Code: `fallback_verdict()` Never Called
+
+- **File:** `src/verdict.rs`, line 523
+- **Description:** `pub fn fallback_verdict()` is exported but never called from anywhere in the codebase. The actual fallback path uses `heuristic_verdict()`.
+- **Recommended fix:** Remove `fallback_verdict()` or document it as a utility for external consumers.
+
+---
+
+### Finding 12: P3 — Detector Count Misleading
+
+- **File:** `README.md`, lines 51, 75, 95
+- **What docs say:** "8 capability detectors"
+- **What code does:** `src/detectors/mod.rs:32-49` — `run_capability_detectors()` calls 11 functions (8 base + 3 compound: discord_webhook, base64_stager, remote_code_load).
+- **Recommended fix:** Update to "8 base capability detectors plus 3 compound detectors" or "11 capability detectors".
+
+---
+
+### Finding 13: P3 — Data Model Missing `static_findings` Field
+
+- **File:** `README.md`, lines 580-591
+- **What docs say:** Data Model table does not include `static_findings`.
+- **What code does:** `src/lib.rs:56` — `static_findings: Option<StaticFindings>` is a top-level field on `ScanRunResponse`.
+- **Recommended fix:** Add `static_findings` row to the Data Model table.
+
+---
+
+### Finding 14: P3 — Demo JAR Referenced as "Bundled" but is Build Artifact
+
+- **File:** `README.md`, line 610
+- **What docs say:** "The bundled demo rulepack matches strings from `demo/suspicious_sample.jar`"
+- **What exists:** `demo/suspicious_sample.jar` is NOT in the repository; it's generated by `demo/build_sample.sh`.
+- **Recommended fix:** Change to "The demo rulepack matches strings from `demo/suspicious_sample.jar` (built by `demo/build_sample.sh`)"
+
+---
 
-The README header and several sections reference "8 capability detectors." The codebase has 11 detector files: 8 base detectors (DETC-01 through DETC-08) and 3 compound detectors (base64_stager, discord_webhook, remote_code_load). The project layout section in the README correctly lists all 11, but the "8 detectors" phrasing in the pipeline description and Detection Engine sections is technically incomplete.
+### Finding 15: P3 — `upload_id` Description Imprecise
 
-**Impact:** Low. The compound detectors are mentioned separately in their own sections. The "8" refers specifically to the base capability detectors, which is accurate.
+- **File:** `README.md`, line 427
+- **What docs say:** "upload_id is a 32-character lowercase hex string (UUID v4, simple form)"
+- **What code does:** `src/main.rs:226` — `Uuid::new_v4().simple().to_string()` produces exactly this.
+- **Recommended fix:** No change needed — technically correct.
 
-**Recommendation:** Consider updating to "8 base + 3 compound detectors" for precision.
+---
 
-### No other drift detected
+## Prioritized Recommendations
 
-- All API routes documented in README match src/main.rs exactly
-- Configuration env vars documented match the code
-- Cargo.toml dependencies match README references
-- Project layout section accurately reflects actual file structure
-- Detection table (DETC-01 through DETC-08) matches detector files one-to-one
-- Benchmark figures and methodology are self-consistent
-- Version (0.1.0) matches Cargo.toml
+### Immediate (P0)
+1. Update `docs/corpus-calibration.md` — Replace `DETC-03.DYNAMIC_LOAD` with `DETC-03.BASE64_STAGER` and `DETC-02.DISCORD_WEBHOOK` (lines 49, 81).
+2. Fix README AI-dependency claim — Line 608 should state scans degrade to `heuristic_fallback`, not "fail with an error."
 
-## Summary
+### High Priority (P1)
+3. Fix `/health` response example — Update README to use actual field names.
+4. Fix `demo_run.sh` default port — Change `localhost:8000` to `localhost:18000`.
+5. Document all verdict methods — Add `archive_fallback_static_override` and `archive_validation_failure`.
 
-| Severity | Count | Details |
-|----------|-------|---------|
-| P0       | 0     | —       |
-| P1       | 0     | —       |
-| P2       | 0     | —       |
-| P3       | 1     | "8 detectors" undersells the 11 total |
+### Medium Priority (P2)
+6. Add missing config vars to README: `JARSPECT_WEB_DIR`, `JARSPECT_MB_MATCH_CONTINUE_ANALYSIS`.
+7. Add `sha256` and `static_findings` to the Data Model table.
+8. Update Rust prerequisite to "Rust 1.85+ (2024 edition)."
+9. Clarify Azure OpenAI deployment requirement.
 
-This is one of the best-documented repos in the Microck org. No meaningful drift.
+### Low Priority (P3)
+10. Fix function name in pipeline pseudocode.
+11. Remove or document dead `fallback_verdict()` code.
+12. Update detector count to reflect 11 total detectors.
+13. Fix demo JAR description.
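Findings 2, 5, and 7 all depend on when the AI path is active. The Rust sketch below models that gate under stated assumptions. Only `deployment: Option<String>` and the method strings `ai_verdict` / `heuristic_fallback` come from the report. The `AiSettings` and `AiConfig` structs, their `endpoint`/`api_key` fields, and both functions are hypothetical. The sketch leaves out the hash-match, static-override, and archive methods.

```rust
/// Hypothetical raw AI settings as read from the environment.
/// Only `deployment: Option<String>` is taken from the report (src/verdict.rs:18).
struct AiSettings {
    endpoint: Option<String>,
    api_key: Option<String>,
    deployment: Option<String>,
}

/// Hypothetical resolved config in which every field is required.
#[allow(dead_code)]
struct AiConfig {
    endpoint: String,
    api_key: String,
    deployment: String,
}

/// Resolve AI settings; `ai_enabled` plays the role of `JARSPECT_AI_ENABLED`.
/// Each `?` returns `None` as soon as a value is missing, so an unset
/// deployment disables AI even when the endpoint and key are present.
fn resolve_ai_config(settings: AiSettings, ai_enabled: bool) -> Option<AiConfig> {
    if !ai_enabled {
        return None;
    }
    Some(AiConfig {
        endpoint: settings.endpoint?,
        api_key: settings.api_key?,
        deployment: settings.deployment?,
    })
}

/// Method label for the default path (no hash match or static override).
/// A missing config degrades to the heuristic instead of failing the scan.
fn default_method(config: Option<&AiConfig>) -> &'static str {
    match config {
        Some(_) => "ai_verdict",
        None => "heuristic_fallback",
    }
}
```

Returning `Option` rather than an error is what lets scans succeed without Azure OpenAI. The trade-off is that a typo in the deployment variable quietly switches to `heuristic_fallback`, which is why Finding 7 asks for this behavior to be documented.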
diff --git a/scripts/demo_run.sh b/scripts/demo_run.sh
@@ -2,7 +2,7 @@
 set -euo pipefail
 
 ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
-API_BASE_URL="${JARSPECT_API_URL:-http://localhost:8000}"
+API_BASE_URL="${JARSPECT_API_URL:-http://localhost:18000}"
 JAR_PATH="${ROOT_DIR}/demo/suspicious_sample.jar"
 SERVER_PID=""
 SERVER_LOG=""
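The port fix above brings the demo script in line with the server's `JARSPECT_BIND` default of `127.0.0.1:18000`. Below is a hypothetical Rust helper that parses a bind address and derives the matching client base URL, so the two values cannot drift apart. `bind_addr`, `api_base_url`, and the whole idea of deriving the client URL from the bind address are assumptions. This is not how `scripts/demo_run.sh` or `src/main.rs` actually work.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Default from the README configuration table (`JARSPECT_BIND`).
const DEFAULT_BIND: &str = "127.0.0.1:18000";

/// Hypothetical helper: parse a bind override, falling back to the default.
fn bind_addr(raw: Option<&str>) -> Result<SocketAddr, AddrParseError> {
    raw.unwrap_or(DEFAULT_BIND).parse()
}

/// Hypothetical helper: base URL a local client should target.
/// `SocketAddr`'s `Display` brackets IPv6 hosts, e.g. `[::1]:18000`.
fn api_base_url(addr: SocketAddr) -> String {
    format!("http://{addr}")
}
```

A wildcard bind such as `0.0.0.0:9000` would produce `http://0.0.0.0:9000`, which is not a useful client target. A real helper would need to map it to a reachable host like `localhost`.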