Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
138 changes: 138 additions & 0 deletions LOGGING_AUDIT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
# Logging Audit Report

## Executive Summary

This report audits the logging quality and completeness across the Jarspect codebase.

## Current State

### Files with Logging
- `src/main.rs` - Uses `tracing::info!` (4 occurrences)
- `src/scan.rs` - Uses `tracing::warn!` (4 occurrences)
- `src/verdict.rs` - Uses `tracing::warn!` (7 occurrences)
- `src/analysis/archive.rs` - Uses `tracing::debug!` (6 occurrences)

### Files WITHOUT Logging
- `src/malwarebazaar.rs` - CRITICAL: HTTP operations unlogged
- `src/lib.rs` - CRITICAL: Config loading unlogged
- `src/analysis/yara.rs` - HIGH: Scans and matches unlogged
- `src/profile.rs` - MEDIUM: Profile building unlogged
- `src/detectors/*` (all) - MEDIUM: Detector execution unlogged

## Findings

### 1. MalwareBazaar Integration (CRITICAL)
**Issue**: No logging for external HTTP operations
**Impact**: Cannot debug hash lookup failures, API errors, or rate limits
**Recommendation**: Add structured logging for:
- Hash lookup attempts (info)
- API responses (debug)
- Failures (warn)
- Rate limiting events (warn)

### 2. Configuration Loading (CRITICAL)
**Issue**: No logging for signature and YARA rule loading
**Impact**: Cannot diagnose startup failures or misconfigurations
**Recommendation**: Add logging for:
- Signature files being loaded (info)
- Signature counts (info)
- YARA rule compilation (debug)
- Validation failures (warn)

### 3. YARA Scanning (HIGH)
**Issue**: No logging of scan operations or matches
**Impact**: Cannot track which rules are being evaluated
**Recommendation**: Add logging for:
- Scan start/end (debug)
- Matched rules (debug with rule ID, severity, file path)

### 4. Profile Building (MEDIUM)
**Issue**: No logging of metadata extraction
**Impact**: Cannot verify which mod format was detected
**Recommendation**: Add logging for:
- Detected loader (info)
- Metadata fields extracted (debug)
- Parsing failures (warn)

### 5. Detector Execution (MEDIUM)
**Issue**: No logging of detector runs
**Impact**: Cannot see which detectors found evidence
**Recommendation**: Add logging for:
- Detector start/end (debug)
- Evidence found counts (debug)
- Skip reasons (trace)

### 6. Scan Pipeline (HIGH)
**Issue**: Missing progress logging between scan stages
**Impact**: Cannot track where in the pipeline a scan is
**Recommendation**: Add logging for:
- Intake completion (info with file/class counts)
- Static analysis start (info)
- YARA scan start (info)
- Detector run start (info)
- Verdict determination (info)
- Scan completion (info with result)

### 7. Archive Analysis (LOW - GOOD)
**Current state**: Uses debug logging for budget limits
**Status**: Adequate for current needs
**No changes required**

## Logging Level Guidelines

### info
- Scan start/end with identifiers
- High-level stage transitions
- Configuration loading results
- External service calls (MalwareBazaar, AI API)
- Errors affecting scan outcome

### warn
- Retryable failures (with backoff details)
- Graceful degradation scenarios
- Non-critical parsing failures with fallbacks
- Rate limiting

### debug
- Detailed operation progress
- File/class counts
- Detector evidence found
- YARA rule matches
- Individual detector runs

### trace
- Very detailed diagnostics
- Skip reasons for optimization
- Individual record processing

## Structured Logging Fields

Use structured fields for consistency:

### Common fields
- `scan_id`: UUID for the scan operation
- `upload_id`: Upload identifier
- `file_count`: Number of files processed
- `class_count`: Number of .class files
- `jar_size_bytes`: Size of the JAR file
- `duration_ms`: Operation duration
- `error`: Error details
- `retry_count`: Number of retry attempts
- `backoff_ms`: Backoff duration

### MalwareBazaar-specific
- `sha256_hash`: Hash being looked up
- `family`: Malware family name (if found)
- `tags_count`: Number of tags
- `api_call_count`: Number of API calls made

### YARA-specific
- `rule_id`: YARA rule identifier
- `severity`: Rule severity
- `file_path`: Matched file path
- `rulepack`: rulepack kind (demo/prod)

### Detector-specific
- `detector_id`: Detector identifier
- `evidence_count`: Number of evidence items found
- `severity`: Finding severity
27 changes: 25 additions & 2 deletions src/analysis/yara.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
use anyhow::Result;
use tracing::{debug, info};
use yara_x::{MetaValue, Rule, Rules, Scanner};

use crate::analysis::ArchiveEntry;
Expand Down Expand Up @@ -57,25 +58,47 @@ pub fn scan_yara_rulepacks(
packs: &[YaraRulepack],
) -> Result<Vec<(String, YaraFinding)>> {
let mut findings = Vec::new();
let total_entries = entries.len();

for pack in packs {
debug!(
pack = %pack.kind.as_str(),
entries_count = total_entries,
"YARA scan started"
);

for entry in entries {
let mut scanner = Scanner::new(&pack.rules);
let scan_results = scanner.scan(entry.bytes.as_slice())?;
for rule in scan_results.matching_rules() {
let severity = derive_rule_severity(&rule, pack.kind);
debug!(
rule_id = rule.identifier(),
severity,
pack = %pack.kind.as_str(),
entry_path = %entry.path,
"YARA rule matched"
);
findings.push((
entry.path.clone(),
YaraFinding {
pack: pack.kind,
rule_identifier: rule.identifier().to_string(),
severity: derive_rule_severity(&rule, pack.kind).to_string(),
severity: severity.to_string(),
evidence: build_match_evidence(&rule),
},
));
}
}

debug!(
pack = %pack.kind.as_str(),
matches_count = findings.len(),
"YARA scan complete"
);
}

info!(total_findings = findings.len(), "YARA scan complete");
Ok(findings)
}

Expand Down Expand Up @@ -174,7 +197,7 @@ fn build_match_evidence(rule: &Rule<'_, '_>) -> String {
mod tests {
use yara_x::Compiler;

use super::{RulepackKind, YaraRulepack, scan_yara_rulepacks};
use super::{scan_yara_rulepacks, RulepackKind, YaraRulepack};
use crate::analysis::ArchiveEntry;

#[test]
Expand Down
15 changes: 14 additions & 1 deletion src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ use std::sync::Arc;
use anyhow::{Context, Result};
use regex::Regex;
use serde::{Deserialize, Serialize};
use tracing::{debug, info, warn};
use yara_x::Compiler;

pub mod analysis;
Expand Down Expand Up @@ -444,13 +445,17 @@ pub fn load_signatures(

for pack in packs {
let path = signatures_path_for_pack(cwd, pack.as_str());
debug!(path = %path.display(), pack = %pack.as_str(), "loading signature corpus");
let payload = std::fs::read_to_string(&path)
.with_context(|| format!("Failed to read signature corpus: {}", path.display()))?;
let mut parsed: Vec<SignatureDefinition> = serde_json::from_str(&payload)
.with_context(|| format!("Invalid signature JSON: {}", path.display()))?;
let count = parsed.len();
signatures.append(&mut parsed);
info!(pack = %pack.as_str(), count, "loaded signature corpus");
}

info!(total_signatures = signatures.len(), "signature loading complete");
Ok(signatures)
}

Expand All @@ -462,18 +467,26 @@ pub fn load_yara_rules(

for pack in packs {
let path = yara_path_for_pack(cwd, pack.as_str());
debug!(path = %path.display(), pack = %pack.as_str(), "loading YARA rules");
let source = std::fs::read_to_string(&path)
.with_context(|| format!("Failed to read YARA rules: {}", path.display()))?;
let mut compiler = Compiler::new();
compiler
.add_source(source.as_str())
.with_context(|| format!("Failed compiling YARA rules from {}", path.display()))?;
let rules = compiler.build();
let rule_count = rules.iter().count();
loaded_rulepacks.push(analysis::YaraRulepack {
kind: *pack,
rules: compiler.build(),
rules,
});
info!(pack = %pack.as_str(), rule_count, "compiled YARA rules");
}

info!(
total_rulepacks = loaded_rulepacks.len(),
"YARA rule loading complete"
);
Ok(loaded_rulepacks)
}

Expand Down
14 changes: 12 additions & 2 deletions src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ use axum::{Json, Router};
use serde_json::Value;
use tokio::fs;
use tower_http::services::ServeDir;
use tracing::info;
use tracing::{debug, info};
use uuid::Uuid;

use jarspect::verdict::AiConfig;
Expand Down Expand Up @@ -192,6 +192,9 @@ async fn upload(
State(state): State<AppState>,
mut multipart: Multipart,
) -> Result<Json<Value>, AppError> {
let upload_id = Uuid::new_v4().simple().to_string();
info!(upload_id = %upload_id, "upload request received");

let mut filename = None;
let mut bytes = None;

Expand Down Expand Up @@ -221,12 +224,18 @@ async fn upload(
}
let content = bytes.ok_or_else(|| AppError::bad_request("Missing upload file bytes"))?;

let upload_id = Uuid::new_v4().simple().to_string();
let output_path = state.uploads_dir.join(format!("{upload_id}.jar"));
fs::write(&output_path, &content)
.await
.map_err(|e| AppError::internal(format!("Failed to persist upload: {e}")))?;

info!(
upload_id = %upload_id,
filename,
size_bytes = content.len(),
"upload persisted"
);

Ok(Json(serde_json::json!({
"upload_id": upload_id,
"filename": filename,
Expand All @@ -239,6 +248,7 @@ async fn scan(
State(state): State<AppState>,
Json(request): Json<ScanRequest>,
) -> Result<Json<ScanRunResponse>, AppError> {
info!(upload_id = %request.upload_id, "scan request received");
match jarspect::run_scan(&state, request, None).await {
Ok(scan_payload) => Ok(Json(scan_payload)),
Err(error) => Err(map_scan_error(error)),
Expand Down
25 changes: 24 additions & 1 deletion src/malwarebazaar.rs
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
use std::time::Duration;
use std::time::{Duration, Instant};

use anyhow::{Context, Result};
use reqwest::Client;
use serde::{Deserialize, Serialize};
use serde_json::Value;
use tracing::{debug, info, warn};

const MALWAREBAZAAR_API_URL: &str = "https://mb-api.abuse.ch/api/v1/";

Expand Down Expand Up @@ -42,6 +43,9 @@ pub async fn check_hash(sha256: &str, api_key: &str) -> Result<Option<MalwareBaz
.build()
.context("failed to build MalwareBazaar HTTP client")?;

let started = Instant::now();
info!(sha256, "MalwareBazaar hash lookup started");

let response = client
.post(MALWAREBAZAAR_API_URL)
.header("Auth-Key", api_key)
Expand All @@ -50,6 +54,9 @@ pub async fn check_hash(sha256: &str, api_key: &str) -> Result<Option<MalwareBaz
.await
.context("MalwareBazaar request failed")?;

let status = response.status();
debug!(sha256, %status, "MalwareBazaar response received");

let payload: Value = response
.json()
.await
Expand All @@ -60,11 +67,15 @@ pub async fn check_hash(sha256: &str, api_key: &str) -> Result<Option<MalwareBaz
.and_then(Value::as_str)
.unwrap_or_default();

debug!(sha256, query_status, "MalwareBazaar query status");

if query_status.eq_ignore_ascii_case("hash_not_found") {
info!(sha256, duration_ms = started.elapsed().as_millis(), "MalwareBazaar: hash not found");
return Ok(None);
}

if !query_status.eq_ignore_ascii_case("ok") {
warn!(sha256, query_status, "MalwareBazaar: unexpected query status");
return Ok(None);
}

Expand All @@ -76,9 +87,21 @@ pub async fn check_hash(sha256: &str, api_key: &str) -> Result<Option<MalwareBaz
.unwrap_or(Value::Null);

if entry.is_null() {
info!(sha256, duration_ms = started.elapsed().as_millis(), "MalwareBazaar: no data in response");
return Ok(None);
}

let family = value_string(&entry, "signature");
let tags_count = value_string_array(&entry, "tags").len();

info!(
sha256,
family,
tags_count,
duration_ms = started.elapsed().as_millis(),
"MalwareBazaar: known malware detected"
);

Ok(Some(MalwareBazaarResult {
sha256_hash: value_string(&entry, "sha256_hash").unwrap_or_else(|| sha256.to_string()),
family: value_string(&entry, "signature"),
Expand Down
Loading
Loading