feat: support exporting compliance audit logs as CSV#438
Conversation
|
Warning Review limit reached
More reviews will be available in 22 minutes and 7 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds end-to-end CSV export for governance audit logs. ChangesCSV Compliance Export
Sequence Diagram(s)sequenceDiagram
actor User
participant CLI as agentwatch CLI
participant API as FastAPI server
participant Engine as GovernanceEngine
participant Reporter as ComplianceReporter
participant Disk as JSONL audit log
Note over Engine,Disk: Server startup
Engine->>Disk: load_from_disk() reads JSONL
Disk-->>Engine: AuditEntry list
Note over User,Disk: export-csv (API-backed)
User->>CLI: compliance export-csv --include-allowed
CLI->>API: GET /api/v1/compliance/audit-log?include_allowed=true
API->>Reporter: generate_csv(include_allowed=True)
Reporter->>Engine: get audit entries
Engine-->>Reporter: AuditEntry list
Reporter-->>API: CSV string
API-->>CLI: PlainTextResponse (CSV)
CLI-->>User: CSV output
Note over User,Disk: export-local (file-backed)
User->>CLI: compliance export-local --output report.csv
CLI->>Engine: load_from_disk(audit_log_path)
Engine->>Disk: read JSONL
Disk-->>Engine: AuditEntry list
CLI->>Reporter: generate_csv(include_allowed)
Reporter-->>CLI: CSV string
CLI->>Disk: write report.csv
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@arcgod-design Thanks for the contribution. Before merge, I'd like some clarification around the export source. The CSV export functionality itself looks useful, but I want to ensure the exported report reflects actual compliance/audit findings rather than data generated from a fresh local compliance engine at export time. Please clarify:
Also, let's wait for the remaining CI checks to complete before making a merge decision. Once that's clarified and CI is green, I'll take another look. |
🧪 PR Test Results
Python 3.12 · commit 94f9a45 |
…add CSV format to API endpoint - GovernanceEngine: add load_from_disk() to reload persisted JSONL entries - Server: enable audit_log_path with AGENTWATCH_AUDIT_LOG_PATH env var - Server: add ?format=csv to compliance-report endpoint + dedicated /api/v1/compliance/audit-log - CLI export-local: load from JSONL file instead of empty engine, add --audit-log option - CLI export-csv: hit correct endpoint (/api/v1/governance/compliance-report) Closes sreerevanth#148
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
tests/test_compliance.py (2)
248-315: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick winAdd a persisted-audit-log CSV test to lock the audit-source contract.
Current tests validate in-memory entries only. They don’t cover the key contract that CSV export reflects persisted JSONL audit history (the main maintainer concern).
Suggested test addition
+import json @@ def test_compliance_reporter_generate_csv_includes_allowed(): @@ actions = [row[5] for row in rows[1:]] assert "updated" in actions + + +def test_compliance_reporter_generate_csv_uses_persisted_audit_log(tmp_path): + audit_path = tmp_path / "audit.jsonl" + persisted = { + "audit_id": "persisted-1", + "timestamp": datetime.now(UTC).isoformat(), + "principal_id": "u1", + "event_type": "permission_denied", + "resource": "policy", + "action": "safety:override", + "allowed": False, + "session_id": "", + "details": {}, + } + audit_path.write_text(json.dumps(persisted) + "\n", encoding="utf-8") + + engine = GovernanceEngine(audit_log_path=audit_path) + reporter = ComplianceReporter(engine) + rows = list(csv.DictReader(io.StringIO(reporter.generate_csv(include_allowed=True)))) + + assert len(rows) == 1 + assert rows[0]["audit_id"] == "persisted-1" + assert rows[0]["event_type"] == "permission_denied"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_compliance.py` around lines 248 - 315, Add a new test function that validates CSV export from persisted JSONL audit history, not just in-memory entries. The test should create a ComplianceReporter with a GovernanceEngine that has been initialized from a persisted JSONL audit log file, then call generate_csv() and verify the output contains the persisted audit entries with correct format and fields. This locks the contract that CSV export accurately reflects the persisted audit source data, which is the main concern for audit compliance tracking.
274-315: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winTighten CSV assertions to avoid false positives.
A few checks are currently too permissive (
"permission_denied" in csv_output,len(rows) >= ...). With a deterministic fixture, column-level checks and stricter row expectations will catch regressions earlier.Suggested assertion hardening
def test_compliance_report_to_csv_contains_denials(): @@ report = reporter.generate() csv_output = report.to_csv() - assert "permission_denied" in csv_output + rows = list(csv.DictReader(io.StringIO(csv_output))) + assert any(row["event_type"] == "permission_denied" for row in rows) @@ def test_compliance_reporter_generate_csv_denials_only(): @@ rows = list(reader) - assert len(rows) >= 2 + assert len(rows) == 3 # header + 2 denied rows from fixture for row in rows[1:]: - assert "permission_denied" in row[3] or "safety_override" in row[3] + assert row[3] == "permission_denied" + assert row[6] == "False" @@ def test_compliance_reporter_generate_csv_includes_allowed(): @@ rows = list(reader) - assert len(rows) >= 3 + assert len(rows) == 4 # header + 3 total rows from fixture actions = [row[5] for row in rows[1:]] assert "updated" in actions🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_compliance.py` around lines 274 - 315, The CSV assertions in these test functions are too permissive and may not catch regressions effectively. In test_compliance_report_to_csv_contains_denials, replace the broad string check with CSV parsing and column-level validation to verify the denial appears in the correct column. In test_compliance_reporter_generate_csv_denials_only, replace the >= comparison with an exact row count assertion and check specific columns (like column index 3 for event type) instead of searching for the string anywhere in the row. In test_compliance_reporter_generate_csv_includes_allowed, similarly replace len(rows) >= 3 with an exact equality check and validate that specific columns contain the expected values rather than just checking if "updated" exists in a loosely defined list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@agentwatch/api/server.py`:
- Around line 1129-1133: The `format` parameter in the compliance_audit_log_csv
function is accepted as a Query parameter but is never actually used within the
function implementation, making the API contract misleading. Remove the unused
`format: str = Query("csv", alias="format")` parameter from the function
signature of compliance_audit_log_csv to clarify that the endpoint only returns
CSV format and eliminate the ambiguity about whether the parameter is respected.
In `@agentwatch/cli/main.py`:
- Around line 1927-1939: The reported entry count in the success message at the
end of the command execution is inaccurate because `loaded =
len(engine._audit_log)` counts all audit entries, but when `include_allowed` is
False, the CSV generated by
`reporter.generate_csv(include_allowed=include_allowed)` filters to only denials
and overrides. Additionally, the code accesses the private `_audit_log`
attribute instead of using the public API. Calculate the actual count of entries
that were exported by counting the entries in the `csv_content` returned by
generate_csv (accounting for the header row), and use that count in the success
message console.print statement to accurately reflect what was written to the
file. Optionally, replace the direct access to `engine._audit_log` with the
public `engine.stats()` method if available.
In `@agentwatch/governance/compliance_reporter.py`:
- Around line 95-97: The generate_csv() method hard-codes a limit of 10_000 rows
in the self._governance.get_audit_log(limit=10_000) call, which silently
truncates audit records and produces incomplete compliance CSV exports without
user awareness. Remove the hard-coded limit parameter to retrieve all available
records, or if a limit is necessary for performance, implement pagination to
fetch all records in batches or add a clear warning message when results are
truncated so users know the export may be incomplete.
In `@agentwatch/governance/engine.py`:
- Around line 199-215: Valid JSON lines that are not objects (such as arrays or
primitives) will raise a TypeError when attempting to access fields using
bracket notation (e.g., data["audit_id"]), but this TypeError is not currently
caught by the exception handler. Add TypeError to the exception tuple in the
except clause alongside json.JSONDecodeError, KeyError, and ValueError so that
these corrupt entries are properly logged as warnings instead of causing the
load to crash unexpectedly.
---
Nitpick comments:
In `@tests/test_compliance.py`:
- Around line 248-315: Add a new test function that validates CSV export from
persisted JSONL audit history, not just in-memory entries. The test should
create a ComplianceReporter with a GovernanceEngine that has been initialized
from a persisted JSONL audit log file, then call generate_csv() and verify the
output contains the persisted audit entries with correct format and fields. This
locks the contract that CSV export accurately reflects the persisted audit
source data, which is the main concern for audit compliance tracking.
- Around line 274-315: The CSV assertions in these test functions are too
permissive and may not catch regressions effectively. In
test_compliance_report_to_csv_contains_denials, replace the broad string check
with CSV parsing and column-level validation to verify the denial appears in the
correct column. In test_compliance_reporter_generate_csv_denials_only, replace
the >= comparison with an exact row count assertion and check specific columns
(like column index 3 for event type) instead of searching for the string
anywhere in the row. In test_compliance_reporter_generate_csv_includes_allowed,
similarly replace len(rows) >= 3 with an exact equality check and validate that
specific columns contain the expected values rather than just checking if
"updated" exists in a loosely defined list.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: bcbb464b-a004-4db7-955d-4e0c7e57b3ac
📒 Files selected for processing (5)
agentwatch/api/server.pyagentwatch/cli/main.pyagentwatch/governance/compliance_reporter.pyagentwatch/governance/engine.pytests/test_compliance.py
| async def compliance_audit_log_csv( | ||
| _auth: None = Depends(_require_api_key), | ||
| format: str = Query("csv", alias="format"), | ||
| include_allowed: bool = Query(False), | ||
| ): |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Remove or enforce format on /api/v1/compliance/audit-log.
format is accepted but ignored; the handler always returns CSV. This makes the API contract ambiguous.
🔧 Proposed fix (remove unused parameter)
async def compliance_audit_log_csv(
_auth: None = Depends(_require_api_key),
- format: str = Query("csv", alias="format"),
include_allowed: bool = Query(False),
):📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def compliance_audit_log_csv( | |
| _auth: None = Depends(_require_api_key), | |
| format: str = Query("csv", alias="format"), | |
| include_allowed: bool = Query(False), | |
| ): | |
| async def compliance_audit_log_csv( | |
| _auth: None = Depends(_require_api_key), | |
| include_allowed: bool = Query(False), | |
| ): |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agentwatch/api/server.py` around lines 1129 - 1133, The `format` parameter in
the compliance_audit_log_csv function is accepted as a Query parameter but is
never actually used within the function implementation, making the API contract
misleading. Remove the unused `format: str = Query("csv", alias="format")`
parameter from the function signature of compliance_audit_log_csv to clarify
that the endpoint only returns CSV format and eliminate the ambiguity about
whether the parameter is respected.
| engine = GovernanceEngine(audit_log_path=audit_log) | ||
| loaded = len(engine._audit_log) | ||
| if loaded == 0: | ||
| console.print( | ||
| f"[yellow]No audit entries found in {audit_log}. " | ||
| "The server persists audit data when AGENTWATCH_AUDIT_LOG_PATH is set.[/yellow]" | ||
| ) | ||
| reporter = ComplianceReporter(engine) | ||
| csv_content = reporter.generate_csv(include_allowed=include_allowed) | ||
|
|
||
| with open(output, "w", encoding="utf-8", newline="") as f: | ||
| f.write(csv_content) | ||
| console.print(f"[green]{output} created successfully ({loaded} entries exported)[/green]") |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Reported export count is inaccurate and relies on a private attribute.
loaded = len(engine._audit_log) counts every loaded entry, but when include_allowed is False (the default) generate_csv filters down to denials/overrides only. The success message then reports loaded as "entries exported", overstating what was actually written — misleading in an audit context. Additionally, reading engine._audit_log reaches into a private attribute; engine.stats() exposes the same count publicly.
Compute the count from the actual CSV output and use the public API:
🐛 Proposed fix
engine = GovernanceEngine(audit_log_path=audit_log)
- loaded = len(engine._audit_log)
+ loaded = engine.stats()["total_audit_entries"]
if loaded == 0:
console.print(
f"[yellow]No audit entries found in {audit_log}. "
"The server persists audit data when AGENTWATCH_AUDIT_LOG_PATH is set.[/yellow]"
)
reporter = ComplianceReporter(engine)
csv_content = reporter.generate_csv(include_allowed=include_allowed)
+ exported = max(csv_content.count("\n") - 1, 0) # rows excluding header
with open(output, "w", encoding="utf-8", newline="") as f:
f.write(csv_content)
- console.print(f"[green]{output} created successfully ({loaded} entries exported)[/green]")
+ console.print(f"[green]{output} created successfully ({exported} entries exported)[/green]")🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 1936-1936: File path is request-/variable-derived; validate and normalize to prevent path traversal.
Context: open(output, "w", encoding="utf-8", newline="")
Note: [CWE-22] Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal').
(open-filename-from-request)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agentwatch/cli/main.py` around lines 1927 - 1939, The reported entry count in
the success message at the end of the command execution is inaccurate because
`loaded = len(engine._audit_log)` counts all audit entries, but when
`include_allowed` is False, the CSV generated by
`reporter.generate_csv(include_allowed=include_allowed)` filters to only denials
and overrides. Additionally, the code accesses the private `_audit_log`
attribute instead of using the public API. Calculate the actual count of entries
that were exported by counting the entries in the `csv_content` returned by
generate_csv (accounting for the header row), and use that count in the success
message console.print statement to accurately reflect what was written to the
file. Optionally, replace the direct access to `engine._audit_log` with the
public `engine.stats()` method if available.
| audit_log = self._governance.get_audit_log(limit=10_000) | ||
| if not include_allowed: | ||
| audit_log = [entry for entry in audit_log if not entry.allowed] |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Avoid silent truncation of compliance CSV exports.
generate_csv() hard-caps audit retrieval at 10,000 rows, which can silently drop persisted records and produce incomplete compliance artifacts.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agentwatch/governance/compliance_reporter.py` around lines 95 - 97, The
generate_csv() method hard-codes a limit of 10_000 rows in the
self._governance.get_audit_log(limit=10_000) call, which silently truncates
audit records and produces incomplete compliance CSV exports without user
awareness. Remove the hard-coded limit parameter to retrieve all available
records, or if a limit is necessary for performance, implement pagination to
fetch all records in batches or add a clear warning message when results are
truncated so users know the export may be incomplete.
| try: | ||
| data = json.loads(line) | ||
| entry = AuditEntry( | ||
| audit_id=data["audit_id"], | ||
| timestamp=datetime.fromisoformat(data["timestamp"]), | ||
| principal_id=data.get("principal_id"), | ||
| event_type=AuditEventType(data["event_type"]), | ||
| resource=data["resource"], | ||
| action=data["action"], | ||
| allowed=data["allowed"], | ||
| session_id=data.get("session_id"), | ||
| details=data.get("details", {}), | ||
| ) | ||
| self._audit_log.append(entry) | ||
| count += 1 | ||
| except (json.JSONDecodeError, KeyError, ValueError) as exc: | ||
| logger.warning("Skipping corrupt audit entry: %s", exc) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Handle non-object JSONL entries as corrupt rows instead of crashing load.
Valid JSON lines that are not objects can raise TypeError during field extraction and currently bypass the inner handler, which can abort startup loading unexpectedly.
🔧 Proposed fix
for line in f:
line = line.strip()
if not line:
continue
try:
data = json.loads(line)
+ if not isinstance(data, dict):
+ raise ValueError("audit entry must be a JSON object")
entry = AuditEntry(
audit_id=data["audit_id"],
timestamp=datetime.fromisoformat(data["timestamp"]),
principal_id=data.get("principal_id"),
event_type=AuditEventType(data["event_type"]),
resource=data["resource"],
action=data["action"],
allowed=data["allowed"],
session_id=data.get("session_id"),
details=data.get("details", {}),
)
self._audit_log.append(entry)
count += 1
- except (json.JSONDecodeError, KeyError, ValueError) as exc:
+ except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
logger.warning("Skipping corrupt audit entry: %s", exc)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@agentwatch/governance/engine.py` around lines 199 - 215, Valid JSON lines
that are not objects (such as arrays or primitives) will raise a TypeError when
attempting to access fields using bracket notation (e.g., data["audit_id"]), but
this TypeError is not currently caught by the exception handler. Add TypeError
to the exception tuple in the except clause alongside json.JSONDecodeError,
KeyError, and ValueError so that these corrupt entries are properly logged as
warnings instead of causing the load to crash unexpectedly.
…t, catch TypeError in audit log loader (closes sreerevanth#148)
Where the exported records originate from: The CSV export reads directly from the persisted JSONL audit log file on disk. When the server runs with AGENTWATCH_AUDIT_LOG_PATH set, every compliance decision (denials, overrides, approvals) is appended to that JSONL file in real-time. The GovernanceEngine loads this file into memory on init, and the ComplianceReporter.generate_csv() iterates over those loaded entries to produce the CSV. CLI vs on-demand: Both paths export the same persisted history — there's no synthetic data generation. The CLI export-local command (agentwatch compliance export-local --audit-log ) reads from the same JSONL file the server writes to. The API endpoint /api/v1/compliance/audit-log does the same thing but serves it as an HTTP response. Neither path creates new audit entries — they only serialize what's already been recorded. How local export maps to real audit findings: The JSONL file is the single source of truth. Each line is a JSON object with audit_id, timestamp, principal_id, event_type, resource, action, allowed, and details. The CSV is a 1:1 row-per-entry mapping of this data. When --include-allowed is false (default), only denials and overrides are exported — the same entries that would appear in a compliance review. The key invariant: if it's not in the JSONL file, it won't be in the CSV. The export is purely a format conversion of persisted audit records. |
Summary
Changes
Testing
All 5 CSV tests pass:
Closes
closes #148
Summary by CodeRabbit
New Features