Skip to content

feat: support exporting compliance audit logs as CSV#438

Open
arcgod-design wants to merge 6 commits into
sreerevanth:mainfrom
arcgod-design:feat/issue-148-csv-export
Open

feat: support exporting compliance audit logs as CSV#438
arcgod-design wants to merge 6 commits into
sreerevanth:mainfrom
arcgod-design:feat/issue-148-csv-export

Conversation

@arcgod-design

@arcgod-design arcgod-design commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds \ o_csv()\ method to \ComplianceReport\ for CSV export of audit denials
  • Adds \generate_csv()\ method to \ComplianceReporter\ with \include_allowed\ option
  • Adds CLI commands: \compliance export-csv\ (remote API) and \compliance export-local\ (local engine)
  • Adds 5 new tests for CSV export functionality

Changes

  • \�gentwatch/governance/compliance_reporter.py: Added \ o_csv()\ and \generate_csv()\ methods
  • \�gentwatch/cli/main.py: Added \compliance_app\ with \export-csv\ and \export-local\ commands
  • \ ests/test_compliance.py: Added 5 CSV export tests

Testing

All 5 CSV tests pass:

  • \ est_compliance_report_to_csv_has_header\
  • \ est_compliance_report_to_csv_contains_denials\
  • \ est_compliance_report_to_csv_empty_when_no_denials\
  • \ est_compliance_reporter_generate_csv_denials_only\
  • \ est_compliance_reporter_generate_csv_includes_allowed\

Closes

closes #148

Summary by CodeRabbit

New Features

  • Added CSV export format for compliance reports via API
  • New CLI commands to export compliance data locally or from running API
  • Option to include or exclude allowed actions in compliance report exports
  • Automatic loading of audit logs from persisted storage on startup

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@arcgod-design, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 22 minutes and 7 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 587c8e26-fd66-4a39-b239-1f13a57cc047

📥 Commits

Reviewing files that changed from the base of the PR and between 9f2bc99 and 94f9a45.

📒 Files selected for processing (4)
  • agentwatch/api/server.py
  • agentwatch/cli/main.py
  • agentwatch/governance/compliance_reporter.py
  • agentwatch/governance/engine.py
📝 Walkthrough

Walkthrough

Adds end-to-end CSV export for governance audit logs. GovernanceEngine gains a load_from_disk() method called at startup to restore persisted JSONL entries. ComplianceReport and ComplianceReporter gain to_csv() and generate_csv() methods. Two API endpoints are updated or added to serve CSV or JSON. A new compliance CLI group exposes export-csv (API-backed) and export-local (file-backed) commands.

Changes

CSV Compliance Export

Layer / File(s) Summary
GovernanceEngine audit log disk persistence
agentwatch/governance/engine.py, agentwatch/api/server.py
GovernanceEngine.__init__ calls load_from_disk() when an audit_log_path is set, reading and parsing persisted JSONL entries at startup. The server now constructs GovernanceEngine with a path from the AGENTWATCH_AUDIT_LOG_PATH env variable.
ComplianceReport.to_csv and ComplianceReporter.generate_csv
agentwatch/governance/compliance_reporter.py
Adds csv/io imports. ComplianceReport.to_csv() serializes findings["sample_denials"] into a CSV string. ComplianceReporter.generate_csv(include_allowed) fetches audit entries, optionally filters to denials, and returns a flat CSV string per entry.
Updated and new compliance API endpoints
agentwatch/api/server.py
GET /api/v1/governance/compliance-report gains format and include_allowed query params and returns CSV or JSON. A new GET /api/v1/compliance/audit-log endpoint returns CSV from generate_csv().
CLI compliance command group
agentwatch/cli/main.py
New compliance Typer app registered on the main CLI. export-csv fetches CSV from the API; export-local loads a local JSONL file via GovernanceEngine, generates CSV via ComplianceReporter, and writes to disk.
CSV export tests
tests/test_compliance.py
_make_engine_with_entries() seeds a GovernanceEngine. Tests cover to_csv() header presence, denial row content, empty output when no denials, and generate_csv() filtering with and without allowed entries.
CLI cosmetic multiline reformats
agentwatch/cli/main.py
Formatting-only refactors of Panel, Table, and Live construction across top, status, compare, and session_prune commands. No behavioral changes.

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant CLI as agentwatch CLI
  participant API as FastAPI server
  participant Engine as GovernanceEngine
  participant Reporter as ComplianceReporter
  participant Disk as JSONL audit log

  Note over Engine,Disk: Server startup
  Engine->>Disk: load_from_disk() reads JSONL
  Disk-->>Engine: AuditEntry list

  Note over User,Disk: export-csv (API-backed)
  User->>CLI: compliance export-csv --include-allowed
  CLI->>API: GET /api/v1/compliance/audit-log?include_allowed=true
  API->>Reporter: generate_csv(include_allowed=True)
  Reporter->>Engine: get audit entries
  Engine-->>Reporter: AuditEntry list
  Reporter-->>API: CSV string
  API-->>CLI: PlainTextResponse (CSV)
  CLI-->>User: CSV output

  Note over User,Disk: export-local (file-backed)
  User->>CLI: compliance export-local --output report.csv
  CLI->>Engine: load_from_disk(audit_log_path)
  Engine->>Disk: read JSONL
  Disk-->>Engine: AuditEntry list
  CLI->>Reporter: generate_csv(include_allowed)
  Reporter-->>CLI: CSV string
  CLI->>Disk: write report.csv
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

Hard, SSoC26

Poem

🐇 Hop hop, the audit trail grows long,
In CSV rows where data belongs.
From JSONL files to flat-table grace,
Each denial logged now finds its place.
The rabbit checks columns and headers with glee —
Compliance exports, as tidy as can be! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: support exporting compliance audit logs as CSV' clearly and concisely summarizes the main change in the PR—adding CSV export functionality for compliance audit logs.
Linked Issues check ✅ Passed The PR successfully implements all requirements from issue #148: CSV export methods in ComplianceReport and ComplianceReporter, two CLI commands for compliance export (remote and local), and comprehensive testing coverage.
Out of Scope Changes check ✅ Passed All changes directly support CSV export functionality. Minor formatting changes in CLI dashboards are incidental to the primary feature and do not constitute problematic scope expansion.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@sreerevanth

Copy link
Copy Markdown
Owner

@arcgod-design Thanks for the contribution.

Before merge, I'd like some clarification around the export source.

The CSV export functionality itself looks useful, but I want to ensure the exported report reflects actual compliance/audit findings rather than data generated from a fresh local compliance engine at export time.

Please clarify:

  • Where the exported records originate from
  • Whether the CLI exports persisted compliance history or generates a new report on demand
  • How the local export path maps to real audit findings

Also, let's wait for the remaining CI checks to complete before making a merge decision.

Once that's clarified and CI is green, I'll take another look.

@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

🧪 PR Test Results

Check Result
Tests (pytest tests/) ✅ success
Lint (ruff check .) ❌ failure
Coverage (agentwatch) 74.61%

Python 3.12 · commit 94f9a45

arcgod-design added 3 commits June 23, 2026 22:32
…add CSV format to API endpoint

- GovernanceEngine: add load_from_disk() to reload persisted JSONL entries
- Server: enable audit_log_path with AGENTWATCH_AUDIT_LOG_PATH env var
- Server: add ?format=csv to compliance-report endpoint + dedicated /api/v1/compliance/audit-log
- CLI export-local: load from JSONL file instead of empty engine, add --audit-log option
- CLI export-csv: hit correct endpoint (/api/v1/governance/compliance-report)

Closes sreerevanth#148

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
tests/test_compliance.py (2)

248-315: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Add a persisted-audit-log CSV test to lock the audit-source contract.

Current tests validate in-memory entries only. They don’t cover the key contract that CSV export reflects persisted JSONL audit history (the main maintainer concern).

Suggested test addition
+import json
@@
 def test_compliance_reporter_generate_csv_includes_allowed():
@@
     actions = [row[5] for row in rows[1:]]
     assert "updated" in actions
+
+
+def test_compliance_reporter_generate_csv_uses_persisted_audit_log(tmp_path):
+    audit_path = tmp_path / "audit.jsonl"
+    persisted = {
+        "audit_id": "persisted-1",
+        "timestamp": datetime.now(UTC).isoformat(),
+        "principal_id": "u1",
+        "event_type": "permission_denied",
+        "resource": "policy",
+        "action": "safety:override",
+        "allowed": False,
+        "session_id": "",
+        "details": {},
+    }
+    audit_path.write_text(json.dumps(persisted) + "\n", encoding="utf-8")
+
+    engine = GovernanceEngine(audit_log_path=audit_path)
+    reporter = ComplianceReporter(engine)
+    rows = list(csv.DictReader(io.StringIO(reporter.generate_csv(include_allowed=True))))
+
+    assert len(rows) == 1
+    assert rows[0]["audit_id"] == "persisted-1"
+    assert rows[0]["event_type"] == "permission_denied"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_compliance.py` around lines 248 - 315, Add a new test function
that validates CSV export from persisted JSONL audit history, not just in-memory
entries. The test should create a ComplianceReporter with a GovernanceEngine
that has been initialized from a persisted JSONL audit log file, then call
generate_csv() and verify the output contains the persisted audit entries with
correct format and fields. This locks the contract that CSV export accurately
reflects the persisted audit source data, which is the main concern for audit
compliance tracking.

274-315: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Tighten CSV assertions to avoid false positives.

A few checks are currently too permissive ("permission_denied" in csv_output, len(rows) >= ...). With a deterministic fixture, column-level checks and stricter row expectations will catch regressions earlier.

Suggested assertion hardening
 def test_compliance_report_to_csv_contains_denials():
@@
     report = reporter.generate()
     csv_output = report.to_csv()
-    assert "permission_denied" in csv_output
+    rows = list(csv.DictReader(io.StringIO(csv_output)))
+    assert any(row["event_type"] == "permission_denied" for row in rows)
@@
 def test_compliance_reporter_generate_csv_denials_only():
@@
     rows = list(reader)
-    assert len(rows) >= 2
+    assert len(rows) == 3  # header + 2 denied rows from fixture
     for row in rows[1:]:
-        assert "permission_denied" in row[3] or "safety_override" in row[3]
+        assert row[3] == "permission_denied"
+        assert row[6] == "False"
@@
 def test_compliance_reporter_generate_csv_includes_allowed():
@@
     rows = list(reader)
-    assert len(rows) >= 3
+    assert len(rows) == 4  # header + 3 total rows from fixture
     actions = [row[5] for row in rows[1:]]
     assert "updated" in actions
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_compliance.py` around lines 274 - 315, The CSV assertions in these
test functions are too permissive and may not catch regressions effectively. In
test_compliance_report_to_csv_contains_denials, replace the broad string check
with CSV parsing and column-level validation to verify the denial appears in the
correct column. In test_compliance_reporter_generate_csv_denials_only, replace
the >= comparison with an exact row count assertion and check specific columns
(like column index 3 for event type) instead of searching for the string
anywhere in the row. In test_compliance_reporter_generate_csv_includes_allowed,
similarly replace len(rows) >= 3 with an exact equality check and validate that
specific columns contain the expected values rather than just checking if
"updated" exists in a loosely defined list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agentwatch/api/server.py`:
- Around line 1129-1133: The `format` parameter in the compliance_audit_log_csv
function is accepted as a Query parameter but is never actually used within the
function implementation, making the API contract misleading. Remove the unused
`format: str = Query("csv", alias="format")` parameter from the function
signature of compliance_audit_log_csv to clarify that the endpoint only returns
CSV format and eliminate the ambiguity about whether the parameter is respected.

In `@agentwatch/cli/main.py`:
- Around line 1927-1939: The reported entry count in the success message at the
end of the command execution is inaccurate because `loaded =
len(engine._audit_log)` counts all audit entries, but when `include_allowed` is
False, the CSV generated by
`reporter.generate_csv(include_allowed=include_allowed)` filters to only denials
and overrides. Additionally, the code accesses the private `_audit_log`
attribute instead of using the public API. Calculate the actual count of entries
that were exported by counting the entries in the `csv_content` returned by
generate_csv (accounting for the header row), and use that count in the success
message console.print statement to accurately reflect what was written to the
file. Optionally, replace the direct access to `engine._audit_log` with the
public `engine.stats()` method if available.

In `@agentwatch/governance/compliance_reporter.py`:
- Around line 95-97: The generate_csv() method hard-codes a limit of 10_000 rows
in the self._governance.get_audit_log(limit=10_000) call, which silently
truncates audit records and produces incomplete compliance CSV exports without
user awareness. Remove the hard-coded limit parameter to retrieve all available
records, or if a limit is necessary for performance, implement pagination to
fetch all records in batches or add a clear warning message when results are
truncated so users know the export may be incomplete.

In `@agentwatch/governance/engine.py`:
- Around line 199-215: Valid JSON lines that are not objects (such as arrays or
primitives) will raise a TypeError when attempting to access fields using
bracket notation (e.g., data["audit_id"]), but this TypeError is not currently
caught by the exception handler. Add TypeError to the exception tuple in the
except clause alongside json.JSONDecodeError, KeyError, and ValueError so that
these corrupt entries are properly logged as warnings instead of causing the
load to crash unexpectedly.

---

Nitpick comments:
In `@tests/test_compliance.py`:
- Around line 248-315: Add a new test function that validates CSV export from
persisted JSONL audit history, not just in-memory entries. The test should
create a ComplianceReporter with a GovernanceEngine that has been initialized
from a persisted JSONL audit log file, then call generate_csv() and verify the
output contains the persisted audit entries with correct format and fields. This
locks the contract that CSV export accurately reflects the persisted audit
source data, which is the main concern for audit compliance tracking.
- Around line 274-315: The CSV assertions in these test functions are too
permissive and may not catch regressions effectively. In
test_compliance_report_to_csv_contains_denials, replace the broad string check
with CSV parsing and column-level validation to verify the denial appears in the
correct column. In test_compliance_reporter_generate_csv_denials_only, replace
the >= comparison with an exact row count assertion and check specific columns
(like column index 3 for event type) instead of searching for the string
anywhere in the row. In test_compliance_reporter_generate_csv_includes_allowed,
similarly replace len(rows) >= 3 with an exact equality check and validate that
specific columns contain the expected values rather than just checking if
"updated" exists in a loosely defined list.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: bcbb464b-a004-4db7-955d-4e0c7e57b3ac

📥 Commits

Reviewing files that changed from the base of the PR and between 1f4755c and 9f2bc99.

📒 Files selected for processing (5)
  • agentwatch/api/server.py
  • agentwatch/cli/main.py
  • agentwatch/governance/compliance_reporter.py
  • agentwatch/governance/engine.py
  • tests/test_compliance.py

Comment thread agentwatch/api/server.py
Comment on lines +1129 to +1133
async def compliance_audit_log_csv(
_auth: None = Depends(_require_api_key),
format: str = Query("csv", alias="format"),
include_allowed: bool = Query(False),
):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Remove or enforce format on /api/v1/compliance/audit-log.

format is accepted but ignored; the handler always returns CSV. This makes the API contract ambiguous.

🔧 Proposed fix (remove unused parameter)
 async def compliance_audit_log_csv(
     _auth: None = Depends(_require_api_key),
-    format: str = Query("csv", alias="format"),
     include_allowed: bool = Query(False),
 ):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def compliance_audit_log_csv(
_auth: None = Depends(_require_api_key),
format: str = Query("csv", alias="format"),
include_allowed: bool = Query(False),
):
async def compliance_audit_log_csv(
_auth: None = Depends(_require_api_key),
include_allowed: bool = Query(False),
):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agentwatch/api/server.py` around lines 1129 - 1133, The `format` parameter in
the compliance_audit_log_csv function is accepted as a Query parameter but is
never actually used within the function implementation, making the API contract
misleading. Remove the unused `format: str = Query("csv", alias="format")`
parameter from the function signature of compliance_audit_log_csv to clarify
that the endpoint only returns CSV format and eliminate the ambiguity about
whether the parameter is respected.

Comment thread agentwatch/cli/main.py Outdated
Comment on lines +1927 to +1939
engine = GovernanceEngine(audit_log_path=audit_log)
loaded = len(engine._audit_log)
if loaded == 0:
console.print(
f"[yellow]No audit entries found in {audit_log}. "
"The server persists audit data when AGENTWATCH_AUDIT_LOG_PATH is set.[/yellow]"
)
reporter = ComplianceReporter(engine)
csv_content = reporter.generate_csv(include_allowed=include_allowed)

with open(output, "w", encoding="utf-8", newline="") as f:
f.write(csv_content)
console.print(f"[green]{output} created successfully ({loaded} entries exported)[/green]")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reported export count is inaccurate and relies on a private attribute.

loaded = len(engine._audit_log) counts every loaded entry, but when include_allowed is False (the default) generate_csv filters down to denials/overrides only. The success message then reports loaded as "entries exported", overstating what was actually written — misleading in an audit context. Additionally, reading engine._audit_log reaches into a private attribute; engine.stats() exposes the same count publicly.

Compute the count from the actual CSV output and use the public API:

🐛 Proposed fix
     engine = GovernanceEngine(audit_log_path=audit_log)
-    loaded = len(engine._audit_log)
+    loaded = engine.stats()["total_audit_entries"]
     if loaded == 0:
         console.print(
             f"[yellow]No audit entries found in {audit_log}. "
             "The server persists audit data when AGENTWATCH_AUDIT_LOG_PATH is set.[/yellow]"
         )
     reporter = ComplianceReporter(engine)
     csv_content = reporter.generate_csv(include_allowed=include_allowed)
+    exported = max(csv_content.count("\n") - 1, 0)  # rows excluding header

     with open(output, "w", encoding="utf-8", newline="") as f:
         f.write(csv_content)
-    console.print(f"[green]{output} created successfully ({loaded} entries exported)[/green]")
+    console.print(f"[green]{output} created successfully ({exported} entries exported)[/green]")
🧰 Tools
🪛 ast-grep (0.44.0)

[warning] 1936-1936: File path is request-/variable-derived; validate and normalize to prevent path traversal.
Context: open(output, "w", encoding="utf-8", newline="")
Note: [CWE-22] Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal').

(open-filename-from-request)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agentwatch/cli/main.py` around lines 1927 - 1939, The reported entry count in
the success message at the end of the command execution is inaccurate because
`loaded = len(engine._audit_log)` counts all audit entries, but when
`include_allowed` is False, the CSV generated by
`reporter.generate_csv(include_allowed=include_allowed)` filters to only denials
and overrides. Additionally, the code accesses the private `_audit_log`
attribute instead of using the public API. Calculate the actual count of entries
that were exported by counting the entries in the `csv_content` returned by
generate_csv (accounting for the header row), and use that count in the success
message console.print statement to accurately reflect what was written to the
file. Optionally, replace the direct access to `engine._audit_log` with the
public `engine.stats()` method if available.

Comment on lines +95 to +97
audit_log = self._governance.get_audit_log(limit=10_000)
if not include_allowed:
audit_log = [entry for entry in audit_log if not entry.allowed]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Avoid silent truncation of compliance CSV exports.

generate_csv() hard-caps audit retrieval at 10,000 rows, which can silently drop persisted records and produce incomplete compliance artifacts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agentwatch/governance/compliance_reporter.py` around lines 95 - 97, The
generate_csv() method hard-codes a limit of 10_000 rows in the
self._governance.get_audit_log(limit=10_000) call, which silently truncates
audit records and produces incomplete compliance CSV exports without user
awareness. Remove the hard-coded limit parameter to retrieve all available
records, or if a limit is necessary for performance, implement pagination to
fetch all records in batches or add a clear warning message when results are
truncated so users know the export may be incomplete.

Comment on lines +199 to +215
try:
data = json.loads(line)
entry = AuditEntry(
audit_id=data["audit_id"],
timestamp=datetime.fromisoformat(data["timestamp"]),
principal_id=data.get("principal_id"),
event_type=AuditEventType(data["event_type"]),
resource=data["resource"],
action=data["action"],
allowed=data["allowed"],
session_id=data.get("session_id"),
details=data.get("details", {}),
)
self._audit_log.append(entry)
count += 1
except (json.JSONDecodeError, KeyError, ValueError) as exc:
logger.warning("Skipping corrupt audit entry: %s", exc)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle non-object JSONL entries as corrupt rows instead of crashing load.

Valid JSON lines that are not objects can raise TypeError during field extraction and currently bypass the inner handler, which can abort startup loading unexpectedly.

🔧 Proposed fix
                 for line in f:
                     line = line.strip()
                     if not line:
                         continue
                     try:
                         data = json.loads(line)
+                        if not isinstance(data, dict):
+                            raise ValueError("audit entry must be a JSON object")
                         entry = AuditEntry(
                             audit_id=data["audit_id"],
                             timestamp=datetime.fromisoformat(data["timestamp"]),
                             principal_id=data.get("principal_id"),
                             event_type=AuditEventType(data["event_type"]),
                             resource=data["resource"],
                             action=data["action"],
                             allowed=data["allowed"],
                             session_id=data.get("session_id"),
                             details=data.get("details", {}),
                         )
                         self._audit_log.append(entry)
                         count += 1
-                    except (json.JSONDecodeError, KeyError, ValueError) as exc:
+                    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
                         logger.warning("Skipping corrupt audit entry: %s", exc)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agentwatch/governance/engine.py` around lines 199 - 215, Valid JSON lines
that are not objects (such as arrays or primitives) will raise a TypeError when
attempting to access fields using bracket notation (e.g., data["audit_id"]), but
this TypeError is not currently caught by the exception handler. Add TypeError
to the exception tuple in the except clause alongside json.JSONDecodeError,
KeyError, and ValueError so that these corrupt entries are properly logged as
warnings instead of causing the load to crash unexpectedly.

@arcgod-design

Copy link
Copy Markdown
Contributor Author

@arcgod-design Thanks for the contribution.

Before merge, I'd like some clarification around the export source.

The CSV export functionality itself looks useful, but I want to ensure the exported report reflects actual compliance/audit findings rather than data generated from a fresh local compliance engine at export time.

Please clarify:

  • Where the exported records originate from
  • Whether the CLI exports persisted compliance history or generates a new report on demand
  • How the local export path maps to real audit findings

Also, let's wait for the remaining CI checks to complete before making a merge decision.

Once that's clarified and CI is green, I'll take another look.

Where the exported records originate from: The CSV export reads directly from the persisted JSONL audit log file on disk. When the server runs with AGENTWATCH_AUDIT_LOG_PATH set, every compliance decision (denials, overrides, approvals) is appended to that JSONL file in real-time. The GovernanceEngine loads this file into memory on init, and the ComplianceReporter.generate_csv() iterates over those loaded entries to produce the CSV.

CLI vs on-demand: Both paths export the same persisted history — there's no synthetic data generation. The CLI export-local command (agentwatch compliance export-local --audit-log ) reads from the same JSONL file the server writes to. The API endpoint /api/v1/compliance/audit-log does the same thing but serves it as an HTTP response. Neither path creates new audit entries — they only serialize what's already been recorded.

How local export maps to real audit findings: The JSONL file is the single source of truth. Each line is a JSON object with audit_id, timestamp, principal_id, event_type, resource, action, allowed, and details. The CSV is a 1:1 row-per-entry mapping of this data. When --include-allowed is false (default), only denials and overrides are exported — the same entries that would appear in a compliance review.

The key invariant: if it's not in the JSONL file, it won't be in the CSV. The export is purely a format conversion of persisted audit records.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feat] support exporting compliance audit logs as CSV

2 participants