Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 10 additions & 9 deletions SDD-Keylime-Monitoring-Tool.md
Original file line number Diff line number Diff line change
Expand Up @@ -1066,15 +1066,16 @@ The attestation timeline (FR-024) distributes event counts across hourly buckets

The frontend derives attestation KPIs from agent state data when no attestation history endpoint is available (FR-001):

| KPI | Computation |
|-----|------------|
| Total Agents | `paginated_response.total_items` or `agents.length` |
| Failed Attestations | Count of agents in `failed`, `invalid_quote`, `tenant_failed` (pull) or `fail`, `timeout` (push) state |
| Success Rate | `((total - failed) / total) * 100` |
| Urgent Alerts | From `GET /api/alerts/summary` -> `critical + warnings` count; subtitle shows per-severity breakdown (e.g., "2 critical, 2 warnings") |
| Alert Center: Critical | From `GET /api/alerts/summary` -> `critical` (all states) |
| Alert Center: Warnings | From `GET /api/alerts/summary` -> `warnings` (all states) |
| Alert Center: Info | From `GET /api/alerts/summary` -> `info` (all states) |
| KPI | Computation | Color |
|-----|------------|-------|
| Total Agents | `paginated_response.total_items` or `agents.length` | — |
| Failed Attestations | Count of agents in `failed`, `invalid_quote`, `tenant_failed` (pull) or `fail` (push) state | Red |
| Timed-Out Attestations | Count of agents in `timeout` (push) state | Orange |
| Success Rate | `((total - failed - timed_out) / total) * 100` | — |
| Urgent Alerts | From `GET /api/alerts/summary` -> `critical + warnings` count; subtitle shows per-severity breakdown (e.g., "2 critical, 2 warnings") | — |
| Alert Center: Critical | From `GET /api/alerts/summary` -> `critical` (all states) | — |
| Alert Center: Warnings | From `GET /api/alerts/summary` -> `warnings` (all states) | — |
| Alert Center: Info | From `GET /api/alerts/summary` -> `info` (all states) | — |

**Rationale:** Ensures the dashboard displays meaningful data before TimescaleDB attestation history persistence is implemented.

Expand Down
26 changes: 22 additions & 4 deletions SRS-Keylime-Monitoring-Tool.md
Original file line number Diff line number Diff line change
Expand Up @@ -192,7 +192,7 @@ The System transforms Keylime from a CLI-driven security tool into a visual oper

### FR-001: Fleet Overview KPI Dashboard

**Description:** The System MUST display a fleet overview dashboard presenting computed KPIs derived from the Keylime Verifier and Registrar APIs. The dashboard MUST show: Total Active Agents, Failed Agents (states 7, 9, 10), Attestation Success Rate, Average Attestation Latency, Certificate Expiry Warnings, Active IMA Policies, Revocation Events (24h), Consecutive Failures per agent, and Registration Count.
**Description:** The System MUST display a fleet overview dashboard presenting computed KPIs derived from the Keylime Verifier and Registrar APIs. The dashboard MUST show: Total Active Agents, Failed Agents (states 7, 9, 10), Attestation Success Rate, Average Attestation Latency, Certificate Expiry Warnings, Active IMA Policies, Revocation Events (24h), Consecutive Failures per agent, and Registration Count. The Attestation Success Rate visualization MUST distinguish Timeout agents (displayed in orange) from Failed agents (displayed in red) to differentiate agents that stopped responding from agents that explicitly failed attestation.

**Trace:** Dashboard - Key Performance Indicators; Dashboard - Main Screen Layout

Expand All @@ -215,6 +215,16 @@ Feature: Fleet Overview KPI Dashboard
Then the dashboard MUST display cached KPI data with a staleness indicator
And a banner MUST warn "Verifier API unreachable — data may be stale"

Scenario: Attestation Success Rate distinguishes Timeout from Fail
Given the fleet contains 240 agents in PASS or GET_QUOTE state
And 5 agents are in FAILED or FAIL state
And 2 push-mode agents are in TIMEOUT (103) state
When the user views the Attestation Success Rate on the Fleet Overview Dashboard
Then the rate MUST be computed as ((247 - 5 - 2) / 247) * 100
And failed agents MUST be rendered in red
And timed-out agents MUST be rendered in orange
And the visualization MUST allow the user to distinguish Timeout from Fail at a glance

Scenario: Failed agent threshold alert
Given the alert threshold for Failed Agents is configured to "any count > 0"
When 1 or more agents enter state 7 (FAILED), 9 (INVALID_QUOTE), or 10 (TENANT_FAILED)
Expand Down Expand Up @@ -892,19 +902,27 @@ Feature: Cross-Tab Navigation

### FR-024: Attestation Analytics Overview

**Description:** The System MUST provide an attestation analytics overview displaying: total successful attestations, total failed attestations, average latency, and success rate as summary KPIs; an hourly attestation volume bar chart; a failure reason breakdown (donut chart); a latency distribution histogram; and a top failing agents ranked list.
**Description:** The System MUST provide an attestation analytics overview displaying: total successful attestations, total failed attestations, total timed-out attestations, average latency, and success rate as summary KPIs; an hourly attestation volume bar chart; a failure reason breakdown (donut chart); a latency distribution histogram; and a top failing agents ranked list. The hourly attestation volume bar chart MUST render three visually distinct categories: successful (green), failed (red), and timed-out (orange). Timed-out attestations represent push-mode agents that stopped submitting attestations (TIMEOUT state 103), as distinguished from agents that explicitly failed attestation (FAIL state 101).

**Trace:** Attestation Analytics - Overview Dashboard

```gherkin
Feature: Attestation Analytics Overview

Scenario: Display attestation summary KPIs
Given there were 12,450 successful and 38 failed attestations in the last 24 hours
Given there were 12,450 successful, 33 failed, and 5 timed-out attestations in the last 24 hours
When the user navigates to the Attestation Analytics view
Then the summary MUST show "12,450" successful, "38" failed, and the computed average latency
Then the summary MUST show "12,450" successful, "33" failed, "5" timed-out, and the computed average latency
And the success rate MUST display as "99.7%"

Scenario: Hourly bar chart distinguishes Timeout from Fail
Given the hourly attestation volume includes successful, failed, and timed-out attestations
When the user views the hourly attestation volume bar chart
Then successful attestations MUST be rendered in green
And failed attestations MUST be rendered in red
And timed-out attestations MUST be rendered in orange
And the chart legend MUST display all three categories

Scenario: Display top failing agents
Given multiple agents have different failure counts
When the user views the Attestation Analytics
Expand Down
Loading