Summary
We adapted your claim-driven testing methodology to PSA v3 — a multi-agent behavioral monitoring system built by Silicon Psyche Labs. We wanted to share the outcome and a few observations from applying the framework to a non-database domain.
What we built
A full tutorial (Tutorial 14 in our public SDK repo) covering:
- Claim formalization for a behavioral monitoring API (claims like determinism, session consistency, propagation correctness, concurrent isolation, score immutability)
- 9-state verdict taxonomy adapted for HTTP API testing
- Five concrete test scenarios with property oracles, not just key-presence checks
- Coverage adequacy argument table
The tutorial is published here: https://github.com/SiliconPsycheLabs/PSA-core/blob/main/tutorials/14-testing-agentic-systems.md
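Claim formalization is mostly bookkeeping, and a small registry makes the coverage gap mechanical to find. The sketch below is hypothetical and not taken from Tutorial 14. The `Claim` type, the C1 to C5 numbering (assigned in the order the claim families are listed above), and the `uncovered_claims` helper are all assumptions. Each test declares which claim ids its oracle asserts. Any claim left undeclared is a guarantee the API makes without coverage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    cid: str   # stable identifier that tests refer to, e.g. "C1"
    name: str  # short name of the guarantee


# Hypothetical numbering: the five claim families listed above, in order.
CLAIMS = [
    Claim("C1", "determinism"),
    Claim("C2", "session consistency"),
    Claim("C3", "propagation correctness"),
    Claim("C4", "concurrent isolation"),
    Claim("C5", "score immutability"),
]


def uncovered_claims(claims, tests):
    """Return ids of claims that no test declares it checks.

    `tests` maps a test name to the set of claim ids its oracle asserts.
    """
    covered = set().union(*tests.values())
    return [c.cid for c in claims if c.cid not in covered]


# Hypothetical suite: each test is tagged with the claims its oracle checks.
suite = {
    "test_score_twice": {"C1"},
    "test_parallel_posts": {"C4"},
}
print(uncovered_claims(CLAIMS, suite))  # ['C2', 'C3', 'C5']
```

Tagging records intent only. Whether a tagged oracle is strong enough to detect a violation is a separate question.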
Key observations
What translated directly:
- The claim-formalization step (C1, C2, ...) is the single highest-value activity: it surfaced 6 implicit guarantees that our API was making but that no test covered.
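- The oracle discipline critique ("PARTIAL-surface vs PASS-hardening") forced us to upgrade from `assert key in response` to `assert response["scs"] > 0.5`. That is a qualitative jump in test value: the first check passes whenever the field exists, while the second fails when the score itself is wrong.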
- The 9-state verdict taxonomy replaced our binary PASS/FAIL with diagnostically precise output. `INCONCLUSIVE-oracle-too-weak` alone identified 4 tests in our existing suite that were effectively no-ops.
- The fault injection categories (process fault → restart, concurrency fault → parallel POSTs, staleness fault → time-separated reads) mapped well even without infrastructure-level access (no iptables, single-node deployment).
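The oracle-strength point can be checked mechanically. One heuristic is borrowed from mutation testing rather than taken from the methodology itself. It runs each oracle against a deliberately wrong response. An oracle that still accepts it cannot detect the fault it is meant to detect.

In this sketch, `key_presence`, `scs_property`, and `verdict` are hypothetical names. The response is a plain dict standing in for the API's JSON body. Only three verdict labels are modeled. The rest of the 9-state taxonomy is not reproduced, and `FAIL` is a placeholder label.

```python
def key_presence(resp: dict) -> bool:
    # Surface oracle: accepts any response that merely contains the field.
    return "scs" in resp


def scs_property(resp: dict) -> bool:
    # Property oracle: the score must be a real number above the threshold.
    score = resp.get("scs")
    return (
        isinstance(score, (int, float))
        and not isinstance(score, bool)
        and score > 0.5
    )


def verdict(oracle, observed: dict, corrupted: dict) -> str:
    """Classify one test run.

    `corrupted` is a deliberately wrong response. If the oracle accepts it,
    a passing result on `observed` carries no information.
    """
    if oracle(corrupted):
        return "INCONCLUSIVE-oracle-too-weak"
    if not oracle(observed):
        return "FAIL"  # placeholder label, not necessarily a taxonomy state
    return "PASS-hardening"


observed = {"scs": 0.82}   # hypothetical healthy response
corrupted = {"scs": 0.1}   # same shape, wrong value
print(verdict(key_presence, observed, corrupted))  # INCONCLUSIVE-oracle-too-weak
print(verdict(scs_property, observed, corrupted))  # PASS-hardening
```

The choice of corruption matters. A corrupted response that keeps the field but breaks the value is what exposes a key-presence oracle as a no-op.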
What required adaptation:
- Linearizability/Elle does not apply (no register operations, no history of conflicting reads/writes). We replaced it with determinism oracles (same input → same output) and immutability oracles (scores do not change post-inference).
- Network partition testing is not feasible without topology control. We substituted concurrency faults (N parallel writes) and staleness faults (immediate GET after POST).
- The `§7.M` model/history/checker discipline is overkill for single-endpoint scoring APIs, but its spirit (requiring a named abstract model, a history schema, and a machine-checkable checker) was extremely useful for formalizing the adversarial propagation oracle.
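A reduced version of the model/history/checker discipline fits a scoring API in a few dozen lines. The sketch has three parts:

- The abstract model is a write-once score map keyed by (session, message), whose scores depend only on the message.
- The history is a list of recorded operations.
- The checker flags determinism and immutability violations.

The driver applies the two substitute faults described above. The concurrency fault is N parallel POSTs of the same input. The staleness fault is a GET issued immediately after each POST. `FakeScoringService`, `Op`, `check`, and `run` are hypothetical. The fake service runs in-process, so this illustrates the structure rather than exercising PSA v3 itself.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional


class FakeScoringService:
    """Hypothetical in-process stand-in for a scoring endpoint."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._store = {}  # (session, message) -> score

    def post(self, session: str, message: str) -> float:
        # Deterministic by construction: the score depends only on the message.
        score = hashlib.sha256(message.encode()).digest()[0] / 255
        with self._lock:
            self._store[(session, message)] = score
        return score

    def get(self, session: str, message: str) -> Optional[float]:
        with self._lock:
            return self._store.get((session, message))


@dataclass(frozen=True)
class Op:
    """History schema: one recorded client operation."""
    kind: str               # "post" or "get"
    session: str
    message: str
    score: Optional[float]  # value returned by the service


def check(history: List[Op]) -> List[str]:
    """Return determinism and immutability violations found in a history."""
    violations = []
    first_score = {}  # message -> first score observed for it
    written = {}      # (session, message) -> score returned by the POST
    for op in history:
        key = (op.session, op.message)
        if op.kind == "post":
            expected = first_score.setdefault(op.message, op.score)
            if op.score != expected:
                violations.append(f"determinism: {op.message!r} scored {expected} and {op.score}")
            written[key] = op.score
        elif op.kind == "get" and op.score != written.get(key):
            violations.append(f"immutability: {key} read {op.score}, POST returned {written.get(key)}")
    return violations


def run(service, n_workers: int = 8, message: str = "same input") -> List[Op]:
    history, lock = [], threading.Lock()

    def record(op: Op) -> None:
        with lock:
            history.append(op)

    def worker(i: int) -> None:
        session = f"s{i}"
        # Concurrency fault: all workers POST the same input in parallel.
        record(Op("post", session, message, service.post(session, message)))
        # Staleness fault: read back immediately, with no delay after the write.
        record(Op("get", session, message, service.get(session, message)))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(worker, range(n_workers)))
    return history


print(check(run(FakeScoringService())))  # []
```

Each worker records its own POST before its GET. The recorded history therefore preserves per-session order, which is the only ordering this checker relies on.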
References
The methodology is documented in your SKILL.md files. The oracle-patterns and verdict-taxonomy reference files were the most directly useful. The fault-injection-howto framing ("faults must fire, produce evidence, and be reversible") guided our durability-after-restart scenario design.
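The "faults must fire, produce evidence, and be reversible" framing maps naturally onto a context manager for a process fault. The sketch assumes a hypothetical `RestartableService`. Its `durable` dict stands in for on-disk state, and its `cache` dict stands in for process memory that is lost on restart. `process_fault` and `durability_after_restart` are likewise illustrative names, not part of PSA v3 or the methodology's files.

The fault works in three steps:

- It probes the service to confirm the service actually went down.
- It records what it observed.
- It restores the service in a `finally` block, even if the test body raises.

```python
from contextlib import contextmanager


class RestartableService:
    """Hypothetical service: `durable` survives a restart, `cache` does not."""

    def __init__(self):
        self.durable = {}
        self.cache = {}
        self.up = True

    def write(self, key, value):
        self.durable[key] = value
        self.cache[key] = value

    def read(self, key):
        if not self.up:
            raise RuntimeError("service down")
        return self.cache.get(key, self.durable.get(key))

    def stop(self):
        self.up = False
        self.cache.clear()  # process memory is lost

    def start(self):
        self.up = True


@contextmanager
def process_fault(service, evidence):
    service.stop()
    # Must fire: confirm the service is really down before trusting any result.
    try:
        service.read("__probe__")
        fired = False
    except RuntimeError:
        fired = True
    evidence.append({"fault": "process-restart", "fired": fired})
    try:
        yield fired
    finally:
        service.start()  # Must be reversible: restore even if the body raises.
        evidence.append({"fault": "process-restart", "restored": service.up})


def durability_after_restart(service, key="session-1", value=0.82):
    evidence = []
    service.write(key, value)
    with process_fault(service, evidence) as fired:
        pass  # the service is down inside this block
    survived = service.read(key) == value
    return {"fault_fired": fired, "value_survived": survived, "evidence": evidence}


print(durability_after_restart(RestartableService()))
```

A service that writes only to its cache reports `value_survived` as False. A `stop` that silently does nothing reports `fault_fired` as False. A test can therefore refuse to count a run whose fault never fired.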
Happy to share more details or discuss how the approach works on non-database stateful systems. Thanks for publishing this — it filled a real gap.
— Silicon Psyche Labs