-
Notifications
You must be signed in to change notification settings - Fork 3.9k
CASSANDRA-21398: Add agent skills for property / stateful testing #4837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
dcapwell
wants to merge
13
commits into
apache:trunk
Choose a base branch
from
dcapwell:CASSANDRA-21398
base: trunk
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
2ba7365
working on skills for testing
dcapwell 69f4733
added meta skill to bias to property/stateful
dcapwell 9e8da5c
added tests created by the skills
dcapwell 38b302d
doc accord gens
dcapwell 13fc57b
feedback
dcapwell 974ef42
fixed bug with Hex.parseLong so it properly handles hex input
dcapwell 5e50fb2
fixed non-deterministic dynamic list
dcapwell 17a8298
feedback
dcapwell 577eeed
remove comment
dcapwell c35bf55
cleanup
dcapwell 51eb281
added tests for table id logic
dcapwell 7747d7e
cleanup
dcapwell 1adb953
ref table walk
dcapwell File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,262 @@ | ||
| --- | ||
| name: cassandra-testing-stateful | ||
| description: Write stateful property-based tests for Apache Cassandra using the Property.stateful() framework. Use when verifying systems behave correctly through sequences of operations, modeling interactions with stateful systems, or testing CRUD/state-machine correctness with random command sequences. | ||
| --- | ||
|
|
||
| # Stateful Property-Based Testing with `stateful()` | ||
|
|
||
| Write stateful property-based tests for Apache Cassandra using the `Property.stateful()` framework. This is the "CRUD testing" or "model-based testing" approach where random command sequences exercise state machines and verify invariants hold throughout. | ||
|
|
||
| ## When to Use | ||
|
|
||
| - Testing stateful data structures (trees, indexes, caches) against a simple model | ||
| - Verifying CRUD operations maintain consistency | ||
| - Testing multi-step sequences (topology changes, schema mutations, journal operations) | ||
| - Testing systems that survive restart/flush/compact cycles | ||
| - Any scenario where you need to verify a sequence of operations, not just individual properties | ||
|
|
||
| ## Key Concepts | ||
|
|
||
| ### Entry Point: `stateful()` | ||
|
|
||
| ```java | ||
| import static accord.utils.Property.stateful; | ||
| import static accord.utils.Property.commands; | ||
| ``` | ||
|
|
||
| Returns a `StatefulBuilder` for configuring and running stateful property tests. | ||
|
|
||
| ### Configuration | ||
|
|
||
| Inherits from `qt()` plus: | ||
|
|
||
| | Method | Purpose | Default | | ||
| |--------|---------|---------| | ||
| | `withExamples(int)` | Number of full test runs | 500 | | ||
| | `withSteps(int)` | Max commands per test run | 1000 | | ||
| | `withStepTimeout(Duration)` | Timeout per individual step | none | | ||
| | `withSeed(long)` | Pin seed for reproducibility | random | | ||
| | `withPure(boolean)` | Fresh seed per example | true | | ||
| | `withTimeout(Duration)` | Timeout for entire example | none | | ||
|
|
||
| ### Core Types | ||
|
|
||
| - **`State`** - Model/tracking state for the test (e.g., a `TreeMap` as oracle) | ||
| - **`SystemUnderTest`** (SUT) - The actual implementation being tested (optional - often `Void` when State acts as both) | ||
| - **`Command<State, SystemUnderTest, Result>`** - A single operation applied to both model and SUT | ||
|
|
||
| ### Command Interface | ||
|
|
||
| ```java | ||
| interface Command<State, SystemUnderTest, Result> { | ||
| // Guard: should this command execute given current state? | ||
| default PreCheckResult checkPreconditions(State state) { return PreCheckResult.Ok; } | ||
| // Apply to the model - return expected result | ||
| Result apply(State state) throws Throwable; | ||
| // Apply to the real system - return actual result | ||
| Result run(SystemUnderTest sut) throws Throwable; | ||
| // Verify model and SUT agree | ||
| default void checkPostconditions(State state, Result expected, | ||
| SystemUnderTest sut, Result actual) throws Throwable {} | ||
| // Human-readable description for history logging | ||
| default String detailed(State state) { return this.toString(); } | ||
| } | ||
| ``` | ||
|
|
||
| Convenience interfaces: | ||
| - **`UnitCommand<State, SUT>`** - When apply/run return void (use `applyUnit`/`runUnit`) | ||
| - **`StateOnlyCommand<State>`** - When there's no separate SUT (extends `UnitCommand<State, Void>`) | ||
| - **`SimpleCommand<State>`** - Inline lambda for `StateOnlyCommand` with a name | ||
|
|
||
| ### Commands Builder | ||
|
|
||
| ```java | ||
| commands(() -> stateGen) // State-only (no SUT) | ||
| commands(() -> stateGen, Sut::new) // State + separate SUT | ||
| ``` | ||
|
|
||
| Builder methods: | ||
| - `.add(cmd)` / `.add(gen)` / `.add((rs, state) -> cmd)` - Add command with random weight | ||
| - `.add(weight, cmd)` - Add command with fixed weight | ||
| - `.addIf(predicate, cmd)` - Conditional command (only when predicate is true) | ||
| - `.addAllIf(predicate, builder -> {...})` - Multiple conditional commands | ||
| - `.preCommands(state -> {...})` - Run before each step | ||
| - `.destroyState((state, cause) -> {...})` - Cleanup state after each example | ||
| - `.destroySut((sut, cause) -> {...})` - Cleanup SUT after each example | ||
| - `.onSuccess((state, sut, history) -> {...})` - Callback on successful example | ||
| - `.onFailure((state, sut, history, cause) -> {...})` - Callback on failure | ||
| - `.commandsTransformer((state, gen) -> newGen)` - Transform command generator per state | ||
| - `.build()` - Produce the `Commands` object | ||
|
|
||
| ## Patterns | ||
|
|
||
| ### Pattern 1: State-only with SimpleCommand (most common, simplest) | ||
|
|
||
| When the State IS the system under test (model + SUT combined). | ||
|
|
||
| ```java | ||
| import static accord.utils.Property.stateful; | ||
| import static accord.utils.Property.commands; | ||
|
|
||
| stateful().check(commands(() -> State::new) | ||
| .add(MyTest::insertCommand) | ||
| .add(MyTest::readCommand) | ||
| .addIf(s -> !s.isEmpty(), MyTest::deleteCommand) | ||
| .build()); | ||
|
|
||
| // Command factory method | ||
| private static Property.Command<State, Void, ?> insertCommand(RandomSource rs, State state) { | ||
| int key = rs.nextInt(); | ||
| int value = rs.nextInt(); | ||
| return new Property.SimpleCommand<>( | ||
| "Insert(" + key + ", " + value + ")", | ||
| s -> { | ||
| s.model.put(key, value); | ||
| s.sut.put(key, value); | ||
| assertThat(s.sut.get(key)).isEqualTo(value); | ||
| }); | ||
| } | ||
| ``` | ||
|
|
||
| ### Pattern 2: Separate State and SUT with full Command interface | ||
|
|
||
| When State (model) and SUT (real implementation) are different objects. | ||
|
|
||
| ```java | ||
| stateful().check(commands(() -> State::new, state -> new Sut(state)) | ||
| .add((rs, state) -> new Create(nextRange(rs), rs.nextInt())) | ||
| .add((rs, state) -> new Read(nextRange(rs))) | ||
| .addAllIf(state -> !state.isEmpty(), b -> b | ||
| .add((rs, state) -> new Update(rs.pickOrderedSet(state.keys()), rs.nextInt())) | ||
| .add((rs, state) -> new Delete(rs.pickOrderedSet(state.keys())))) | ||
| .destroyState(State::close) | ||
| .destroySut(Sut::close) | ||
| .build()); | ||
|
|
||
| // Full Command with apply/run/checkPostconditions | ||
| static class Read implements Command<State, Sut, List<Integer>> { | ||
| private final Range range; | ||
| Read(Range range) { this.range = range; } | ||
|
|
||
| @Override public List<Integer> apply(State state) { return state.search(range); } | ||
| @Override public List<Integer> run(Sut sut) { return sut.tree.search(range); } | ||
| @Override public void checkPostconditions(State state, List<Integer> expected, | ||
| Sut sut, List<Integer> actual) { | ||
| Assertions.assertThat(actual).isEqualTo(expected); | ||
| } | ||
| @Override public String detailed(State state) { return "Read(" + range + ")"; } | ||
| } | ||
| ``` | ||
|
|
||
| ### Pattern 3: Multistep commands | ||
|
|
||
| Group multiple commands as an atomic sequence in history. | ||
|
|
||
| ```java | ||
| import static accord.utils.Property.multistep; | ||
|
|
||
| Gen<Command<State, Void, ?>> topologyCommand = rs -> multistep( | ||
| new SimpleCommand<>("Stop Node", s -> s.stopNode(node)), | ||
| new SimpleCommand<>("Replace Host", s -> s.replaceHost(node)), | ||
| new SimpleCommand<>("Reconfigure CMS", s -> s.reconfigureCMS()) | ||
| ); | ||
| ``` | ||
|
|
||
| ### Pattern 4: Using destroyState for final validation | ||
|
|
||
| ```java | ||
| stateful().withExamples(50).withSteps(500).check(commands(() -> State::new) | ||
| .add(MyTest::addTable) | ||
| .addIf(s -> !s.tables.isEmpty(), MyTest::dropTable) | ||
| .destroyState(state -> { | ||
| // Run after all steps - finish pending work and validate final state | ||
| state.finishPendingSequences(); | ||
| state.validateFinalSchema(); | ||
| }) | ||
| .build()); | ||
| ``` | ||
|
|
||
| ## Error Reporting | ||
|
|
||
| On failure, `PropertyError` reports: | ||
| - Seed for reproducibility | ||
| - Number of examples / steps configured | ||
| - The specific failing step number | ||
| - State at time of failure (via `toString()`) | ||
| - Full command history (numbered, with `detailed()` strings) | ||
| - If `withStepTimeout` is used, duration per step is appended to history | ||
|
|
||
| ## State Design Guidelines | ||
|
|
||
| 1. **Merged State+SUT** (most common): State holds both the model (e.g., `TreeMap`) and the SUT (e.g., `RangeTree`). Commands operate on both within `applyUnit`. | ||
|
|
||
| 2. **Separate State and SUT**: Use when the SUT has complex lifecycle (journal, cluster). State owns the model, SUT wraps the real thing. `destroyState`/`destroySut` handle cleanup. | ||
|
|
||
| 3. **State should implement useful `toString()`**: It appears in error reports. | ||
|
|
||
| 4. **Commands should have descriptive `detailed(State)`**: These form the history trace shown on failure. | ||
|
|
||
| ## Imports | ||
|
|
||
| ```java | ||
| import static accord.utils.Property.stateful; | ||
| import static accord.utils.Property.commands; | ||
| import static accord.utils.Property.multistep; | ||
| import static accord.utils.Property.ignoreCommand; | ||
| import accord.utils.Property.Command; | ||
| import accord.utils.Property.UnitCommand; | ||
| import accord.utils.Property.StateOnlyCommand; | ||
| import accord.utils.Property.SimpleCommand; | ||
| import accord.utils.Property.PreCheckResult; | ||
| import accord.utils.Gens; | ||
| import accord.utils.Gen; | ||
| import accord.utils.RandomSource; | ||
| ``` | ||
|
|
||
| ## Framework Location | ||
|
|
||
| - `modules/accord/accord-core/src/test/java/accord/utils/Property.java` - Core framework | ||
| - `modules/accord/accord-core/src/test/java/accord/utils/Gens.java` - Generator utilities | ||
| - `modules/accord/accord-core/src/test/java/accord/utils/README.md` - Documentation | ||
|
|
||
| ## Example: SingleNodeTableWalkTest (complex real-world stateful test) | ||
|
|
||
| `test/distributed/org/apache/cassandra/distributed/test/cql3/SingleNodeTableWalkTest.java` is a complex stateful test that exercises CQL read/write paths against random table schemas with random data. It is a good reference for writing non-trivial stateful tests. | ||
|
|
||
| Key design patterns demonstrated: | ||
| - **State with overridable hooks**: The `State` inner class defines boolean predicates (`supportTokens()`, `allowNonPartitionQuery()`, `allowPartitionQuery()`, etc.) that control which commands are eligible via `addIf`. Subclasses override these hooks to alter behavior without duplicating the test structure. | ||
| - **Overridable factory methods**: `createState()`, `createCluster()`, `defineTable()`, `supportedTypes()`, `supportedPrimaryColumnTypes()`, `supportedIndexers()`, and `preCheck()` are all `protected` methods that subclasses override to customize schema generation, cluster topology, and test configuration. | ||
| - **Extensive use of `addIf`/`addAllIf`**: Commands are conditionally included based on current state (e.g., only select existing rows when partitions exist, only compact when there are enough SSTables). | ||
|
|
||
| The test has a subclass hierarchy that reuses the same stateful structure for different configurations: | ||
| - `SingleNodeTableWalkTest` -- single-node, base test | ||
| - `MultiNodeTableWalkBase` -- multi-node (overrides `createCluster()`, adjusts consistency) | ||
| - `MultiNodeTableWalkWithReadRepairTest` / `MultiNodeTableWalkWithoutReadRepairTest` | ||
| - `CasMultiNodeTableWalkBase` -- CAS (Paxos) transactions | ||
| - `AccordInteropMultiNodeTableWalkBase` -- Accord transaction interop | ||
| - `FullAccordInteropMultiNodeTableWalkTest`, `MixedReadsAccordInteropMultiNodeTableWalkTest` | ||
|
|
||
| ## Important Notes | ||
|
|
||
| - Default examples for `stateful()` is **500** (not 1000 like `qt()`) | ||
| - Default steps per example is **1000** | ||
| - Commands with `addIf` are only included when the predicate is true for the current state | ||
| - **Prefer `addIf` over `ignoreCommand()`**: Use `addIf(predicate, cmd)` to conditionally include commands rather than having command factories return `Property.ignoreCommand()`. The framework fails the test if too many consecutive commands are ignored, making `ignoreCommand()` flaky-prone. `addIf` avoids this by excluding the command from selection entirely when the predicate is false. | ||
| - `onSuccess` and `onFailure` callbacks are useful for logging history on completion | ||
| - When writing `detailed()`, include the command parameters (key, range, etc.) for debuggability | ||
|
|
||
| ## Deterministic Execution Requirement | ||
|
|
||
| Stateful tests are especially vulnerable to non-deterministic SUTs because failures depend on the exact sequence of states traversed. If the SUT uses internal randomness (e.g., `ThreadLocalRandom`, `Math.random()`), replaying a seed reproduces the same command sequence but the SUT may take a different internal path, making the failure non-reproducible. | ||
|
|
||
| **Before writing a stateful test, audit the SUT for internal randomness.** See the `cassandra-testing-property` skill for a full table of common randomness sources and mitigations. | ||
|
|
||
| **Key principle**: Every aspect of the test that affects the outcome must be derived from `RandomSource`. This includes: | ||
| - Values generated for commands (covered by the command factory receiving `RandomSource rs`) | ||
| - The SUT's internal behavior (NOT covered if the SUT uses `ThreadLocalRandom` etc.) | ||
| - Any initialization parameters (covered if `State(RandomSource rs)` constructor is used) | ||
| - **Iteration order of collections in the SUT** -- if the SUT iterates a `HashMap`/`HashSet` and the order affects behavior, the test is non-reproducible across machines/JVM versions even though no random API is called | ||
|
|
||
| Stateful tests are particularly sensitive to iteration order because a different traversal order can change which command's precondition is met, which element gets selected, or which state transition fires -- producing a completely different state sequence from the same seed. For example, if a command iterates `HashMap.entrySet()` to find the first element matching a condition, a different iteration order means a different element is found, leading to divergent state. See the `cassandra-testing-property` skill for a full list of common patterns that break reproducibility. | ||
|
|
||
| If the SUT cannot be made deterministic, document the limitation prominently in the test class Javadoc. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,60 @@ | ||
| --- | ||
| name: cassandra-testing | ||
| description: Guide for writing tests in Apache Cassandra. Use whenever writing new tests, adding test coverage, or deciding what testing approach to use for a change. Prioritizes property-based and stateful property testing over hand-written example-based tests. | ||
| --- | ||
|
|
||
| # Cassandra Testing Strategy | ||
|
|
||
| When writing tests for Apache Cassandra, prefer property-based testing over hand-written example-based tests. Property tests generate thousands of random inputs and catch edge cases that humans miss. | ||
|
|
||
| ## Decision Flow | ||
|
|
||
| Ask these questions about what you're testing: | ||
|
|
||
| ### 1. Does it involve sequences of operations on mutable state? | ||
|
|
||
| Examples: data structures (trees, indexes, caches), CRUD workflows, schema mutations, topology changes, journal lifecycle (flush/compact/restart). | ||
|
|
||
| **Use the `cassandra-testing-stateful` skill** - it provides `Property.stateful()`, which gives you: | ||
| - Random command sequences with weighted selection | ||
| - Conditional commands (only delete when non-empty) | ||
| - Model-based verification (compare SUT against a simple oracle) | ||
| - Full command history in failure reports for debugging | ||
| - Lifecycle hooks (destroyState, destroySut, onSuccess) | ||
|
|
||
| ### 2. Does it validate a property/invariant? | ||
|
|
||
| Examples: serialization round-trips, algebraic laws (idempotency, commutativity), encoding correctness, range/boundary behavior, type validation. | ||
|
|
||
| **Use the `cassandra-testing-property` skill** - it provides `Property.qt()` with typed generators, which gives you: | ||
| - `forAll(gen).check(value -> ...)` for clean property assertions | ||
| - `Serializers.testSerde()` for automated serialization testing | ||
| - Rich generator library (Accord `Gens`, Cassandra `Generators`, `AbstractTypeGenerators`, `CassandraGenerators`) | ||
| - Meta-randomness (`mixedDistribution`) for bias-aware testing | ||
|
|
||
| ### 3. Even "simple" tests benefit from property testing | ||
|
|
||
| A test that looks simple often hides complexity. A straightforward `INSERT INTO ... SELECT * FROM` test seems trivial with `(pk int, v int)`, but breaks with `(pk varint, ck frozen<udt>, v int, PRIMARY KEY(pk, ck)) WITH CLUSTERING ORDER BY (ck DESC)`. The schema, types, clustering order, and value expressions all interact in ways no human enumerates completely. | ||
|
|
||
| Rather than writing a single example and hoping it covers enough, generate the schema and values randomly: | ||
|
|
||
| ``` | ||
| // pseudocode | ||
| qt().check(rs -> { | ||
| schema = randomSchema(rs) // random types, clustering order, indexes, etc. | ||
| row = randomRow(rs, schema) // random values matching the schema | ||
| insert(schema, row) | ||
| result = select(schema, row.pk) | ||
| assertEqual(result, row) | ||
| }); | ||
| ``` | ||
|
|
||
| This approach finds edge cases across type combinations, encoding paths, and query planning that hand-written examples miss entirely. | ||
|
|
||
| ## Why Property Tests Over Example Tests | ||
|
|
||
| - A single `qt().forAll(gen).check(...)` replaces dozens of hand-written test cases | ||
| - Random generation finds edge cases humans don't think of (boundary values, empty inputs, overflow, type interactions) | ||
| - Meta-randomness varies the distribution per example, catching bugs that require specific patterns (e.g., mostly-deletes followed by a read) | ||
| - Failures are reproducible via seed - just add `withOnlySeed(seed)` to replay | ||
| - The generator libraries already exist for most Cassandra types - you just compose them |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hex includes upper case but this method doesn't handle that. This is fine in existing usage as we use
Long.toHexString(lsb);which explicitly documents that its lower case only (if you want upper case it says to dotoUpperCase).So this bug fix isn't for any existing logic, but more defensive to make sure future logic works correctly