Skip to content
448 changes: 448 additions & 0 deletions .agents/skills/cassandra-testing-property/SKILL.md

Large diffs are not rendered by default.

262 changes: 262 additions & 0 deletions .agents/skills/cassandra-testing-stateful/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,262 @@
---
name: cassandra-testing-stateful
description: Write stateful property-based tests for Apache Cassandra using the Property.stateful() framework. Use when verifying systems behave correctly through sequences of operations, modeling interactions with stateful systems, or testing CRUD/state-machine correctness with random command sequences.
---

# Stateful Property-Based Testing with `stateful()`

Write stateful property-based tests for Apache Cassandra using the `Property.stateful()` framework. This is the "CRUD testing" or "model-based testing" approach where random command sequences exercise state machines and verify invariants hold throughout.

## When to Use

- Testing stateful data structures (trees, indexes, caches) against a simple model
- Verifying CRUD operations maintain consistency
- Testing multi-step sequences (topology changes, schema mutations, journal operations)
- Testing systems that survive restart/flush/compact cycles
- Any scenario where you need to verify a sequence of operations, not just individual properties

## Key Concepts

### Entry Point: `stateful()`

```java
import static accord.utils.Property.stateful;
import static accord.utils.Property.commands;
```

Returns a `StatefulBuilder` for configuring and running stateful property tests.

### Configuration

Inherits from `qt()` plus:

| Method | Purpose | Default |
|--------|---------|---------|
| `withExamples(int)` | Number of full test runs | 500 |
| `withSteps(int)` | Max commands per test run | 1000 |
| `withStepTimeout(Duration)` | Timeout per individual step | none |
| `withSeed(long)` | Pin seed for reproducibility | random |
| `withPure(boolean)` | Fresh seed per example | true |
| `withTimeout(Duration)` | Timeout for entire example | none |

### Core Types

- **`State`** - Model/tracking state for the test (e.g., a `TreeMap` as oracle)
- **`SystemUnderTest`** (SUT) - The actual implementation being tested (optional - often `Void` when State acts as both)
- **`Command<State, SystemUnderTest, Result>`** - A single operation applied to both model and SUT

### Command Interface

```java
interface Command<State, SystemUnderTest, Result> {
// Guard: should this command execute given current state?
default PreCheckResult checkPreconditions(State state) { return PreCheckResult.Ok; }
// Apply to the model - return expected result
Result apply(State state) throws Throwable;
// Apply to the real system - return actual result
Result run(SystemUnderTest sut) throws Throwable;
// Verify model and SUT agree
default void checkPostconditions(State state, Result expected,
SystemUnderTest sut, Result actual) throws Throwable {}
// Human-readable description for history logging
default String detailed(State state) { return this.toString(); }
}
```

Convenience interfaces:
- **`UnitCommand<State, SUT>`** - When apply/run return void (use `applyUnit`/`runUnit`)
- **`StateOnlyCommand<State>`** - When there's no separate SUT (extends `UnitCommand<State, Void>`)
- **`SimpleCommand<State>`** - Inline lambda for `StateOnlyCommand` with a name

### Commands Builder

```java
commands(() -> stateGen) // State-only (no SUT)
commands(() -> stateGen, Sut::new) // State + separate SUT
```

Builder methods:
- `.add(cmd)` / `.add(gen)` / `.add((rs, state) -> cmd)` - Add command with random weight
- `.add(weight, cmd)` - Add command with fixed weight
- `.addIf(predicate, cmd)` - Conditional command (only when predicate is true)
- `.addAllIf(predicate, builder -> {...})` - Multiple conditional commands
- `.preCommands(state -> {...})` - Run before each step
- `.destroyState((state, cause) -> {...})` - Cleanup state after each example
- `.destroySut((sut, cause) -> {...})` - Cleanup SUT after each example
- `.onSuccess((state, sut, history) -> {...})` - Callback on successful example
- `.onFailure((state, sut, history, cause) -> {...})` - Callback on failure
- `.commandsTransformer((state, gen) -> newGen)` - Transform command generator per state
- `.build()` - Produce the `Commands` object

## Patterns

### Pattern 1: State-only with SimpleCommand (most common, simplest)

When the State IS the system under test (model + SUT combined).

```java
import static accord.utils.Property.stateful;
import static accord.utils.Property.commands;

stateful().check(commands(() -> State::new)
.add(MyTest::insertCommand)
.add(MyTest::readCommand)
.addIf(s -> !s.isEmpty(), MyTest::deleteCommand)
.build());

// Command factory method
private static Property.Command<State, Void, ?> insertCommand(RandomSource rs, State state) {
int key = rs.nextInt();
int value = rs.nextInt();
return new Property.SimpleCommand<>(
"Insert(" + key + ", " + value + ")",
s -> {
s.model.put(key, value);
s.sut.put(key, value);
assertThat(s.sut.get(key)).isEqualTo(value);
});
}
```

### Pattern 2: Separate State and SUT with full Command interface

When State (model) and SUT (real implementation) are different objects.

```java
stateful().check(commands(() -> State::new, state -> new Sut(state))
.add((rs, state) -> new Create(nextRange(rs), rs.nextInt()))
.add((rs, state) -> new Read(nextRange(rs)))
.addAllIf(state -> !state.isEmpty(), b -> b
.add((rs, state) -> new Update(rs.pickOrderedSet(state.keys()), rs.nextInt()))
.add((rs, state) -> new Delete(rs.pickOrderedSet(state.keys()))))
.destroyState(State::close)
.destroySut(Sut::close)
.build());

// Full Command with apply/run/checkPostconditions
static class Read implements Command<State, Sut, List<Integer>> {
private final Range range;
Read(Range range) { this.range = range; }

@Override public List<Integer> apply(State state) { return state.search(range); }
@Override public List<Integer> run(Sut sut) { return sut.tree.search(range); }
@Override public void checkPostconditions(State state, List<Integer> expected,
Sut sut, List<Integer> actual) {
Assertions.assertThat(actual).isEqualTo(expected);
}
@Override public String detailed(State state) { return "Read(" + range + ")"; }
}
```

### Pattern 3: Multistep commands

Group multiple commands as an atomic sequence in history.

```java
import static accord.utils.Property.multistep;

Gen<Command<State, Void, ?>> topologyCommand = rs -> multistep(
new SimpleCommand<>("Stop Node", s -> s.stopNode(node)),
new SimpleCommand<>("Replace Host", s -> s.replaceHost(node)),
new SimpleCommand<>("Reconfigure CMS", s -> s.reconfigureCMS())
);
```

### Pattern 4: Using destroyState for final validation

```java
stateful().withExamples(50).withSteps(500).check(commands(() -> State::new)
.add(MyTest::addTable)
.addIf(s -> !s.tables.isEmpty(), MyTest::dropTable)
.destroyState(state -> {
// Run after all steps - finish pending work and validate final state
state.finishPendingSequences();
state.validateFinalSchema();
})
.build());
```

## Error Reporting

On failure, `PropertyError` reports:
- Seed for reproducibility
- Number of examples / steps configured
- The specific failing step number
- State at time of failure (via `toString()`)
- Full command history (numbered, with `detailed()` strings)
- If `withStepTimeout` is used, duration per step is appended to history

## State Design Guidelines

1. **Merged State+SUT** (most common): State holds both the model (e.g., `TreeMap`) and the SUT (e.g., `RangeTree`). Commands operate on both within `applyUnit`.

2. **Separate State and SUT**: Use when the SUT has complex lifecycle (journal, cluster). State owns the model, SUT wraps the real thing. `destroyState`/`destroySut` handle cleanup.

3. **State should implement useful `toString()`**: It appears in error reports.

4. **Commands should have descriptive `detailed(State)`**: These form the history trace shown on failure.

## Imports

```java
import static accord.utils.Property.stateful;
import static accord.utils.Property.commands;
import static accord.utils.Property.multistep;
import static accord.utils.Property.ignoreCommand;
import accord.utils.Property.Command;
import accord.utils.Property.UnitCommand;
import accord.utils.Property.StateOnlyCommand;
import accord.utils.Property.SimpleCommand;
import accord.utils.Property.PreCheckResult;
import accord.utils.Gens;
import accord.utils.Gen;
import accord.utils.RandomSource;
```

## Framework Location

- `modules/accord/accord-core/src/test/java/accord/utils/Property.java` - Core framework
- `modules/accord/accord-core/src/test/java/accord/utils/Gens.java` - Generator utilities
- `modules/accord/accord-core/src/test/java/accord/utils/README.md` - Documentation

## Example: SingleNodeTableWalkTest (complex real-world stateful test)

`test/distributed/org/apache/cassandra/distributed/test/cql3/SingleNodeTableWalkTest.java` is a complex stateful test that exercises CQL read/write paths against random table schemas with random data. It is a good reference for writing non-trivial stateful tests.

Key design patterns demonstrated:
- **State with overridable hooks**: The `State` inner class defines boolean predicates (`supportTokens()`, `allowNonPartitionQuery()`, `allowPartitionQuery()`, etc.) that control which commands are eligible via `addIf`. Subclasses override these hooks to alter behavior without duplicating the test structure.
- **Overridable factory methods**: `createState()`, `createCluster()`, `defineTable()`, `supportedTypes()`, `supportedPrimaryColumnTypes()`, `supportedIndexers()`, and `preCheck()` are all `protected` methods that subclasses override to customize schema generation, cluster topology, and test configuration.
- **Extensive use of `addIf`/`addAllIf`**: Commands are conditionally included based on current state (e.g., only select existing rows when partitions exist, only compact when there are enough SSTables).

The test has a subclass hierarchy that reuses the same stateful structure for different configurations:
- `SingleNodeTableWalkTest` -- single-node, base test
- `MultiNodeTableWalkBase` -- multi-node (overrides `createCluster()`, adjusts consistency)
- `MultiNodeTableWalkWithReadRepairTest` / `MultiNodeTableWalkWithoutReadRepairTest`
- `CasMultiNodeTableWalkBase` -- CAS (Paxos) transactions
- `AccordInteropMultiNodeTableWalkBase` -- Accord transaction interop
- `FullAccordInteropMultiNodeTableWalkTest`, `MixedReadsAccordInteropMultiNodeTableWalkTest`

## Important Notes

- Default examples for `stateful()` is **500** (not 1000 like `qt()`)
- Default steps per example is **1000**
- Commands with `addIf` are only included when the predicate is true for the current state
- **Prefer `addIf` over `ignoreCommand()`**: Use `addIf(predicate, cmd)` to conditionally include commands rather than having command factories return `Property.ignoreCommand()`. The framework fails the test if too many consecutive commands are ignored, making `ignoreCommand()` flaky-prone. `addIf` avoids this by excluding the command from selection entirely when the predicate is false.
- `onSuccess` and `onFailure` callbacks are useful for logging history on completion
- When writing `detailed()`, include the command parameters (key, range, etc.) for debuggability

## Deterministic Execution Requirement

Stateful tests are especially vulnerable to non-deterministic SUTs because failures depend on the exact sequence of states traversed. If the SUT uses internal randomness (e.g., `ThreadLocalRandom`, `Math.random()`), replaying a seed reproduces the same command sequence but the SUT may take a different internal path, making the failure non-reproducible.

**Before writing a stateful test, audit the SUT for internal randomness.** See the `cassandra-testing-property` skill for a full table of common randomness sources and mitigations.

**Key principle**: Every aspect of the test that affects the outcome must be derived from `RandomSource`. This includes:
- Values generated for commands (covered by the command factory receiving `RandomSource rs`)
- The SUT's internal behavior (NOT covered if the SUT uses `ThreadLocalRandom` etc.)
- Any initialization parameters (covered if `State(RandomSource rs)` constructor is used)
- **Iteration order of collections in the SUT** -- if the SUT iterates a `HashMap`/`HashSet` and the order affects behavior, the test is non-reproducible across machines/JVM versions even though no random API is called

Stateful tests are particularly sensitive to iteration order because a different traversal order can change which command's precondition is met, which element gets selected, or which state transition fires -- producing a completely different state sequence from the same seed. For example, if a command iterates `HashMap.entrySet()` to find the first element matching a condition, a different iteration order means a different element is found, leading to divergent state. See the `cassandra-testing-property` skill for a full list of common patterns that break reproducibility.

If the SUT cannot be made deterministic, document the limitation prominently in the test class Javadoc.
60 changes: 60 additions & 0 deletions .agents/skills/cassandra-testing/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
---
name: cassandra-testing
description: Guide for writing tests in Apache Cassandra. Use whenever writing new tests, adding test coverage, or deciding what testing approach to use for a change. Prioritizes property-based and stateful property testing over hand-written example-based tests.
---

# Cassandra Testing Strategy

When writing tests for Apache Cassandra, prefer property-based testing over hand-written example-based tests. Property tests generate thousands of random inputs and catch edge cases that humans miss.

## Decision Flow

Ask these questions about what you're testing:

### 1. Does it involve sequences of operations on mutable state?

Examples: data structures (trees, indexes, caches), CRUD workflows, schema mutations, topology changes, journal lifecycle (flush/compact/restart).

**Use the `cassandra-testing-stateful` skill** - it provides `Property.stateful()`, which gives you:
- Random command sequences with weighted selection
- Conditional commands (only delete when non-empty)
- Model-based verification (compare SUT against a simple oracle)
- Full command history in failure reports for debugging
- Lifecycle hooks (destroyState, destroySut, onSuccess)

### 2. Does it validate a property/invariant?

Examples: serialization round-trips, algebraic laws (idempotency, commutativity), encoding correctness, range/boundary behavior, type validation.

**Use the `cassandra-testing-property` skill** - it provides `Property.qt()` with typed generators, which gives you:
- `forAll(gen).check(value -> ...)` for clean property assertions
- `Serializers.testSerde()` for automated serialization testing
- Rich generator library (Accord `Gens`, Cassandra `Generators`, `AbstractTypeGenerators`, `CassandraGenerators`)
- Meta-randomness (`mixedDistribution`) for bias-aware testing

### 3. Even "simple" tests benefit from property testing

A test that looks simple often hides complexity. A straightforward `INSERT INTO ... SELECT * FROM` test seems trivial with `(pk int, v int)`, but breaks with `(pk varint, ck frozen<udt>, v int, PRIMARY KEY(pk, ck)) WITH CLUSTERING ORDER BY (ck DESC)`. The schema, types, clustering order, and value expressions all interact in ways no human enumerates completely.

Rather than writing a single example and hoping it covers enough, generate the schema and values randomly:

```
// pseudocode
qt().check(rs -> {
schema = randomSchema(rs) // random types, clustering order, indexes, etc.
row = randomRow(rs, schema) // random values matching the schema
insert(schema, row)
result = select(schema, row.pk)
assertEqual(result, row)
});
```

This approach finds edge cases across type combinations, encoding paths, and query planning that hand-written examples miss entirely.

## Why Property Tests Over Example Tests

- A single `qt().forAll(gen).check(...)` replaces dozens of hand-written test cases
- Random generation finds edge cases humans don't think of (boundary values, empty inputs, overflow, type interactions)
- Meta-randomness varies the distribution per example, catching bugs that require specific patterns (e.g., mostly-deletes followed by a read)
- Failures are reproducible via seed - just add `withOnlySeed(seed)` to replay
- The generator libraries already exist for most Cassandra types - you just compose them
18 changes: 13 additions & 5 deletions src/java/org/apache/cassandra/utils/DynamicList.java
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,9 @@
import java.util.HashMap;
import java.util.TreeSet;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;

import com.google.common.annotations.VisibleForTesting;

// simple thread-unsafe skiplist that permits indexing/removal by position, insertion at the end
// (though easily extended to insertion at any position, not necessary here)
Expand Down Expand Up @@ -85,19 +88,26 @@ private Node parent(int parentHeight)
}
}

private final IntSupplier nextInt;
private final int maxHeight;
private final Node<E> head;
private int size;

public DynamicList(int maxExpectedSize)
{
this(maxExpectedSize, () -> ThreadLocalRandom.current().nextInt());
}

public DynamicList(int maxExpectedSize, IntSupplier nextInt)
{
this.nextInt = nextInt;
this.maxHeight = 3 + Math.max(0, (int) Math.ceil(Math.log(maxExpectedSize) / Math.log(2)));
head = new Node<>(maxHeight, null);
}

private int randomLevel()
{
return 1 + Integer.bitCount(ThreadLocalRandom.current().nextInt() & ((1 << (maxHeight - 1)) - 1));
return 1 + Integer.bitCount(nextInt.getAsInt() & ((1 << (maxHeight - 1)) - 1));
}

public Node<E> append(E value)
Expand Down Expand Up @@ -198,10 +208,8 @@ public int size()
return size;
}

// some quick and dirty tests to confirm the skiplist works as intended
// don't create a separate unit test - tools tree doesn't currently warrant them

private boolean isWellFormed()
@VisibleForTesting
boolean isWellFormed()
{
for (int i = 0 ; i < maxHeight ; i++)
{
Expand Down
7 changes: 6 additions & 1 deletion src/java/org/apache/cassandra/utils/Hex.java
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,12 @@ public static long parseLong(String hex, int start, int end)
for (int i = start ; i < end ; ++i)
{
char c = hex.charAt(i);
result |= (long)(c - (c >= 'a' ? 'a' - 10 : '0')) << shift;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hex includes upper case but this method doesn't handle that. This is fine in existing usage as we use Long.toHexString(lsb); which explicitly documents that its lower case only (if you want upper case it says to do toUpperCase).

So this bug fix isn't for any existing logic, but more defensive to make sure future logic works correctly

if (c >= charToByte.length)
throw new NumberFormatException("Invalid hex character '" + c + "' at index " + i + " in: " + hex.substring(start, end));
int digit = charToByte[c];
if (digit < 0)
throw new NumberFormatException("Invalid hex character '" + c + "' at index " + i + " in: " + hex.substring(start, end));
result |= (long) digit << shift;
shift -= 4;
}
return result;
Expand Down
Loading