diff --git a/.agents/skills/cassandra-testing-property/SKILL.md b/.agents/skills/cassandra-testing-property/SKILL.md new file mode 100644 index 000000000000..6442a8655283 --- /dev/null +++ b/.agents/skills/cassandra-testing-property/SKILL.md @@ -0,0 +1,448 @@ +--- +name: cassandra-testing-property +description: Write property-based tests for Apache Cassandra using the Property.qt() framework. Use when implementing new tests, validating invariants, testing serialization round-trips, or verifying algebraic properties of code. +--- + +# Property-Based Testing with `qt()` + +Write property-based tests for Apache Cassandra using the `Property.qt()` framework. This framework lives in the Accord module but is used extensively across both Accord and Cassandra test suites. + +## When to Use + +- Writing tests that validate properties/invariants hold across random inputs +- Testing serialization/deserialization round-trips +- Verifying algebraic laws (idempotency, commutativity, associativity) +- Testing with randomly-generated complex types (tables, keyspaces, types) +- Replacing hand-written example-based tests with broader coverage + +## Key Concepts + +### Anatomy of a Property Test + +Every property test has: +1. **Source of Randomness** - A seed-based `RandomSource` for reproducibility +2. **One or more Properties** - Assertions that must hold for all generated values +3. **Multiple Examples** - Default 1000 iterations per test +4. **Reproducibility** - Seeds are reported on failure for exact replay +5. **Data Generation** - `Gen` instances produce random values + +### Entry Point: `qt()` + +```java +import static accord.utils.Property.qt; +``` + +Returns a `ForBuilder` for configuring and running property tests. + +### Configuration Methods + +| Method | Purpose | Default | +|--------|---------|---------| +| `withSeed(long)` | Pin seed for reproducing a failure | random | +| `withExamples(int)` | Number of test iterations | 1000 | +| `withPure(boolean)` | Fresh seed per example (reproducible) | true | +| `withTimeout(Duration)` | Wall-clock timeout for the entire test | none | + +### Test Input Generation + +- `qt().check(rs -> {...})` - Raw `RandomSource`, drive generation inline +- `qt().forAll(Gen).check(value -> {...})` - Single typed generator +- `qt().forAll(Gen, Gen).check((a, b) -> {...})` - Two generators +- `qt().forAll(Gen, Gen, Gen).check((a, b, c) -> {...})` - Three generators + +### Generators (`Gen`) + +There are **two generator systems** in Cassandra's test code: + +1. **Accord generators** (`accord.utils.Gen`) - The new system, used with `Property.qt()`. Interface: `T next(RandomSource rs)` +2. **QuickTheories generators** (`org.quicktheories.core.Gen`) - The old system. Interface: `T generate(RandomnessSource rs)` + +Most Cassandra-specific generators in `test/unit/` still use the QuickTheories `Gen` type for historical reasons, but are fully usable with `Property.qt()` via bridge functions. + +#### Bridge between old and new generators + +In `test/unit/org/apache/cassandra/utils/Generators.java`: + +```java +// Convert QuickTheories Gen to Accord Gen (for use in qt().forAll(...)) +accord.utils.Gen accordGen = Generators.toGen(quickTheoriesGen); + +// Convert Accord Gen to QuickTheories Gen (for use with old APIs) +org.quicktheories.core.Gen qtGen = Generators.fromGen(accordGen); +``` + +#### Accord `Gens` utility class (`accord.utils.Gens`) + +Common generators: +- `Gens.ints().between(lo, hi)` / `Gens.ints().all()` +- `Gens.longs().between(lo, hi)` / `Gens.longs().all()` +- `Gens.bools().all()` +- `Gens.enums().all(MyEnum.class)` +- `Gens.strings().all().ofLengthBetween(min, max)` +- `Gens.lists(itemGen).ofSizeBetween(min, max)` +- `Gens.lists(itemGen).unique().ofSize(n)` +- `Gens.arrays(Class, itemGen).ofSizeBetween(min, max)` +- `Gens.oneOf(value1, value2, ...)` - Pick from fixed set +- `Gens.constant(value)` - Always same value +- `Gens.pick(collection)` - Pick random element from collection +- `Gens.pick(Map)` - Weighted random pick +- `Gens.random()` - A `Gen` (the RandomSource itself) + +Meta-randomness (generators of generators): +- `Gens.enums().allMixedDistribution(MyEnum.class)` - Returns `Gen>` +- `Gens.mixedDistribution(int min, int max)` - Returns `Gen` for int ranges +- `Gens.mixedDistribution(long min, long max)` - Returns `Gen` for long ranges +- `Gens.mixedDistribution(T... list)` / `Gens.mixedDistribution(List list)` - Returns `Gen>` for selecting from a collection +- `Gens.mixedDistribution(int[] list)` - Returns `Gen` for selecting from an int array +- `Gens.bools().mixedDistribution()` - Returns `Gen>` + +Each call to the outer `Gen` picks a *distribution strategy* (uniform, median-biased, zipfian, random-weight), then returns an inner `Gen` that selects values using that strategy. This means **each test example exercises a different distribution**, dramatically improving coverage. + +**Why meta-randomness matters:** Without it, uniform random selection is unlikely to expose bugs that require specific patterns (e.g., mostly-deletes, heavily-skewed key access, or clustered ranges). A single test run with 500 examples will exercise ~500 different distributions, some of which will naturally create adversarial patterns that would take millions of uniform random iterations to hit by chance. See the "Role of Bias" section below for more details. + +Composing generators: +```java +Gen stringGen = Gens.ints().between(0, 10).map(i -> "Number: " + i); +Gen evenNumbers = Gens.ints().between(0, 100).filter(i -> i % 2 == 0); +Gen> distro = rs -> AccordGens.waitingOn(...); +Gen flattened = Gens.flatten(distro); // flatten Gen> to Gen +``` + +#### Cassandra Accord-style generators (`test/unit/org/apache/cassandra/utils/AccordGenerators.java`) + +Generators using Accord's `Gen` type, directly usable with `Property.qt()` (no bridge needed): +- `AccordGenerators.byteArray(Gen.IntGen sizeGen)` - Random `byte[]` with size from generator; fills 8 bytes at a time via `nextLong()` for efficiency +- `AccordGenerators.byteArrayOfSize(int size)` - Random `byte[]` of fixed size +- `AccordGenerators.keys()` / `AccordGenerators.keys(partitioner)` - `PartitionKey` generators +- `AccordGenerators.routingKeysGen(partitioner)` - `TokenKey` generators +- `AccordGenerators.range(partitioner)` - `Range` generators +- `AccordGenerators.ranges(partitioner)` - `Ranges` generators +- `AccordGenerators.commands()` - Random `Command` objects +- `AccordGenerators.topologyGen(partitioner)` - Random `Topology` objects + +#### Cassandra generators (`test/unit/org/apache/cassandra/utils/Generators.java`) + +General-purpose generators (QuickTheories `Gen` type - use `Generators.toGen()` to convert): +- `Generators.IDENTIFIER_GEN` - Valid CQL identifiers (1-50 chars, regex word) +- `Generators.SYMBOL_GEN` - Identifiers that aren't reserved keywords +- `Generators.DNS_DOMAIN_NAME` - Valid DNS domain names +- `Generators.UTF_8_GEN` - Valid UTF-8 strings (0-1024 chars) +- `Generators.UUID_RANDOM_GEN` - Version 4 UUIDs +- `Generators.UUID_TIME_GEN` - Version 1 (time) UUIDs +- `Generators.INET_ADDRESS_GEN` - Mixed IPv4/IPv6 addresses +- `Generators.INET_4_ADDRESS_GEN` / `Generators.INET_6_ADDRESS_GEN` +- `Generators.TIMESTAMP_GEN` - Timestamps within +/- 50 years of 2020 +- `Generators.DATE_GEN` - Java `Date` objects +- `Generators.TIMESTAMP_NANOS` - Nanosecond timestamps +- `Generators.SMALL_TIME_SPAN_NANOS` / `Generators.TINY_TIME_SPAN_NANOS` +- `Generators.bytes(min, max)` - Random `ByteBuffer` (heap) +- `Generators.bytesAnyType(min, max)` - Random `ByteBuffer` (heap/direct/read-only) +- `Generators.bigInt()` / `Generators.bigDecimal()` +- `Generators.timeUUID()` - `TimeUUID` objects +- `Generators.list(gen, sizeGen)` / `Generators.set(gen, sizeGen)` / `Generators.uniqueList(gen, sizeGen)` + +#### Type generators (`test/unit/org/apache/cassandra/utils/AbstractTypeGenerators.java`) + +For generating CQL types and values for those types: +- `AbstractTypeGenerators.typeGen()` - All `AbstractType` implementations (depth 3) +- `AbstractTypeGenerators.safeTypeGen()` - Types safe for byte-comparable round-trips +- `AbstractTypeGenerators.primitiveTypeGen()` - Only primitive types +- `AbstractTypeGenerators.getTypeSupport(type)` - Returns a `TypeSupport` with a value generator + comparator for any type +- `AbstractTypeGenerators.builder()` - Fine-grained `TypeGenBuilder`: + ```java + AbstractTypeGenerators.builder() + .withoutEmpty() + .withoutPrimitive(DurationType.instance) + .withMaxDepth(1) + .withDefaultSizeGen(2) + .withTypeKinds(TypeKind.PRIMITIVE, TypeKind.LIST, TypeKind.MAP) + .build(); + ``` +- `AbstractTypeGenerators.withoutUnsafeEquality()` - Remove types with problematic equality (Empty, Duration, Decimal, Counter) + +#### Cassandra domain generators (`test/unit/org/apache/cassandra/utils/CassandraGenerators.java`) + +Higher-level Cassandra objects (QuickTheories `Gen` type): +- `CassandraGenerators.TABLE_METADATA_GEN` - Random table schemas +- `CassandraGenerators.regularTable()` - Builder for table metadata: + ```java + CassandraGenerators.regularTable() + .withKeyspaceName("ks") + .withSimpleColumnNames() + .withPartitionColumnsCount(1) + .withClusteringColumnsBetween(0, 3) + .withRegularColumnsBetween(1, 5) + .withKnownMemtables() + .build(); + ``` +- `CassandraGenerators.regularKeyspace()` - Builder for keyspace metadata +- `CassandraGenerators.INET_ADDRESS_AND_PORT_GEN` - `InetAddressAndPort` +- `CassandraGenerators.TABLE_ID_GEN` - `TableId` +- `CassandraGenerators.CLUSTERING_GEN` - `Clustering` +- `CassandraGenerators.MESSAGE_GEN` - Internode `Message` +- `CassandraGenerators.compactionParamsGen()` / `compressionParamsGen()` / `cachingParamsGen()` +- `CassandraGenerators.token(partitioner)` - Tokens for a given partitioner +- `CassandraGenerators.partitionKeyDataGen(metadata)` - Random partition key bytes for a table + +### Reproducing Failures + +When a property test fails, the `PropertyError` message includes a `Seed = `. To reproduce: + +```java +// Add .withOnlySeed() to replay just the failing case (sets examples=1) +qt().withOnlySeed(3448125481938895569L).forAll(Gens.ints().all()).check(value -> { + // will reproduce the exact same failing input, nothing more +}); +``` + +- Use `withOnlySeed(seed)` (not `withSeed(seed)`) - it limits examples to 1 so you only run the failing case rather than continuing with more random seeds after +- If `Pure = true` (default): the seed fully determines the failing input - reproduction is guaranteed +- If `Pure = false`: side effects may prevent exact reproduction (e.g., timing, external state). In these cases prefer `withSeed` as the actual failing seed might not be the one listed, but one of the later examples +- The `Values:` section in the error shows the specific generated inputs that triggered the failure - use these to understand the root cause +- When filing a bug or writing a regression test, always include the seed + +## Patterns + +### Pattern 1: Basic property with raw RandomSource +```java +qt().check(rs -> { + int value = rs.nextInt(1, 100); + assertThat(value / value).isEqualTo(1); +}); +``` + +### Pattern 2: Single generator with forAll +```java +qt().forAll(Gens.ints().all()).check(value -> { + assertThat(value / value).isEqualTo(1); +}); +``` + +### Pattern 3: Serialization round-trip + +For most Cassandra serializers, use `org.apache.cassandra.io.Serializers.testSerde()` which automates the full round-trip: +- Serializes the input +- Verifies `serializedSize()` matches actual bytes written +- Deserializes and checks equality (using `ReflectionUtils.recursiveEquals` for diff reporting) +- Verifies `skip()` consumes exactly the right number of bytes +- Tests `ByteBuffer`-based serialize/deserialize methods if overridden + +You only need to provide a generator for random input and the serializer: + +```java +// Shared buffer across examples for efficiency +@SuppressWarnings({ "resource", "IOResourceOpenedButNotSafelyClosed" }) +DataOutputBuffer output = new DataOutputBuffer(); + +qt().forAll(myObjectGen).check(obj -> { + // Unversioned serializer + Serializers.testSerde(output, MyObject.serializer, obj); +}); + +qt().forAll(myObjectGen).check(obj -> { + // Versioned serializer - test across all supported versions + for (MessagingService.Version version : MessagingService.Version.supportedVersions()) + Serializers.testSerde(output, MyObject.serializer, obj, version.value); +}); + +qt().forAll(myObjectGen).check(obj -> { + // Parameterised serializer + Serializers.testSerde(output, MyObject.serializer, obj, param); +}); +``` + +The `Serializers` utility supports: +- `AsymmetricUnversionedSerializer` - basic unversioned +- `IVersionedAsymmetricSerializer` - versioned with `int` version +- `AsymmetricVersionedSerializer` - versioned with typed version +- `ParameterisedUnversionedSerializer` - with extra parameter +- `ParameterisedVersionedSerializer` - parameterised + versioned + +For non-standard serde (e.g., `ByteComparable`), write the round-trip manually: + +```java +qt().forAll(tokenGen).check(token -> { + ByteComparable comparable = asComparableBytes(token); + Token read = fromComparableBytes(token.getPartitioner(), comparable); + assertThat(read).isEqualTo(token); +}); +``` + +### Pattern 4: Algebraic laws (merge idempotency, commutativity, associativity) +```java +// Idempotency +qt().forAll(snapshotGen).check(s -> s.merge(s).equals(s)); +// Commutativity +qt().forAll(snapshotGen, snapshotGen).check((a, b) -> a.merge(b).equals(b.merge(a))); +// Associativity +qt().forAll(snapshotGen, snapshotGen, snapshotGen) + .check((a, b, c) -> a.merge(b).merge(c).equals(a.merge(b.merge(c)))); +``` + +### Pattern 5: Impure test (shared state across examples) + +By default (`pure=true`), each example gets a fresh seed derived from the previous - making every single example independently reproducible. Use `withPure(false)` when examples share mutable state that accumulates across iterations, making individual re-seeding meaningless: + +```java +qt().withPure(false).withExamples(10).check(rs -> { + // Cluster persists across examples - state from example N affects example N+1 + Cluster.Node coordinator = cluster.nextCoordinator(rs); + RepairCoordinator repair = coordinator.repair(repairOption(rs, coordinator)); + repair.run(); + cluster.processAll(); + assertSuccess(repair); +}); +``` + +When `pure=false`, the seed in a failure report may not fully reproduce the issue since it depends on accumulated side effects from prior examples. + +### Pattern 6: Meta-randomness (distribution varies per example) +```java +Gen> ACTION_DISTRO = Gens.enums().allMixedDistribution(Action.class); +qt().check(rs -> { + Gen gen = ACTION_DISTRO.next(rs); // pick distribution for this example + for (int i = 0; i < 1000; i++) { + Action action = gen.next(rs); + // each example exercises a different distribution of actions + } +}); +``` + +### Pattern 7: Timeout-bounded +```java +qt().withTimeout(Duration.ofSeconds(60)).check(rs -> { + // Long-running fuzz test bounded by wall clock +}); +``` + +### Pattern 8: Model-based validation + +For simple cases, you can inline a model comparison loop directly in `qt().check()`: + +```java +qt().check(rs -> { + TreeMap model = new TreeMap<>(); + MyDataStructure sut = new MyDataStructure(); + for (int i = 0; i < 1000; i++) { + int key = rs.nextInt(); + int value = rs.nextInt(); + model.put(key, value); + sut.put(key, value); + assertThat(sut.get(key)).isEqualTo(model.get(key)); + } +}); +``` + +If you need multiple operation types (create/read/update/delete), conditional commands, or better failure diagnostics (command history, step-level reporting), consider using `Property.stateful()` instead - see the **cassandra-testing-stateful** skill which is purpose-built for model-based stateful testing. + +## Role of Bias (Why Meta-Randomness is Critical) + +Property tests with **uniform random distribution** are unlikely to find many real bugs because: + +1. **Most types have large domains** - If your int range is [0, 1M], the chance of hitting 0 (a common edge case) is 1/1,000,000 per attempt +2. **Edge cases cluster at boundaries** - Off-by-one errors only manifest at specific values +3. **Relationships between operations matter** - A bug might only appear when you do 90% deletes and 10% inserts, or when the same key is accessed repeatedly +4. **Data structure behavior changes with shape** - A B-tree behaves differently when heavily left-skewed vs balanced + +**Meta-randomness solves this** by varying the distribution *itself* across test examples: + +```java +// BAD: Uniform - every action equally likely every time +Gen actionGen = Gens.enums().all(Action.class); + +// GOOD: Meta-random - some examples will be 90% Create, others 90% Delete, etc. +Gen> actionDistro = Gens.enums().allMixedDistribution(Action.class); +qt().check(rs -> { + Gen gen = actionDistro.next(rs); // pick distribution for THIS example + // now use gen.next(rs) for each step... +}); +``` + +The available distribution strategies include: +- **Uniform** - Equal probability (the baseline) +- **Median-biased** - Values cluster around a randomly-chosen median +- **Zipfian** - Power-law distribution (a few values dominate, most are rare) +- **Random-weight** - Each value gets a random weight, creating arbitrary skew + +This applies to numeric ranges too: +```java +// For selecting from a range with varying distribution +Gen sizeDistro = Gens.mixedDistribution(1, 10000); +qt().check(rs -> { + Gen.IntGen sizeGen = sizeDistro.next(rs); + // some examples will mostly pick small sizes, others large, others clustered around a median + for (int i = 0; i < 100; i++) { + int size = sizeGen.nextInt(rs); + // ... + } +}); +``` + +**Real example**: The `StatefulRangeTreeTest` bug was only found when bias caused a specific sequence (Create, Clear, Create, RangeRead) - with uniform distribution, Clear is equally weighted with other operations, making it extremely rare to see Clear followed immediately by Create. A zipfian or heavily-skewed distribution naturally produces these adversarial sequences. + +## Imports + +```java +import static accord.utils.Property.qt; +import accord.utils.Gens; +import accord.utils.Gen; +import accord.utils.RandomSource; +``` + +## Key File Locations + +- `modules/accord/accord-core/src/test/java/accord/utils/Property.java` - Core `qt()` / `stateful()` framework +- `modules/accord/accord-core/src/test/java/accord/utils/Gens.java` - Accord generator utilities +- `test/unit/org/apache/cassandra/utils/AccordGenerators.java` - Cassandra-specific Accord-style generators (byte arrays, keys, ranges, topologies) - use directly with `qt().forAll()` +- `test/unit/org/apache/cassandra/utils/Generators.java` - General-purpose generators (UUID, InetAddress, DNS, timestamps, bytes, BigInteger, etc.) + `toGen()`/`fromGen()` bridge +- `test/unit/org/apache/cassandra/utils/AbstractTypeGenerators.java` - CQL type generators + `getTypeSupport()` for generating values of any type +- `test/unit/org/apache/cassandra/utils/CassandraGenerators.java` - High-level domain generators (TableMetadata, KeyspaceMetadata, Messages, Tokens, compaction/compression params) +- `test/unit/org/apache/cassandra/io/Serializers.java` - Automated serde test utility + +## Important Notes + +- Some older tests use `org.quicktheories.QuickTheory.qt` (a different library with different API) - **new tests should use the Accord `Property.qt()`** +- When using QuickTheories generators with `Property.qt()`, wrap them with `Generators.toGen(qtGen)` +- Always write properties that are **true for all valid inputs**, not just specific examples +- When a test fails, always include the seed in bug reports so it can be replayed with `withSeed()` + +## Deterministic Execution Requirement + +**All randomness in a property test must flow through `RandomSource` for failures to be reproducible via seed.** + +Before writing a property test, audit the system under test for internal randomness sources: + +| Internal randomness source | Impact | Mitigation | +|---|---|---| +| `ThreadLocalRandom` | Seed replay won't reproduce same SUT behavior | Refactor SUT to accept random source, or document limitation | +| `Math.random()` | Same as above | Same as above | +| `System.nanoTime()` / `System.currentTimeMillis()` | Timing-dependent behavior varies across runs | Mock time source or accept non-reproducibility | +| `UUID.randomUUID()` | Different UUIDs each run | Generate UUIDs from `RandomSource` | +| `Collections.shuffle(list)` | Uses default `Random` | Use `Collections.shuffle(list, random)` with seeded random | +| **Non-deterministic iteration order** | `HashMap`, `HashSet`, `IdentityHashMap` iteration order varies across JVM versions, boxes, and even runs | Use `LinkedHashMap`/`LinkedHashSet` (insertion order), `TreeMap`/`TreeSet` (sorted order), or sort before iterating | + +**How to check**: Do NOT rely on re-running the test on the same machine to verify determinism. Sources like `ThreadLocalRandom` often produce the same sequence within the same JVM process or even across restarts on the same box/JVM version, giving a false sense of reproducibility. The non-determinism only surfaces when running on a different machine, a different JVM version, or under different thread scheduling -- exactly the conditions of a CI failure that someone else needs to debug. + +Instead, **audit the source code** of the SUT for uses of `ThreadLocalRandom`, `Math.random()`, `Random()` (unseeded), `UUID.randomUUID()`, `System.nanoTime()`, or any other randomness source that does not flow from the test's `RandomSource`. Grep for these in the SUT class and its transitive dependencies. Also check for non-deterministic iteration order (see below). + +**When the SUT cannot be made deterministic** (e.g., it uses `ThreadLocalRandom` internally and refactoring is too invasive): +1. Add a comment explaining which internal randomness is not controlled +2. Consider using `withPure(false)` since per-example reproducibility is already limited +3. Document this in the test's Javadoc so future debuggers know seed replay may not fully reproduce failures + +**Example of the problem**: `DynamicList.randomLevel()` uses `ThreadLocalRandom` to determine skip-list level assignments. A property test for `DynamicList` can reproduce the *sequence of operations* from a seed, but not the internal structure of the skip list, because level assignments are non-deterministic. A failure caused by a specific level assignment pattern cannot be replayed. + +### Non-deterministic iteration order (subtle) + +Collections with unstable iteration order are a particularly insidious source of non-reproducibility because they don't involve any explicit randomness API. `HashMap` and `HashSet` iteration order depends on internal hash table sizing, which can change between JVM versions, machines, or even runs with different heap sizes. If the SUT iterates over a `HashMap` and the iteration order affects behavior (e.g., which element gets processed first, tie-breaking, output ordering), the test becomes non-reproducible across environments even though no random API is called. + +Common patterns that break reproducibility: +- `for (var entry : hashMap.entrySet())` where processing order matters +- `hashSet.stream().findFirst()` -- which element is "first" is undefined +- `new ArrayList<>(hashMap.values())` -- list order depends on iteration order +- Building output by iterating a `HashMap` and concatenating/accumulating results +- Any algorithm where the result depends on which element is visited first (e.g., selecting a "best" candidate from a map by iterating and comparing) + +This is especially dangerous because it often works on the developer's machine and only breaks in CI or on a colleague's box. Unlike `ThreadLocalRandom`, there is no API call to grep for -- you need to understand whether iteration order affects the SUT's observable behavior. + +If you can not change the data structures to be iteration safe, export the iteration to a list and add deterministic sorting to solve the problem. diff --git a/.agents/skills/cassandra-testing-stateful/SKILL.md b/.agents/skills/cassandra-testing-stateful/SKILL.md new file mode 100644 index 000000000000..5f516a0ddf2d --- /dev/null +++ b/.agents/skills/cassandra-testing-stateful/SKILL.md @@ -0,0 +1,262 @@ +--- +name: cassandra-testing-stateful +description: Write stateful property-based tests for Apache Cassandra using the Property.stateful() framework. Use when verifying systems behave correctly through sequences of operations, modeling interactions with stateful systems, or testing CRUD/state-machine correctness with random command sequences. +--- + +# Stateful Property-Based Testing with `stateful()` + +Write stateful property-based tests for Apache Cassandra using the `Property.stateful()` framework. This is the "CRUD testing" or "model-based testing" approach where random command sequences exercise state machines and verify invariants hold throughout. + +## When to Use + +- Testing stateful data structures (trees, indexes, caches) against a simple model +- Verifying CRUD operations maintain consistency +- Testing multi-step sequences (topology changes, schema mutations, journal operations) +- Testing systems that survive restart/flush/compact cycles +- Any scenario where you need to verify a sequence of operations, not just individual properties + +## Key Concepts + +### Entry Point: `stateful()` + +```java +import static accord.utils.Property.stateful; +import static accord.utils.Property.commands; +``` + +Returns a `StatefulBuilder` for configuring and running stateful property tests. + +### Configuration + +Inherits from `qt()` plus: + +| Method | Purpose | Default | +|--------|---------|---------| +| `withExamples(int)` | Number of full test runs | 500 | +| `withSteps(int)` | Max commands per test run | 1000 | +| `withStepTimeout(Duration)` | Timeout per individual step | none | +| `withSeed(long)` | Pin seed for reproducibility | random | +| `withPure(boolean)` | Fresh seed per example | true | +| `withTimeout(Duration)` | Timeout for entire example | none | + +### Core Types + +- **`State`** - Model/tracking state for the test (e.g., a `TreeMap` as oracle) +- **`SystemUnderTest`** (SUT) - The actual implementation being tested (optional - often `Void` when State acts as both) +- **`Command`** - A single operation applied to both model and SUT + +### Command Interface + +```java +interface Command { + // Guard: should this command execute given current state? + default PreCheckResult checkPreconditions(State state) { return PreCheckResult.Ok; } + // Apply to the model - return expected result + Result apply(State state) throws Throwable; + // Apply to the real system - return actual result + Result run(SystemUnderTest sut) throws Throwable; + // Verify model and SUT agree + default void checkPostconditions(State state, Result expected, + SystemUnderTest sut, Result actual) throws Throwable {} + // Human-readable description for history logging + default String detailed(State state) { return this.toString(); } +} +``` + +Convenience interfaces: +- **`UnitCommand`** - When apply/run return void (use `applyUnit`/`runUnit`) +- **`StateOnlyCommand`** - When there's no separate SUT (extends `UnitCommand`) +- **`SimpleCommand`** - Inline lambda for `StateOnlyCommand` with a name + +### Commands Builder + +```java +commands(() -> stateGen) // State-only (no SUT) +commands(() -> stateGen, Sut::new) // State + separate SUT +``` + +Builder methods: +- `.add(cmd)` / `.add(gen)` / `.add((rs, state) -> cmd)` - Add command with random weight +- `.add(weight, cmd)` - Add command with fixed weight +- `.addIf(predicate, cmd)` - Conditional command (only when predicate is true) +- `.addAllIf(predicate, builder -> {...})` - Multiple conditional commands +- `.preCommands(state -> {...})` - Run before each step +- `.destroyState((state, cause) -> {...})` - Cleanup state after each example +- `.destroySut((sut, cause) -> {...})` - Cleanup SUT after each example +- `.onSuccess((state, sut, history) -> {...})` - Callback on successful example +- `.onFailure((state, sut, history, cause) -> {...})` - Callback on failure +- `.commandsTransformer((state, gen) -> newGen)` - Transform command generator per state +- `.build()` - Produce the `Commands` object + +## Patterns + +### Pattern 1: State-only with SimpleCommand (most common, simplest) + +When the State IS the system under test (model + SUT combined). + +```java +import static accord.utils.Property.stateful; +import static accord.utils.Property.commands; + +stateful().check(commands(() -> State::new) + .add(MyTest::insertCommand) + .add(MyTest::readCommand) + .addIf(s -> !s.isEmpty(), MyTest::deleteCommand) + .build()); + +// Command factory method +private static Property.Command insertCommand(RandomSource rs, State state) { + int key = rs.nextInt(); + int value = rs.nextInt(); + return new Property.SimpleCommand<>( + "Insert(" + key + ", " + value + ")", + s -> { + s.model.put(key, value); + s.sut.put(key, value); + assertThat(s.sut.get(key)).isEqualTo(value); + }); +} +``` + +### Pattern 2: Separate State and SUT with full Command interface + +When State (model) and SUT (real implementation) are different objects. + +```java +stateful().check(commands(() -> State::new, state -> new Sut(state)) + .add((rs, state) -> new Create(nextRange(rs), rs.nextInt())) + .add((rs, state) -> new Read(nextRange(rs))) + .addAllIf(state -> !state.isEmpty(), b -> b + .add((rs, state) -> new Update(rs.pickOrderedSet(state.keys()), rs.nextInt())) + .add((rs, state) -> new Delete(rs.pickOrderedSet(state.keys())))) + .destroyState(State::close) + .destroySut(Sut::close) + .build()); + +// Full Command with apply/run/checkPostconditions +static class Read implements Command> { + private final Range range; + Read(Range range) { this.range = range; } + + @Override public List apply(State state) { return state.search(range); } + @Override public List run(Sut sut) { return sut.tree.search(range); } + @Override public void checkPostconditions(State state, List expected, + Sut sut, List actual) { + Assertions.assertThat(actual).isEqualTo(expected); + } + @Override public String detailed(State state) { return "Read(" + range + ")"; } +} +``` + +### Pattern 3: Multistep commands + +Group multiple commands as an atomic sequence in history. + +```java +import static accord.utils.Property.multistep; + +Gen> topologyCommand = rs -> multistep( + new SimpleCommand<>("Stop Node", s -> s.stopNode(node)), + new SimpleCommand<>("Replace Host", s -> s.replaceHost(node)), + new SimpleCommand<>("Reconfigure CMS", s -> s.reconfigureCMS()) +); +``` + +### Pattern 4: Using destroyState for final validation + +```java +stateful().withExamples(50).withSteps(500).check(commands(() -> State::new) + .add(MyTest::addTable) + .addIf(s -> !s.tables.isEmpty(), MyTest::dropTable) + .destroyState(state -> { + // Run after all steps - finish pending work and validate final state + state.finishPendingSequences(); + state.validateFinalSchema(); + }) + .build()); +``` + +## Error Reporting + +On failure, `PropertyError` reports: +- Seed for reproducibility +- Number of examples / steps configured +- The specific failing step number +- State at time of failure (via `toString()`) +- Full command history (numbered, with `detailed()` strings) +- If `withStepTimeout` is used, duration per step is appended to history + +## State Design Guidelines + +1. **Merged State+SUT** (most common): State holds both the model (e.g., `TreeMap`) and the SUT (e.g., `RangeTree`). Commands operate on both within `applyUnit`. + +2. **Separate State and SUT**: Use when the SUT has complex lifecycle (journal, cluster). State owns the model, SUT wraps the real thing. `destroyState`/`destroySut` handle cleanup. + +3. **State should implement useful `toString()`**: It appears in error reports. + +4. **Commands should have descriptive `detailed(State)`**: These form the history trace shown on failure. + +## Imports + +```java +import static accord.utils.Property.stateful; +import static accord.utils.Property.commands; +import static accord.utils.Property.multistep; +import static accord.utils.Property.ignoreCommand; +import accord.utils.Property.Command; +import accord.utils.Property.UnitCommand; +import accord.utils.Property.StateOnlyCommand; +import accord.utils.Property.SimpleCommand; +import accord.utils.Property.PreCheckResult; +import accord.utils.Gens; +import accord.utils.Gen; +import accord.utils.RandomSource; +``` + +## Framework Location + +- `modules/accord/accord-core/src/test/java/accord/utils/Property.java` - Core framework +- `modules/accord/accord-core/src/test/java/accord/utils/Gens.java` - Generator utilities +- `modules/accord/accord-core/src/test/java/accord/utils/README.md` - Documentation + +## Example: SingleNodeTableWalkTest (complex real-world stateful test) + +`test/distributed/org/apache/cassandra/distributed/test/cql3/SingleNodeTableWalkTest.java` is a complex stateful test that exercises CQL read/write paths against random table schemas with random data. It is a good reference for writing non-trivial stateful tests. + +Key design patterns demonstrated: +- **State with overridable hooks**: The `State` inner class defines boolean predicates (`supportTokens()`, `allowNonPartitionQuery()`, `allowPartitionQuery()`, etc.) that control which commands are eligible via `addIf`. Subclasses override these hooks to alter behavior without duplicating the test structure. +- **Overridable factory methods**: `createState()`, `createCluster()`, `defineTable()`, `supportedTypes()`, `supportedPrimaryColumnTypes()`, `supportedIndexers()`, and `preCheck()` are all `protected` methods that subclasses override to customize schema generation, cluster topology, and test configuration. +- **Extensive use of `addIf`/`addAllIf`**: Commands are conditionally included based on current state (e.g., only select existing rows when partitions exist, only compact when there are enough SSTables). + +The test has a subclass hierarchy that reuses the same stateful structure for different configurations: +- `SingleNodeTableWalkTest` -- single-node, base test +- `MultiNodeTableWalkBase` -- multi-node (overrides `createCluster()`, adjusts consistency) + - `MultiNodeTableWalkWithReadRepairTest` / `MultiNodeTableWalkWithoutReadRepairTest` + - `CasMultiNodeTableWalkBase` -- CAS (Paxos) transactions + - `AccordInteropMultiNodeTableWalkBase` -- Accord transaction interop + - `FullAccordInteropMultiNodeTableWalkTest`, `MixedReadsAccordInteropMultiNodeTableWalkTest` + +## Important Notes + +- Default examples for `stateful()` is **500** (not 1000 like `qt()`) +- Default steps per example is **1000** +- Commands with `addIf` are only included when the predicate is true for the current state +- **Prefer `addIf` over `ignoreCommand()`**: Use `addIf(predicate, cmd)` to conditionally include commands rather than having command factories return `Property.ignoreCommand()`. The framework fails the test if too many consecutive commands are ignored, making `ignoreCommand()` flaky-prone. `addIf` avoids this by excluding the command from selection entirely when the predicate is false. +- `onSuccess` and `onFailure` callbacks are useful for logging history on completion +- When writing `detailed()`, include the command parameters (key, range, etc.) for debuggability + +## Deterministic Execution Requirement + +Stateful tests are especially vulnerable to non-deterministic SUTs because failures depend on the exact sequence of states traversed. If the SUT uses internal randomness (e.g., `ThreadLocalRandom`, `Math.random()`), replaying a seed reproduces the same command sequence but the SUT may take a different internal path, making the failure non-reproducible. + +**Before writing a stateful test, audit the SUT for internal randomness.** See the `cassandra-testing-property` skill for a full table of common randomness sources and mitigations. + +**Key principle**: Every aspect of the test that affects the outcome must be derived from `RandomSource`. This includes: +- Values generated for commands (covered by the command factory receiving `RandomSource rs`) +- The SUT's internal behavior (NOT covered if the SUT uses `ThreadLocalRandom` etc.) +- Any initialization parameters (covered if `State(RandomSource rs)` constructor is used) +- **Iteration order of collections in the SUT** -- if the SUT iterates a `HashMap`/`HashSet` and the order affects behavior, the test is non-reproducible across machines/JVM versions even though no random API is called + +Stateful tests are particularly sensitive to iteration order because a different traversal order can change which command's precondition is met, which element gets selected, or which state transition fires -- producing a completely different state sequence from the same seed. For example, if a command iterates `HashMap.entrySet()` to find the first element matching a condition, a different iteration order means a different element is found, leading to divergent state. See the `cassandra-testing-property` skill for a full list of common patterns that break reproducibility. + +If the SUT cannot be made deterministic, document the limitation prominently in the test class Javadoc. diff --git a/.agents/skills/cassandra-testing/SKILL.md b/.agents/skills/cassandra-testing/SKILL.md new file mode 100644 index 000000000000..48e88238b137 --- /dev/null +++ b/.agents/skills/cassandra-testing/SKILL.md @@ -0,0 +1,60 @@ +--- +name: cassandra-testing +description: Guide for writing tests in Apache Cassandra. Use whenever writing new tests, adding test coverage, or deciding what testing approach to use for a change. Prioritizes property-based and stateful property testing over hand-written example-based tests. +--- + +# Cassandra Testing Strategy + +When writing tests for Apache Cassandra, prefer property-based testing over hand-written example-based tests. Property tests generate thousands of random inputs and catch edge cases that humans miss. + +## Decision Flow + +Ask these questions about what you're testing: + +### 1. Does it involve sequences of operations on mutable state? + +Examples: data structures (trees, indexes, caches), CRUD workflows, schema mutations, topology changes, journal lifecycle (flush/compact/restart). + +**Use the `cassandra-testing-stateful` skill** - it provides `Property.stateful()`, which gives you: +- Random command sequences with weighted selection +- Conditional commands (only delete when non-empty) +- Model-based verification (compare SUT against a simple oracle) +- Full command history in failure reports for debugging +- Lifecycle hooks (destroyState, destroySut, onSuccess) + +### 2. Does it validate a property/invariant? + +Examples: serialization round-trips, algebraic laws (idempotency, commutativity), encoding correctness, range/boundary behavior, type validation. + +**Use the `cassandra-testing-property` skill** - it provides `Property.qt()` with typed generators, which gives you: +- `forAll(gen).check(value -> ...)` for clean property assertions +- `Serializers.testSerde()` for automated serialization testing +- Rich generator library (Accord `Gens`, Cassandra `Generators`, `AbstractTypeGenerators`, `CassandraGenerators`) +- Meta-randomness (`mixedDistribution`) for bias-aware testing + +### 3. Even "simple" tests benefit from property testing + +A test that looks simple often hides complexity. A straightforward `INSERT INTO ... SELECT * FROM` test seems trivial with `(pk int, v int)`, but breaks with `(pk varint, ck frozen, v int, PRIMARY KEY(pk, ck)) WITH CLUSTERING ORDER BY (ck DESC)`. The schema, types, clustering order, and value expressions all interact in ways no human enumerates completely. + +Rather than writing a single example and hoping it covers enough, generate the schema and values randomly: + +``` +// pseudocode +qt().check(rs -> { + schema = randomSchema(rs) // random types, clustering order, indexes, etc. + row = randomRow(rs, schema) // random values matching the schema + insert(schema, row) + result = select(schema, row.pk) + assertEqual(result, row) +}); +``` + +This approach finds edge cases across type combinations, encoding paths, and query planning that hand-written examples miss entirely. + +## Why Property Tests Over Example Tests + +- A single `qt().forAll(gen).check(...)` replaces dozens of hand-written test cases +- Random generation finds edge cases humans don't think of (boundary values, empty inputs, overflow, type interactions) +- Meta-randomness varies the distribution per example, catching bugs that require specific patterns (e.g., mostly-deletes followed by a read) +- Failures are reproducible via seed - just add `withOnlySeed(seed)` to replay +- The generator libraries already exist for most Cassandra types - you just compose them diff --git a/src/java/org/apache/cassandra/utils/DynamicList.java b/src/java/org/apache/cassandra/utils/DynamicList.java index 30f5160f6468..12c1c8b6d24a 100644 --- a/src/java/org/apache/cassandra/utils/DynamicList.java +++ b/src/java/org/apache/cassandra/utils/DynamicList.java @@ -21,6 +21,9 @@ import java.util.HashMap; import java.util.TreeSet; import java.util.concurrent.ThreadLocalRandom; +import java.util.function.IntSupplier; + +import com.google.common.annotations.VisibleForTesting; // simple thread-unsafe skiplist that permits indexing/removal by position, insertion at the end // (though easily extended to insertion at any position, not necessary here) @@ -85,19 +88,26 @@ private Node parent(int parentHeight) } } + private final IntSupplier nextInt; private final int maxHeight; private final Node head; private int size; public DynamicList(int maxExpectedSize) { + this(maxExpectedSize, () -> ThreadLocalRandom.current().nextInt()); + } + + public DynamicList(int maxExpectedSize, IntSupplier nextInt) + { + this.nextInt = nextInt; this.maxHeight = 3 + Math.max(0, (int) Math.ceil(Math.log(maxExpectedSize) / Math.log(2))); head = new Node<>(maxHeight, null); } private int randomLevel() { - return 1 + Integer.bitCount(ThreadLocalRandom.current().nextInt() & ((1 << (maxHeight - 1)) - 1)); + return 1 + Integer.bitCount(nextInt.getAsInt() & ((1 << (maxHeight - 1)) - 1)); } public Node append(E value) @@ -198,10 +208,8 @@ public int size() return size; } - // some quick and dirty tests to confirm the skiplist works as intended - // don't create a separate unit test - tools tree doesn't currently warrant them - - private boolean isWellFormed() + @VisibleForTesting + boolean isWellFormed() { for (int i = 0 ; i < maxHeight ; i++) { diff --git a/src/java/org/apache/cassandra/utils/Hex.java b/src/java/org/apache/cassandra/utils/Hex.java index 19d470363f7f..805ff07afc97 100644 --- a/src/java/org/apache/cassandra/utils/Hex.java +++ b/src/java/org/apache/cassandra/utils/Hex.java @@ -106,7 +106,12 @@ public static long parseLong(String hex, int start, int end) for (int i = start ; i < end ; ++i) { char c = hex.charAt(i); - result |= (long)(c - (c >= 'a' ? 'a' - 10 : '0')) << shift; + if (c >= charToByte.length) + throw new NumberFormatException("Invalid hex character '" + c + "' at index " + i + " in: " + hex.substring(start, end)); + int digit = charToByte[c]; + if (digit < 0) + throw new NumberFormatException("Invalid hex character '" + c + "' at index " + i + " in: " + hex.substring(start, end)); + result |= (long) digit << shift; shift -= 4; } return result; diff --git a/test/unit/org/apache/cassandra/schema/TableIdTest.java b/test/unit/org/apache/cassandra/schema/TableIdTest.java index ae83204ac84d..e0964d5ed0ce 100644 --- a/test/unit/org/apache/cassandra/schema/TableIdTest.java +++ b/test/unit/org/apache/cassandra/schema/TableIdTest.java @@ -21,6 +21,8 @@ import org.assertj.core.api.Assertions; import org.junit.Test; +import accord.utils.Gen; +import accord.utils.Gens; import accord.utils.LazyToString; import accord.utils.ReflectionUtils; @@ -31,6 +33,8 @@ import org.apache.cassandra.utils.Generators; import static accord.utils.Property.qt; +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; public class TableIdTest { @@ -86,4 +90,40 @@ public void serializeCompactComparableV2() qt().forAll(Generators.toGen(CassandraGenerators.TABLE_ID_GEN)) .check(input -> Serializers.testSerde(output, TableId.compactComparableSerializer, input)); } - } \ No newline at end of file + + /** + * Round-trip: fromString(tableId.toString()) must recover the original TableId. + * This exercises both the "tid:" path (MAGIC msb, uses Hex.parseLong) and the + * UUID path (non-MAGIC msb, uses UUID.fromString). + */ + @Test + public void fromStringRoundTrip() + { + qt().forAll(Generators.toGen(CassandraGenerators.TABLE_ID_GEN)).check(input -> { + String str = input.toString(); + TableId recovered = TableId.fromString(str); + assertThat(recovered) + .describedAs("fromString(toString()) must recover the original TableId for: %s", str) + .isEqualTo(input); + }); + } + + /** + * fromString must handle uppercase hex characters in the "tid:" format. + * Long.toHexString produces lowercase, but callers may pass uppercase. + */ + @Test + public void fromStringHandlesUppercaseHex() + { + qt().forAll(Gens.longs().all()).check(lsb -> { + String lowercase = "tid:" + Long.toHexString(lsb); + String uppercase = "tid:" + Long.toHexString(lsb).toUpperCase(); + + TableId fromLower = TableId.fromString(lowercase); + TableId fromUpper = TableId.fromString(uppercase); + assertThat(fromUpper) + .describedAs("fromString must handle uppercase hex: %s vs %s", uppercase, lowercase) + .isEqualTo(fromLower); + }); + } + } diff --git a/test/unit/org/apache/cassandra/utils/AccordGenerators.java b/test/unit/org/apache/cassandra/utils/AccordGenerators.java index 3b936ece0b01..0c2e1fc26e7d 100644 --- a/test/unit/org/apache/cassandra/utils/AccordGenerators.java +++ b/test/unit/org/apache/cassandra/utils/AccordGenerators.java @@ -111,6 +111,32 @@ private AccordGenerators() { } + /** + * Generates a random byte array whose size is determined by the given size generator. + * Fills 8 bytes at a time using {@code nextLong()} for efficiency, then fills any remaining bytes individually. + */ + public static Gen byteArray(Gen.IntGen sizeGen) + { + return rs -> { + int size = sizeGen.nextInt(rs); + byte[] bytes = new byte[size]; + int i = 0; + for (; i + 8 <= size; i += 8) + ByteArrayUtil.putLong(bytes, i, rs.nextLong()); + for (; i < size; i++) + bytes[i] = (byte) rs.nextInt(); + return bytes; + }; + } + + /** + * Generates a random byte array of the specified fixed size. + */ + public static Gen byteArrayOfSize(int size) + { + return byteArray(rs -> size); + } + public static boolean maybeUpdatePartitioner(List topologies) { for (var t : topologies) diff --git a/test/unit/org/apache/cassandra/utils/DynamicListPropertyTest.java b/test/unit/org/apache/cassandra/utils/DynamicListPropertyTest.java new file mode 100644 index 000000000000..9fe07f44a7ee --- /dev/null +++ b/test/unit/org/apache/cassandra/utils/DynamicListPropertyTest.java @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.cassandra.utils; + +import java.util.ArrayList; + +import org.junit.Test; + +import accord.utils.Property; +import accord.utils.RandomSource; + +import static accord.utils.Property.commands; +import static accord.utils.Property.stateful; +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Stateful property-based tests for {@link DynamicList}. + *

+ * Uses an {@link java.util.ArrayList} as the oracle model to verify that + * append, remove, and indexed get operations maintain consistency with the + * skip list's internal structure. + */ +public class DynamicListPropertyTest +{ + private static class State + { + final DynamicList sut; + final ArrayList model = new ArrayList<>(); + final ArrayList> nodes = new ArrayList<>(); + int counter = 0; + + State(RandomSource rs) + { + int maxExpectedSize = rs.nextInt(4, 128); + sut = new DynamicList<>(maxExpectedSize, rs::nextInt); + } + + @Override + public String toString() + { + return "State{size=" + model.size() + ", model=" + model + '}'; + } + } + + private static Property.Command append(RandomSource rs, State state) + { + int value = state.counter++; + return new Property.SimpleCommand<>("Append(" + value + ')', s -> { + DynamicList.Node node = s.sut.append(value); + assertThat(node).as("append should return a non-null node").isNotNull(); + s.model.add(value); + s.nodes.add(node); + assertThat(s.sut.size()).as("size after append").isEqualTo(s.model.size()); + }); + } + + private static Property.Command remove(RandomSource rs, State state) + { + int index = rs.nextInt(state.model.size()); + int value = state.model.get(index); + return new Property.SimpleCommand<>("Remove(index=" + index + ", value=" + value + ')', s -> { + DynamicList.Node node = s.nodes.get(index); + s.sut.remove(node); + s.model.remove(index); + s.nodes.remove(index); + assertThat(s.sut.size()).as("size after remove").isEqualTo(s.model.size()); + }); + } + + private static Property.Command get(RandomSource rs, State state) + { + int index = rs.nextInt(state.model.size()); + return new Property.SimpleCommand<>("Get(" + index + ')', s -> { + Integer actual = s.sut.get(index); + Integer expected = s.model.get(index); + assertThat(actual).as("get(%d)", index).isEqualTo(expected); + }); + } + + private static Property.Command getOutOfBounds(RandomSource rs, State state) + { + return new Property.SimpleCommand<>("GetOutOfBounds(size=" + state.model.size() + ')', s -> { + Integer result = s.sut.get(s.sut.size()); + assertThat(result).as("get(size) should return null").isNull(); + }); + } + + private static Property.Command verifyContents(RandomSource rs, State state) + { + return new Property.SimpleCommand<>("VerifyContents(size=" + state.model.size() + ')', s -> { + assertThat(s.sut.size()).as("size").isEqualTo(s.model.size()); + for (int i = 0; i < s.model.size(); i++) + assertThat(s.sut.get(i)).as("get(%d)", i).isEqualTo(s.model.get(i)); + assertThat(s.sut.get(s.model.size())).as("get(size) should be null").isNull(); + }); + } + + private static Property.Command appendWithMaxSize(RandomSource rs, State state) + { + int value = state.counter++; + int maxSize = state.model.size() + rs.nextInt(0, 3); + return new Property.SimpleCommand<>("AppendWithMaxSize(value=" + value + ", maxSize=" + maxSize + ')', s -> { + DynamicList.Node node = s.sut.append(value, maxSize); + if (s.model.size() >= maxSize) + { + assertThat(node).as("append should return null when at max size").isNull(); + } + else + { + assertThat(node).as("append should return non-null when below max size").isNotNull(); + s.model.add(value); + s.nodes.add(node); + } + assertThat(s.sut.size()).as("size after appendWithMaxSize").isEqualTo(s.model.size()); + }); + } + + private static Property.Command verifyStructure(RandomSource rs, State state) + { + return new Property.SimpleCommand<>("VerifyStructure(size=" + state.model.size() + ')', s -> { + assertThat(s.sut.isWellFormed()).as("isWellFormed").isTrue(); + }); + } + + @Test + public void appendRemoveGetMaintainsModelConsistency() + { + stateful().withExamples(500).withSteps(1000).check(commands(() -> State::new) + .add(3, DynamicListPropertyTest::append) + .addIf(s -> !s.model.isEmpty(), DynamicListPropertyTest::remove) + .addIf(s -> !s.model.isEmpty(), DynamicListPropertyTest::get) + .add(DynamicListPropertyTest::getOutOfBounds) + .add(DynamicListPropertyTest::verifyContents) + .add(DynamicListPropertyTest::verifyStructure) + .add(DynamicListPropertyTest::appendWithMaxSize) + .build()); + } +} diff --git a/test/unit/org/apache/cassandra/utils/HexPropertyTest.java b/test/unit/org/apache/cassandra/utils/HexPropertyTest.java new file mode 100644 index 000000000000..f3620441961b --- /dev/null +++ b/test/unit/org/apache/cassandra/utils/HexPropertyTest.java @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.cassandra.utils; + +import java.util.Arrays; + +import org.junit.Test; + +import accord.utils.Gen; +import accord.utils.Gens; + +import static accord.utils.Property.qt; +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +/** + * Property-based tests for {@link Hex}. + */ +public class HexPropertyTest +{ + /** Generate a random byte array of length 0..128. */ + private static final Gen BYTE_ARRAY_GEN = AccordGenerators.byteArray(Gens.ints().between(0, 128)); + + /** + * Generate a valid mixed-case hex string by converting random bytes to hex + * and then randomly uppercasing some of the hex letter characters (a-f). + * This ensures tests exercise both lowercase and uppercase hex input. + */ + private static final Gen HEX_STRING_GEN = rs -> { + byte[] bytes = BYTE_ARRAY_GEN.next(rs); + String lowercase = Hex.bytesToHex(bytes); + char[] chars = lowercase.toCharArray(); + for (int i = 0; i < chars.length; i++) + { + if (chars[i] >= 'a' && chars[i] <= 'f' && rs.nextBoolean()) + chars[i] = (char) (chars[i] - 'a' + 'A'); + } + return new String(chars); + }; + + /** + * Round-trip: hexToBytes(bytesToHex(bytes)) should recover the original bytes. + */ + @Test + public void roundTripBytesToHexToBytes() + { + qt().forAll(BYTE_ARRAY_GEN).check(bytes -> { + String hex = Hex.bytesToHex(bytes); + byte[] recovered = Hex.hexToBytes(hex); + assertThat(recovered).isEqualTo(bytes); + }); + } + + /** + * Inverse round-trip: bytesToHex(hexToBytes(s)) should equal s.toLowerCase() + * for any valid hex string. + */ + @Test + public void roundTripHexToBytesToHex() + { + qt().forAll(HEX_STRING_GEN).check(hex -> { + byte[] bytes = Hex.hexToBytes(hex); + String recovered = Hex.bytesToHex(bytes); + assertThat(recovered).isEqualTo(hex.toLowerCase()); + }); + } + + /** + * bytesToHex with offset/length should produce the same result as + * bytesToHex on the corresponding sub-array. + */ + @Test + public void bytesToHexOffsetLength() + { + qt().check(rs -> { + byte[] bytes = BYTE_ARRAY_GEN.next(rs); + if (bytes.length == 0) + { + assertThat(Hex.bytesToHex(bytes, 0, 0)).isEqualTo(Hex.bytesToHex(new byte[0])); + return; + } + + int offset = rs.nextInt(0, bytes.length); + int length = rs.nextInt(0, bytes.length - offset + 1); + + String fromOffsetLength = Hex.bytesToHex(bytes, offset, length); + String fromSubArray = Hex.bytesToHex(Arrays.copyOfRange(bytes, offset, offset + length)); + assertThat(fromOffsetLength).isEqualTo(fromSubArray); + }); + } + + /** + * parseLong correctness: for any long value, converting to hex with + * Long.toHexString and parsing back with Hex.parseLong should recover + * the original value. + */ + @Test + public void parseLongRoundTrip() + { + qt().forAll(Gens.longs().all()).check(value -> { + String hex = Long.toHexString(value); + long recovered = Hex.parseLong(hex, 0, hex.length()); + assertThat(recovered).isEqualTo(value); + }); + } + + /** + * parseLong agrees with Long.parseUnsignedLong for valid lowercase hex + * strings of 1..16 characters. + */ + @Test + public void parseLongAgreesWithParseUnsignedLong() + { + qt().check(rs -> { + // Generate a valid lowercase hex string of length 1..16 + int len = rs.nextInt(1, 17); + char[] chars = new char[len]; + for (int i = 0; i < len; i++) + { + int nibble = rs.nextInt(16); + chars[i] = "0123456789abcdef".charAt(nibble); + } + String hex = new String(chars); + + // Avoid overflow: Long.parseUnsignedLong throws for values > 16 hex digits, + // but 16 hex digits can overflow if leading nibble > 'f' (impossible) or + // the value exceeds unsigned long max. Both methods should agree. + long fromHex = Hex.parseLong(hex, 0, hex.length()); + long fromJdk = Long.parseUnsignedLong(hex, 16); + assertThat(fromHex).isEqualTo(fromJdk); + }); + } + + /** + * parseLong works correctly with arbitrary start/end offsets within a + * larger string. Uses non-hex characters for padding so that any + * off-by-one error in start/end handling would produce wrong values + * or exceptions rather than silently reading valid hex from the padding. + */ + @Test + public void parseLongWithOffsets() + { + qt().check(rs -> { + long value = rs.nextLong(); + String hex = Long.toHexString(value); + + // Generate random non-hex prefix and suffix + int prefixLen = rs.nextInt(0, 17); + int suffixLen = rs.nextInt(0, 17); + StringBuilder sb = new StringBuilder(prefixLen + hex.length() + suffixLen); + for (int i = 0; i < prefixLen; i++) + sb.append((char) ('g' + rs.nextInt(0, 20))); // g-z are non-hex + sb.append(hex); + for (int i = 0; i < suffixLen; i++) + sb.append((char) ('g' + rs.nextInt(0, 20))); + String padded = sb.toString(); + + int start = prefixLen; + int end = start + hex.length(); + long recovered = Hex.parseLong(padded, start, end); + assertThat(recovered).isEqualTo(value); + }); + } + + /** + * Verifies that {@link Hex#parseLong} correctly handles uppercase hex characters A-F + * in addition to lowercase a-f and digits 0-9. + *

+ * Generates mixed-case hex strings and verifies parseLong agrees with + * {@link Long#parseUnsignedLong(String, int)}. + */ + @Test + public void parseLongHandlesUppercase() + { + qt().check(rs -> { + // Generate a valid mixed-case hex string of length 1..16 + int len = rs.nextInt(1, 17); + char[] chars = new char[len]; + for (int i = 0; i < len; i++) + { + int nibble = rs.nextInt(16); + char c = "0123456789abcdef".charAt(nibble); + // Randomly uppercase hex letter characters + if (c >= 'a' && c <= 'f' && rs.nextBoolean()) + c = (char) (c - 'a' + 'A'); + chars[i] = c; + } + String hex = new String(chars); + + long fromHex = Hex.parseLong(hex, 0, hex.length()); + long fromJdk = Long.parseUnsignedLong(hex, 16); + assertThat(fromHex).isEqualTo(fromJdk); + }); + } + + /** + * Odd-length hex strings must throw NumberFormatException. + */ + @Test + public void hexToBytesRejectsOddLengthStrings() + { + qt().check(rs -> { + // Generate an odd-length string of valid hex characters + int len = rs.nextInt(0, 64) * 2 + 1; // always odd + char[] chars = new char[len]; + for (int i = 0; i < len; i++) + { + int nibble = rs.nextInt(16); + chars[i] = "0123456789abcdef".charAt(nibble); + } + String oddHex = new String(chars); + + assertThatThrownBy(() -> Hex.hexToBytes(oddHex)) + .isInstanceOf(NumberFormatException.class); + }); + } + + /** + * Hex strings containing non-hex ASCII characters must throw NumberFormatException. + */ + @Test + public void hexToBytesRejectsNonHexCharacters() + { + qt().check(rs -> { + // Start with a valid even-length hex string (at least 2 chars) + int pairCount = rs.nextInt(1, 33); + char[] chars = new char[pairCount * 2]; + for (int i = 0; i < chars.length; i++) + { + int nibble = rs.nextInt(16); + chars[i] = "0123456789abcdef".charAt(nibble); + } + + // Replace one character with a non-hex character + int pos = rs.nextInt(0, chars.length); + char bad; + do + { + bad = (char) rs.nextInt(0, 128); + } + while ((bad >= '0' && bad <= '9') || (bad >= 'a' && bad <= 'f') || (bad >= 'A' && bad <= 'F')); + chars[pos] = bad; + + String invalid = new String(chars); + assertThatThrownBy(() -> Hex.hexToBytes(invalid)) + .isInstanceOf(NumberFormatException.class); + }); + } + + /** + * Hex strings containing Unicode characters >= 256 must throw an exception. + * The {@code charToByte} lookup table is only 256 entries, so characters outside + * this range cause an {@link ArrayIndexOutOfBoundsException} rather than the + * {@link NumberFormatException} thrown for non-hex ASCII characters. + */ + @Test + public void hexToBytesRejectsUnicodeCharacters() + { + qt().check(rs -> { + // Start with a valid even-length hex string (at least 2 chars) + int pairCount = rs.nextInt(1, 33); + char[] chars = new char[pairCount * 2]; + for (int i = 0; i < chars.length; i++) + { + int nibble = rs.nextInt(16); + chars[i] = "0123456789abcdef".charAt(nibble); + } + + // Replace one character with a Unicode character >= 256 + int pos = rs.nextInt(0, chars.length); + chars[pos] = (char) rs.nextInt(256, 65536); + + String invalid = new String(chars); + assertThatThrownBy(() -> Hex.hexToBytes(invalid)) + .isInstanceOf(ArrayIndexOutOfBoundsException.class); + }); + } + + /** + * parseLong must throw NumberFormatException for strings containing non-hex characters. + * Characters outside [0-9a-fA-F] are invalid hex digits. + */ + @Test + public void parseLongRejectsNonHexCharacters() + { + qt().check(rs -> { + // Generate a valid hex string of length 1..16 + int len = rs.nextInt(1, 17); + char[] chars = new char[len]; + for (int i = 0; i < len; i++) + { + int nibble = rs.nextInt(16); + chars[i] = "0123456789abcdef".charAt(nibble); + } + + // Replace one character with a non-hex character + int pos = rs.nextInt(0, len); + char bad; + do + { + bad = (char) rs.nextInt(0, 128); + } + while ((bad >= '0' && bad <= '9') || (bad >= 'a' && bad <= 'f') || (bad >= 'A' && bad <= 'F')); + chars[pos] = bad; + + String invalid = new String(chars); + assertThatThrownBy(() -> Hex.parseLong(invalid, 0, invalid.length())) + .isInstanceOf(NumberFormatException.class); + }); + } + + /** + * bytesToHex always produces a string of even length equal to 2 * input length, + * containing only lowercase hex characters. + */ + @Test + public void bytesToHexOutputFormat() + { + qt().forAll(BYTE_ARRAY_GEN).check(bytes -> { + String hex = Hex.bytesToHex(bytes); + assertThat(hex).hasSize(bytes.length * 2); + assertThat(hex).matches("[0-9a-f]*"); + }); + } +} diff --git a/test/unit/org/apache/cassandra/utils/LongTimSortPropertyTest.java b/test/unit/org/apache/cassandra/utils/LongTimSortPropertyTest.java new file mode 100644 index 000000000000..2f6772a75967 --- /dev/null +++ b/test/unit/org/apache/cassandra/utils/LongTimSortPropertyTest.java @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.cassandra.utils; + +import java.util.Arrays; + +import org.junit.Test; + +import accord.utils.Gen; +import accord.utils.Gens; +import accord.utils.RandomSource; +import org.apache.cassandra.utils.LongTimSort.LongComparator; + +import static accord.utils.Property.qt; +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Property-based tests for {@link LongTimSort}. + *

+ * Verifies sorting correctness, permutation preservation, stability, range sorting, + * and consistency with {@link Arrays#sort(long[])} across a wide variety of randomly + * generated long arrays. + */ +public class LongTimSortPropertyTest +{ + /** + * Meta-random size generator: each test example picks a different distribution + * strategy for array sizes in [0, 501), giving coverage of empty, tiny, medium, + * and large arrays with varying frequency across examples. + */ + private static final Gen SIZE_DISTRO = Gens.mixedDistribution(0, 501); + + /** + * Generates a random long array with meta-random size and a mix of value strategies: + * sometimes all-equal, sometimes small-range, sometimes full-range longs. + */ + private static long[] genLongArray(RandomSource rs, Gen.IntGen sizeGen) + { + int size = sizeGen.nextInt(rs); + long[] a = new long[size]; + // Pick a value strategy for this array + int strategy = rs.nextInt(0, 7); + switch (strategy) + { + case 0: // all same value + long v = rs.nextLong(); + Arrays.fill(a, v); + break; + case 1: // small range [-10, 10] + for (int i = 0; i < size; i++) + a[i] = rs.nextInt(-10, 11); + break; + case 2: // medium range [-1000, 1000] + for (int i = 0; i < size; i++) + a[i] = rs.nextInt(-1000, 1001); + break; + case 3: // pre-sorted ascending + for (int i = 0; i < size; i++) + a[i] = i; + break; + case 4: // pre-sorted descending + for (int i = 0; i < size; i++) + a[i] = size - i; + break; + case 5: // nearly sorted (sorted then perturb a few elements) + for (int i = 0; i < size; i++) + a[i] = i; + int perturbations = Math.max(1, size / 20); + for (int p = 0; p < perturbations && size > 1; p++) + { + int j = rs.nextInt(0, size); + int k = rs.nextInt(0, size); + long tmp = a[j]; + a[j] = a[k]; + a[k] = tmp; + } + break; + case 6: // full range longs + for (int i = 0; i < size; i++) + a[i] = rs.nextLong(); + break; + default: + throw new IllegalStateException("Unknown strategy: " + strategy); + } + return a; + } + + // ---- Property 1: Sorted output ---- + + @Test + public void sortedOutput() + { + qt().check(rs -> { + Gen.IntGen sizeGen = SIZE_DISTRO.next(rs); + long[] a = genLongArray(rs, sizeGen); + LongTimSort.sort(a, Long::compare); + for (int i = 0; i < a.length - 1; i++) + assertThat(a[i]).describedAs("a[%d] <= a[%d]", i, i + 1).isLessThanOrEqualTo(a[i + 1]); + }); + } + + // ---- Property 2: Permutation (multiset preservation) ---- + + @Test + public void permutation() + { + qt().check(rs -> { + Gen.IntGen sizeGen = SIZE_DISTRO.next(rs); + long[] a = genLongArray(rs, sizeGen); + long[] original = a.clone(); + LongTimSort.sort(a, Long::compare); + // Both sorted arrays must be identical element-by-element + Arrays.sort(original); + assertThat(a).describedAs("sorted array must be a permutation of the original").isEqualTo(original); + }); + } + + // ---- Property 3: Idempotency ---- + + @Test + public void idempotency() + { + qt().check(rs -> { + Gen.IntGen sizeGen = SIZE_DISTRO.next(rs); + long[] a = genLongArray(rs, sizeGen); + LongTimSort.sort(a, Long::compare); + long[] sortedOnce = a.clone(); + LongTimSort.sort(a, Long::compare); + assertThat(a).describedAs("sorting an already-sorted array must produce the same result").isEqualTo(sortedOnce); + }); + } + + // ---- Property 4: Stability ---- + + @Test + public void stability() + { + qt().check(rs -> { + Gen.IntGen sizeGen = SIZE_DISTRO.next(rs); + int size = sizeGen.nextInt(rs); + // Encode (value, originalIndex) into a single long: + // high 32 bits = value (from small range to force duplicates), low 32 bits = original index + long[] a = new long[size]; + for (int i = 0; i < size; i++) + { + int value = rs.nextInt(-50, 51); + a[i] = ((long) value << 32) | (i & 0xFFFFFFFFL); + } + + // Compare by high 32 bits only (the value) + LongComparator cmp = (o1, o2) -> Integer.compare((int) (o1 >> 32), (int) (o2 >> 32)); + LongTimSort.sort(a, cmp); + + // After sort, elements with equal high bits must have increasing low bits (original indices) + for (int i = 0; i < a.length - 1; i++) + { + int valI = (int) (a[i] >> 32); + int valNext = (int) (a[i + 1] >> 32); + assertThat(valI).describedAs("sorted by value at position %d", i).isLessThanOrEqualTo(valNext); + if (valI == valNext) + { + long idxI = a[i] & 0xFFFFFFFFL; + long idxNext = a[i + 1] & 0xFFFFFFFFL; + assertThat(idxI) + .describedAs("stability: equal elements at positions %d and %d must preserve relative order", i, i + 1) + .isLessThan(idxNext); + } + } + }); + } + + // ---- Property 5: Range sort ---- + + @Test + public void rangeSort() + { + qt().check(rs -> { + Gen.IntGen sizeGen = SIZE_DISTRO.next(rs); + long[] a = genLongArray(rs, sizeGen); + if (a.length == 0) return; + + int lo = rs.nextInt(0, a.length); + int hi = rs.nextInt(lo, a.length + 1); + long[] original = a.clone(); + + LongTimSort.sort(a, lo, hi, Long::compare); + + // Elements outside [lo, hi) must be unchanged + for (int i = 0; i < lo; i++) + assertThat(a[i]).describedAs("element before range at index %d must be unchanged", i).isEqualTo(original[i]); + for (int i = hi; i < a.length; i++) + assertThat(a[i]).describedAs("element after range at index %d must be unchanged", i).isEqualTo(original[i]); + + // Elements within [lo, hi) must be sorted + for (int i = lo; i < hi - 1; i++) + assertThat(a[i]).describedAs("a[%d] <= a[%d] in sorted range", i, i + 1).isLessThanOrEqualTo(a[i + 1]); + + // Elements within [lo, hi) must be a permutation of original[lo..hi) + long[] sortedRange = Arrays.copyOfRange(a, lo, hi); + long[] originalRange = Arrays.copyOfRange(original, lo, hi); + Arrays.sort(originalRange); + assertThat(sortedRange).describedAs("range must be a permutation of original range").isEqualTo(originalRange); + }); + } + + // ---- Property 6: Consistency with Arrays.sort ---- + + @Test + public void consistencyWithArraysSort() + { + qt().check(rs -> { + Gen.IntGen sizeGen = SIZE_DISTRO.next(rs); + long[] a = genLongArray(rs, sizeGen); + long[] expected = a.clone(); + Arrays.sort(expected); + LongTimSort.sort(a, Long::compare); + assertThat(a).describedAs("LongTimSort must produce the same result as Arrays.sort").isEqualTo(expected); + }); + } + + // ---- Property 7: Reverse comparator ---- + + @Test + public void reverseComparator() + { + qt().check(rs -> { + Gen.IntGen sizeGen = SIZE_DISTRO.next(rs); + long[] a = genLongArray(rs, sizeGen); + LongTimSort.sort(a, (x, y) -> Long.compare(y, x)); + for (int i = 0; i < a.length - 1; i++) + assertThat(a[i]).describedAs("a[%d] >= a[%d] in descending order", i, i + 1).isGreaterThanOrEqualTo(a[i + 1]); + }); + } +} diff --git a/test/unit/org/apache/cassandra/utils/MurmurHashPropertyTest.java b/test/unit/org/apache/cassandra/utils/MurmurHashPropertyTest.java new file mode 100644 index 000000000000..ef6db543e814 --- /dev/null +++ b/test/unit/org/apache/cassandra/utils/MurmurHashPropertyTest.java @@ -0,0 +1,317 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.cassandra.utils; + +import java.nio.ByteBuffer; +import java.nio.charset.StandardCharsets; + +import org.junit.Test; + +import accord.utils.Gen; +import accord.utils.Gens; + +import static accord.utils.Property.qt; +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Property-based tests for {@link MurmurHash}. + *

+ * Tests verify byte[]/ByteBuffer equivalence, inverse round-trip + * correctness, and regression values for the various MurmurHash functions. + */ +public class MurmurHashPropertyTest +{ + /** Generator for random byte arrays of length 0..256. */ + private static final Gen BYTE_ARRAY_GEN = AccordGenerators.byteArray(Gens.ints().between(0, 256)); + + /** Generator for random byte arrays whose length is exactly 16 (for hash3 inverse). */ + private static final Gen ALIGNED_16_BYTE_GEN = AccordGenerators.byteArrayOfSize(16); + + // ---- ByteBuffer / byte[] equivalence ---- + + @Test + public void hash2_64ByteBufferEquivalence() + { + qt().forAll(BYTE_ARRAY_GEN, Gens.longs().all()).check((data, seed) -> { + long fromArray = MurmurHash.hash2_64(data, 0, data.length, seed); + long fromHeap = MurmurHash.hash2_64(ByteBuffer.wrap(data), 0, data.length, seed); + assertThat(fromHeap) + .describedAs("hash2_64 heap ByteBuffer and byte[] must produce the same result") + .isEqualTo(fromArray); + + // Also test with a direct ByteBuffer + ByteBuffer direct = ByteBuffer.allocateDirect(data.length); + direct.put(data); + direct.flip(); + long fromDirect = MurmurHash.hash2_64(direct, 0, data.length, seed); + assertThat(fromDirect) + .describedAs("hash2_64 direct ByteBuffer and byte[] must produce the same result") + .isEqualTo(fromArray); + }); + } + + @Test + public void hash3_x64_128ByteBufferEquivalence() + { + qt().forAll(BYTE_ARRAY_GEN, Gens.longs().all()).check((data, seed) -> { + long[] resultArray = new long[2]; + long[] resultHeap = new long[2]; + long[] resultDirect = new long[2]; + MurmurHash.hash3_x64_128(data, 0, data.length, seed, resultArray); + MurmurHash.hash3_x64_128(ByteBuffer.wrap(data), 0, data.length, seed, resultHeap); + assertThat(resultHeap) + .describedAs("hash3_x64_128 heap ByteBuffer and byte[] must produce the same result") + .isEqualTo(resultArray); + + ByteBuffer direct = ByteBuffer.allocateDirect(data.length); + direct.put(data); + direct.flip(); + MurmurHash.hash3_x64_128(direct, 0, data.length, seed, resultDirect); + assertThat(resultDirect) + .describedAs("hash3_x64_128 direct ByteBuffer and byte[] must produce the same result") + .isEqualTo(resultArray); + }); + } + + // ---- ByteBuffer with non-zero offset equivalence ---- + + @Test + public void hash32OffsetEquivalence() + { + qt().check(rs -> { + int padding = rs.nextInt(1, 33); + byte[] data = BYTE_ARRAY_GEN.next(rs); + int seed = rs.nextInt(); + + // Create a ByteBuffer with leading padding + byte[] padded = new byte[padding + data.length]; + System.arraycopy(data, 0, padded, padding, data.length); + ByteBuffer paddedBuf = ByteBuffer.wrap(padded); + + int fromDirect = MurmurHash.hash32(ByteBuffer.wrap(data), 0, data.length, seed); + int fromOffset = MurmurHash.hash32(paddedBuf, padding, data.length, seed); + assertThat(fromOffset) + .describedAs("hash32 with offset must match direct result") + .isEqualTo(fromDirect); + }); + } + + @Test + public void hash2_64OffsetEquivalence() + { + qt().check(rs -> { + int padding = rs.nextInt(1, 33); + byte[] data = BYTE_ARRAY_GEN.next(rs); + long seed = rs.nextLong(); + + // Create a ByteBuffer with leading padding + byte[] padded = new byte[padding + data.length]; + System.arraycopy(data, 0, padded, padding, data.length); + ByteBuffer buf = ByteBuffer.wrap(padded); + + long fromArray = MurmurHash.hash2_64(data, 0, data.length, seed); + long fromBuffer = MurmurHash.hash2_64(buf, padding, data.length, seed); + assertThat(fromBuffer) + .describedAs("hash2_64 with offset must match byte[] result") + .isEqualTo(fromArray); + }); + } + + @Test + public void hash3_x64_128OffsetEquivalence() + { + qt().check(rs -> { + int padding = rs.nextInt(1, 33); + byte[] data = BYTE_ARRAY_GEN.next(rs); + long seed = rs.nextLong(); + + byte[] padded = new byte[padding + data.length]; + System.arraycopy(data, 0, padded, padding, data.length); + ByteBuffer buf = ByteBuffer.wrap(padded); + + long[] resultArray = new long[2]; + long[] resultBuffer = new long[2]; + MurmurHash.hash3_x64_128(data, 0, data.length, seed, resultArray); + MurmurHash.hash3_x64_128(buf, padding, data.length, seed, resultBuffer); + assertThat(resultBuffer) + .describedAs("hash3_x64_128 with offset must match byte[] result") + .isEqualTo(resultArray); + }); + } + + // ---- Inverse round-trips ---- + + @Test + public void fmixInverseRoundTrip() + { + qt().forAll(Gens.longs().all()).check(k -> { + long hashed = MurmurHash.fmix(k); + long recovered = MurmurHash.invFmix(hashed); + assertThat(recovered) + .describedAs("invFmix(fmix(k)) must equal k") + .isEqualTo(k); + }); + } + + @Test + public void invFmixForwardRoundTrip() + { + qt().forAll(Gens.longs().all()).check(k -> { + long unhashed = MurmurHash.invFmix(k); + long recovered = MurmurHash.fmix(unhashed); + assertThat(recovered) + .describedAs("fmix(invFmix(k)) must equal k") + .isEqualTo(k); + }); + } + + @Test + public void hash3InverseRoundTripSeedZero() + { + qt().forAll(ALIGNED_16_BYTE_GEN).check(input -> { + long[] hashResult = new long[2]; + MurmurHash.hash3_x64_128(input, 0, 16, 0, hashResult); + long[] recovered = MurmurHash.inv_hash3_x64_128(hashResult); + + // inv_hash3_x64_128 applies Long.reverseBytes, so the returned longs + // are in big-endian byte order. Extract bytes accordingly. + byte[] reconstructed = new byte[16]; + for (int i = 0; i < 8; i++) + { + reconstructed[i] = (byte) (recovered[0] >>> ((7 - i) * 8)); + reconstructed[i + 8] = (byte) (recovered[1] >>> ((7 - i) * 8)); + } + assertThat(reconstructed) + .describedAs("inv_hash3_x64_128 must recover the original 16-byte input with seed=0") + .isEqualTo(input); + }); + } + + // ---- invRShiftXor correctness ---- + + @Test + public void invRShiftXorCorrectness() + { + qt().forAll(Gens.longs().all(), Gens.ints().between(1, 63)).check((value, shift) -> { + long shifted = value ^ (value >>> shift); + long recovered = MurmurHash.invRShiftXor(shifted, shift); + assertThat(recovered) + .describedAs("invRShiftXor must recover the original value for shift=%d", shift) + .isEqualTo(value); + }); + } + + // ---- invTailReverse ---- + // Note: MurmurHash.invTailReverse is a public method with no test coverage. + // Testing it in isolation is non-trivial because: + // 1. It has specific byte-order conventions (Long.reverseBytes at entry, Longs big-endian + // at exit) that are only meaningful in the context of the full hash3 inverse pipeline. + // 2. It has a latent bug: BitSet.toByteArray() can return a shorter array when high bits + // are zero, causing ArrayIndexOutOfBoundsException for certain inputs. + // A proper test would require implementing the full non-16-byte hash3 inverse pipeline + // (which doesn't exist in the codebase). The 16-byte case is covered by + // hash3InverseRoundTripSeedZero, which uses inv_hash3_x64_128 (no tail processing needed). + + // ---- Known test vectors (golden values) ---- + + @Test + public void knownTestVectors() + { + byte[] empty = new byte[0]; + byte[] singleZero = new byte[]{ 0x00 }; + byte[] hello = "Hello".getBytes(StandardCharsets.US_ASCII); + + // Regression tests: these values were generated by this implementation and pinned. + // They verify the hash function has not changed, not that it matches an external spec. + // hash32 + assertThat(MurmurHash.hash32(ByteBuffer.wrap(empty), 0, 0, 0)) + .describedAs("hash32(empty, seed=0)") + .isEqualTo(0); + assertThat(MurmurHash.hash32(ByteBuffer.wrap(singleZero), 0, 1, 0)) + .describedAs("hash32({0x00}, seed=0)") + .isEqualTo(-380735811); + assertThat(MurmurHash.hash32(ByteBuffer.wrap(hello), 0, hello.length, 0)) + .describedAs("hash32(\"Hello\", seed=0)") + .isEqualTo(1826530862); + assertThat(MurmurHash.hash32(ByteBuffer.wrap(hello), 0, hello.length, 42)) + .describedAs("hash32(\"Hello\", seed=42)") + .isEqualTo(120081362); + + // hash2_64 + assertThat(MurmurHash.hash2_64(empty, 0, 0, 0L)) + .describedAs("hash2_64(empty, seed=0)") + .isEqualTo(0L); + assertThat(MurmurHash.hash2_64(singleZero, 0, 1, 0L)) + .describedAs("hash2_64({0x00}, seed=0)") + .isEqualTo(6351753276682545529L); + assertThat(MurmurHash.hash2_64(hello, 0, hello.length, 0L)) + .describedAs("hash2_64(\"Hello\", seed=0)") + .isEqualTo(6940510666185404851L); + assertThat(MurmurHash.hash2_64(hello, 0, hello.length, 42L)) + .describedAs("hash2_64(\"Hello\", seed=42)") + .isEqualTo(-5190831759633619065L); + + // hash3_x64_128 + long[] result = new long[2]; + + MurmurHash.hash3_x64_128(empty, 0, 0, 0L, result); + assertThat(result).describedAs("hash3_x64_128(empty, seed=0)") + .isEqualTo(new long[]{ 0L, 0L }); + + MurmurHash.hash3_x64_128(singleZero, 0, 1, 0L, result); + assertThat(result).describedAs("hash3_x64_128({0x00}, seed=0)") + .isEqualTo(new long[]{ 5048724184180415669L, 5864299874987029891L }); + + MurmurHash.hash3_x64_128(hello, 0, hello.length, 0L, result); + assertThat(result).describedAs("hash3_x64_128(\"Hello\", seed=0)") + .isEqualTo(new long[]{ 3871253994707141660L, -6917270852172884668L }); + + MurmurHash.hash3_x64_128(hello, 0, hello.length, 42L, result); + assertThat(result).describedAs("hash3_x64_128(\"Hello\", seed=42)") + .isEqualTo(new long[]{ 2550721319707356219L, -6862742243595569438L }); + } + + // ---- Empty input handling ---- + + @Test + public void emptyInputHash2_64() + { + qt().forAll(Gens.longs().all()).check(seed -> { + long h1 = MurmurHash.hash2_64(new byte[0], 0, 0, seed); + long h2 = MurmurHash.hash2_64(ByteBuffer.allocate(0), 0, 0, seed); + assertThat(h1) + .describedAs("Empty input hash2_64 must be equivalent between byte[] and ByteBuffer") + .isEqualTo(h2); + }); + } + + @Test + public void emptyInputHash3_x64_128() + { + qt().forAll(Gens.longs().all()).check(seed -> { + long[] resultArray = new long[2]; + long[] resultBuffer = new long[2]; + MurmurHash.hash3_x64_128(new byte[0], 0, 0, seed, resultArray); + MurmurHash.hash3_x64_128(ByteBuffer.allocate(0), 0, 0, seed, resultBuffer); + assertThat(resultBuffer) + .describedAs("Empty input hash3_x64_128 must be equivalent between byte[] and ByteBuffer") + .isEqualTo(resultArray); + }); + } +}