feat(clickhouse): add UUID, Decimal, Array/Tuple, UInt8/Int8, raw ORDER BY, rawColumn passthrough #10

Merged
lohanidamodar merged 6 commits into main from feat/clickhouse-schema-extras-2 on May 20, 2026

Conversation

@lohanidamodar
Contributor

Summary

Follow-up to #8 — adds the remaining ClickHouse schema features commonly needed in production OLAP workloads, plus a small compiler fix. Base-level features (uuid(), decimal(), tinyInteger(), smallInteger(), defaultRaw()) also map cleanly across MySQL, PostgreSQL, SQLite, and MongoDB.

What's new

UInt8 / Int8 via tinyInteger() and UInt16 / Int16 via smallInteger()

Small integer columns are a natural fit for bounded enumerations, percentage values, and other fields whose value range fits well below 32 bits. A UInt8 value occupies one byte, 75% less uncompressed width than the four-byte UInt32 produced by integer()->unsigned(); a UInt16 occupies two bytes, a 50% reduction. ClickHouse emits UInt8/Int8 and UInt16/Int16; MySQL maps to TINYINT/SMALLINT; PostgreSQL to SMALLINT (no TINYINT); SQLite to INTEGER.

$schema->table('events')
    ->bigInteger('id')->primary()
    ->tinyInteger('scroll_depth')->unsigned()
    ->smallInteger('year_offset')
    ->create();
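
The size claim above is plain width arithmetic and can be checked with a minimal sketch. The helper name `widthSavingPercent` and the lookup table are hypothetical and not part of the library; the figures describe uncompressed per-value width only, so actual on-disk savings after compression will differ.

```php
<?php
declare(strict_types=1);

// Fixed per-value widths, in bytes, of the ClickHouse integer types named above.
const INT_WIDTH_BYTES = [
    'UInt8'  => 1, 'Int8'  => 1,
    'UInt16' => 2, 'Int16' => 2,
    'UInt32' => 4, 'Int32' => 4,
];

/** Percentage of uncompressed width saved by storing $narrow instead of $wide. */
function widthSavingPercent(string $narrow, string $wide): float
{
    $n = INT_WIDTH_BYTES[$narrow];
    $w = INT_WIDTH_BYTES[$wide];

    return 100.0 * ($w - $n) / $w;
}

echo widthSavingPercent('UInt8', 'UInt32'), "\n";  // prints 75
echo widthSavingPercent('UInt16', 'UInt32'), "\n"; // prints 50
```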

Array(T) and Tuple(...) column types

Array(T) is the canonical ClickHouse type for multi-valued attributes — tags, labels, key/value pairs flattened into parallel arrays — and is the standard way to model nested records in the MergeTree family. Tuple(...) covers fixed-arity composites like geo points and key/value pairs.

use Utopia\Query\Schema\ColumnType;

$schema->table('events')
    ->bigInteger('id')->primary()
    ->array('meta.key', ColumnType::String)
    ->array('meta.value', ColumnType::String)
    ->array('user_ids', ColumnType::BigInteger)->unsigned()
    ->tuple('coords', [ColumnType::Float, ColumnType::Float])
    ->create();

Element types run back through the standard column-type compiler, so the parent column's unsigned() and precision flags carry through to the inner type. nullable() is rejected on these columns with an UnsupportedException, because ClickHouse does not support Nullable(Array(...)) or Nullable(Tuple(...)); for arrays, use an empty array as the missing-value sentinel. LowCardinality(...) is likewise rejected because ClickHouse only permits it on scalar types. Both methods are ClickHouse-only: calling ->array() or ->tuple() on a different dialect's builder fails at the type level.
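
A minimal sketch of how composite type names can be assembled from inner types that have already been compiled. The functions `wrapArray` and `wrapTuple` are hypothetical stand-ins, not the library's code, and `InvalidArgumentException` is used here in place of the library's UnsupportedException. The nullable rejection follows the behaviour described in the review discussion later in this PR; the inner type names passed in are illustrative.

```php
<?php
declare(strict_types=1);

/** Wraps an already-compiled inner type name as Array(T). Hypothetical helper. */
function wrapArray(string $inner, bool $nullable = false, bool $lowCardinality = false): string
{
    if ($nullable) {
        // ClickHouse has no Nullable(Array(...)); an empty array is the sentinel.
        throw new InvalidArgumentException('Nullable(Array(...)) is not supported.');
    }
    if ($lowCardinality) {
        throw new InvalidArgumentException('LowCardinality is not allowed on Array columns.');
    }

    return 'Array(' . $inner . ')';
}

/**
 * Wraps already-compiled inner type names as Tuple(T1, T2, ...). Hypothetical helper.
 *
 * @param list<string> $inners
 */
function wrapTuple(array $inners): string
{
    if ($inners === []) {
        throw new InvalidArgumentException('Tuple needs at least one element type.');
    }

    return 'Tuple(' . implode(', ', $inners) . ')';
}

echo wrapArray('UInt64'), "\n";               // prints Array(UInt64)
echo wrapTuple(['Float64', 'Float64']), "\n"; // prints Tuple(Float64, Float64)
```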

decimal(precision, scale)

Fixed-point numeric column type for monetary or precision-sensitive values where binary-floating-point error is unacceptable. ClickHouse emits Decimal(P, S); MySQL/PostgreSQL emit DECIMAL(P, S); SQLite emits NUMERIC(P, S); MongoDB maps to the decimal BSON type. Combines with nullable() exactly as scalar columns do.

$schema->table('orders')
    ->bigInteger('id')->primary()
    ->decimal('amount', precision: 18, scale: 3)
    ->decimal('rate', precision: 5, scale: 4)->nullable()
    ->create();
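
The commit notes for this PR state that decimal() rejects a negative scale and a scale greater than the precision. A minimal sketch of that check follows; `decimalType` is a hypothetical function, not the library's implementation, and the lower bound on precision is an added assumption.

```php
<?php
declare(strict_types=1);

/**
 * Builds a fixed-point type name such as DECIMAL(18, 3). Hypothetical helper.
 * $keyword lets the caller pick the dialect spelling (Decimal, DECIMAL, NUMERIC).
 */
function decimalType(int $precision, int $scale, string $keyword = 'DECIMAL'): string
{
    if ($precision < 1) {
        // Assumption: a precision below 1 is treated as invalid.
        throw new InvalidArgumentException('Precision must be at least 1.');
    }
    if ($scale < 0 || $scale > $precision) {
        throw new InvalidArgumentException('Scale must be between 0 and the precision.');
    }

    return sprintf('%s(%d, %d)', $keyword, $precision, $scale);
}

echo decimalType(18, 3, 'Decimal'), "\n"; // prints Decimal(18, 3)
echo decimalType(5, 4), "\n";             // prints DECIMAL(5, 4)
```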

UUID column type with defaultRaw()

UUIDs are first-class fixed-width identifier types in ClickHouse and PostgreSQL and a 36-character string elsewhere; production schemas commonly use them as primary identifiers with server-generated defaults. Column::defaultRaw(string) emits the expression verbatim after DEFAULT — distinct from default(), which quotes string literals — so callers can attach generateUUIDv4(), gen_random_uuid(), UUID(), now(), CURRENT_TIMESTAMP, and similar dialect-specific server-generated defaults.

$schema->table('events')
    ->uuid('event_id')->defaultRaw('generateUUIDv4()')->primary()
    ->datetime('ts', 3)
    ->create();

uuid() compiles to UUID on ClickHouse and PostgreSQL, CHAR(36) on MySQL, TEXT on SQLite, and the string BSON type on MongoDB. defaultRaw() is on the base Column, so it works on every dialect; it takes precedence over default() when both are set, and rejects empty strings and semicolons.
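
The precedence and validation rules for defaultRaw() can be sketched as follows. Both functions are hypothetical and not the library's code; in particular, escaping a quoted literal by doubling single quotes is an assumption made only to keep the example self-contained.

```php
<?php
declare(strict_types=1);

/** Rejects empty expressions and semicolons, as described for defaultRaw(). */
function validateRawDefault(string $expr): string
{
    if ($expr === '') {
        throw new InvalidArgumentException('Raw default must not be empty.');
    }
    if (str_contains($expr, ';')) {
        throw new InvalidArgumentException('Raw default must not contain a semicolon.');
    }

    return $expr;
}

/** Builds the DEFAULT clause; the raw expression wins when both are set. */
function defaultClause(?string $default, ?string $defaultRaw): string
{
    if ($defaultRaw !== null) {
        return 'DEFAULT ' . validateRawDefault($defaultRaw); // emitted verbatim
    }
    if ($default !== null) {
        // Assumed quoting style: single quotes, with embedded quotes doubled.
        return "DEFAULT '" . str_replace("'", "''", $default) . "'";
    }

    return '';
}

echo defaultClause(null, 'generateUUIDv4()'), "\n"; // prints DEFAULT generateUUIDv4()
echo defaultClause('x', 'now()'), "\n";             // prints DEFAULT now()
echo defaultClause("it's", null), "\n";             // prints DEFAULT 'it''s'
```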

Raw expressions in ORDER BY

MergeTree ORDER BY clauses routinely include scalar function calls — toDate(ts), cityHash64(...), intHash32(user_id) — to control sparse-index cardinality. orderBy(array) restricts each entry to a plain identifier; orderByRaw(string) accepts the full parenthesised tuple verbatim, mirroring the existing partitionBy(string) convention.

$schema->table('events')
    ->string('tenant')
    ->bigInteger('id')
    ->datetime('ts')
    ->orderByRaw('(`tenant`, toDate(`ts`), `id`)')
    ->create();

Takes precedence over orderBy() when both are set; rejects empty strings and semicolons. ClickHouse-only.
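
A minimal sketch of the precedence rule between the two ORDER BY inputs. `orderByClause` is a hypothetical function, not the library's compiler, and the identifier pattern used for the non-raw path is an assumption about what "plain identifier" means.

```php
<?php
declare(strict_types=1);

/**
 * Builds the ORDER BY clause. A raw expression, when present, replaces the
 * identifier list entirely. Hypothetical helper.
 *
 * @param list<string> $columns
 */
function orderByClause(array $columns, ?string $raw): string
{
    if ($raw !== null) {
        if ($raw === '' || str_contains($raw, ';')) {
            throw new InvalidArgumentException('Raw ORDER BY must be non-empty and semicolon-free.');
        }

        return 'ORDER BY ' . $raw; // emitted verbatim
    }
    if ($columns === []) {
        return '';
    }

    $quoted = [];
    foreach ($columns as $column) {
        // Assumed identifier rule: letters, digits, underscore; no leading digit.
        if (preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $column) !== 1) {
            throw new InvalidArgumentException('Not a plain identifier: ' . $column);
        }
        $quoted[] = '`' . $column . '`';
    }

    return 'ORDER BY (' . implode(', ', $quoted) . ')';
}

echo orderByClause(['tenant', 'id'], null), "\n";
// prints ORDER BY (`tenant`, `id`)
echo orderByClause(['tenant', 'id'], '(`tenant`, toDate(`ts`), `id`)'), "\n";
// prints ORDER BY (`tenant`, toDate(`ts`), `id`)
```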

rawColumn() passthrough fix on ClickHouse

Table::rawColumn(string $definition) is the documented escape hatch for column types the typed builder does not yet model. The base Schema::compileCreate() already iterates $table->rawColumnDefs, but the Schema\ClickHouse::compileCreate() override loop did not — so raw fragments registered through the same fluent builder silently disappeared from the generated DDL on ClickHouse only. The fix mirrors the loop in the ClickHouse override (one for-loop).
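
The shape of the fix can be illustrated with a minimal sketch. `compileColumnList` is a hypothetical function, not the actual override, and placing raw fragments after the typed columns is an assumption about ordering.

```php
<?php
declare(strict_types=1);

/**
 * Joins compiled typed columns and raw column fragments into one column list.
 * Hypothetical helper.
 *
 * @param list<string> $typedColumnSql compiled definitions from the typed builder
 * @param list<string> $rawColumnDefs  fragments registered through rawColumn()
 */
function compileColumnList(array $typedColumnSql, array $rawColumnDefs): string
{
    $parts = $typedColumnSql;

    // The step the ClickHouse override was missing: without this loop the
    // raw fragments never reach the generated DDL.
    foreach ($rawColumnDefs as $definition) {
        $parts[] = $definition;
    }

    return implode(', ', $parts);
}

echo compileColumnList(['`id` Int64'], ['`ip` IPv4']), "\n";
// prints `id` Int64, `ip` IPv4
```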

Out of scope (planned follow-up)

  • Bulk insert formats on Builder\ClickHouse (FORMAT JSONEachRow, RowBinary, TabSeparated, Parquet) — broader surface that touches the builder rather than the schema compiler; deserves its own PR.

Tests

38 new assertions across:

  • ClickHouseTest: uuid() with and without defaultRaw(), nullable wrapping, defaultRaw() precedence and validation, tinyInteger()/smallInteger() (signed and unsigned), decimal() with nullable(), array(T) with String/UInt64/nullable wrapping, LowCardinality rejection on Array, tuple() with empty-list validation, orderByRaw() with mixed function calls, orderByRaw() precedence and validation, rawColumn() passthrough through compileCreate().
  • MySQLTest, PostgreSQLTest, SQLiteTest: tinyInteger/smallInteger/decimal/uuid cross-dialect mappings; defaultRaw() rendered correctly alongside NOT NULL/PRIMARY KEY; decimal() precision/scale validation.
  • MongoDBTest: decimal/tinyInteger/uuid BSON type mappings.

All gates green: composer test, composer lint, composer check (PHPStan level max).

`rawColumn()` is the documented escape hatch for emitting dialect-specific
column types the typed builder does not yet model. The base
`Schema::compileCreate()` already iterates `$table->rawColumnDefs`, but the
ClickHouse override loop did not — so raw fragments registered through the
same fluent builder silently disappeared from the generated DDL on
ClickHouse only. Mirror the loop in `Schema\ClickHouse::compileCreate()`.
…w(), plus ClickHouse Array/Tuple and raw ORDER BY

Adds the remaining production-OLAP-shaped schema features that callers
had to drop to `rawColumn()` for after the 0.3.x bump:

- `Table::uuid()` — UUID column type, native on ClickHouse (`UUID`) and
  PostgreSQL (`UUID`); `CHAR(36)` on MySQL; `TEXT` on SQLite; `string`
  BSON type on MongoDB. Server-generated UUIDs are common as primary
  identifiers and need a dialect-specific default expression rather
  than an application-supplied value.

- `Column::defaultRaw(string)` — raw default expression emitted
  verbatim after `DEFAULT`. Lets callers attach `generateUUIDv4()`,
  `gen_random_uuid()`, `UUID()`, `now()`, `CURRENT_TIMESTAMP`, etc.
  without the quoting `default()` applies to scalar values. Takes
  precedence over `default()` when both are set; rejects empty strings
  and semicolons.

- `Table::tinyInteger()` and `Table::smallInteger()` — small integer
  column types. On ClickHouse they map to `UInt8`/`Int8` and
  `UInt16`/`Int16` (75% smaller than the default `UInt32` produced by
  `integer()->unsigned()`), to native `TINYINT`/`SMALLINT` on MySQL,
  to `SMALLINT` on PostgreSQL (which has no `TINYINT`), and to
  `INTEGER` on SQLite. Useful for bounded enumerations, percentage
  values, and other fields that fit well under 32 bits.

- `Table::decimal(name, precision, scale)` — fixed-point numeric
  column for monetary and precision-sensitive values where
  binary-floating-point error is unacceptable. ClickHouse emits
  `Decimal(P, S)`; MySQL/PostgreSQL emit `DECIMAL(P, S)`; SQLite
  emits `NUMERIC(P, S)`; MongoDB maps to the `decimal` BSON type.
  Rejects negative scale and scale greater than precision.

- `Table\ClickHouse::array(name, ColumnType $element)` and
  `Table\ClickHouse::tuple(name, list<ColumnType>)` — `Array(T)` and
  `Tuple(...)` nested column types. Core ClickHouse types for
  multi-valued attributes (tags, labels, parallel-array nested
  records) and fixed-arity composites (geo points, key/value pairs).
  Element types run back through the standard column-type compiler so
  `unsigned()` and `precision`/`scale` flags carry into the inner
  type. `Nullable(...)` wraps the whole `Array`/`Tuple`;
  `LowCardinality(...)` is rejected on these columns to match
  ClickHouse's documented constraints.

- `Table\ClickHouse::orderByRaw(string)` — raw `ORDER BY` expression
  emitted verbatim. MergeTree `ORDER BY` clauses routinely include
  scalar function calls (`toDate(ts)`, `cityHash64(...)`,
  `intHash32(user_id)`) to control sparse-index cardinality; the
  existing identifier-only `orderBy(array)` blocks this common shape.
  Mirrors the `partitionBy(string)` convention. Takes precedence over
  `orderBy()` when both are set; rejects empty strings and semicolons.

README updated under "Creating Tables" (new types and modifiers) and
"ClickHouse Schema" (per-feature subsections with generated DDL).

`Column::$scale` is added alongside the existing `$precision`/`$length`
constructor args, and dialect `Table::newColumn()` overrides forward
it through.
@github-actions

github-actions Bot commented May 11, 2026

📊 Coverage

| Metric | PR | Baseline | Δ |
| --- | --- | --- | --- |
| Lines | 91.80% (7407/8069) | 91.95% | -0.15% |
| Methods | 84.42% (1100/1303) | 84.57% | -0.15% |
| Classes | 65.85% (135/205) | 66.34% | -0.49% |

Full per-file breakdown in the job summary.

@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR adds a set of ClickHouse-oriented schema features — UInt8/Int8/UInt16/Int16 via tinyInteger/smallInteger, Decimal, UUID, Array(T), Tuple(...), raw ORDER BY, defaultRaw(), and a rawColumn passthrough fix — along with cross-dialect mappings for the scalar types on MySQL, PostgreSQL, SQLite, and MongoDB.

  • Array and Tuple column types are ClickHouse-only builder methods; both Nullable(Array(...)) and Nullable(Tuple(...)) now correctly throw UnsupportedException, and LowCardinality is guarded on both.
  • The rawColumn passthrough bug in the ClickHouse compileCreate override is fixed by mirroring the base-class loop; defaultRaw() is added to the base Column with empty/semicolon validation and correct precedence over default().
  • 38 new test assertions cover the new types, validation paths, precedence rules, and cross-dialect mappings.

Confidence Score: 5/5

Safe to merge; the new nullable and LowCardinality guards are in place, the rawColumn bug is fixed, and dialect mappings are exhaustive and verified by tests.

All core compilation paths are covered by the new test suite. The only gap is a missing test for the Tuple nullable rejection guard, which was added in direct response to prior review feedback but not paired with a test the way the Array guard was.

src/Query/Schema/ClickHouse.php — the Tuple nullable guard fires after inner-type compilation (unlike the parallel Array guard), and the corresponding test is absent.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/Query/Schema/ClickHouse.php | Adds Array/Tuple type compilation, UUID, tinyInteger/smallInteger, Decimal, rawColumn passthrough, and orderByRaw; nullable/LowCardinality guards added for both Array and Tuple. Minor: Tuple nullable guard fires after inner-type compilation unlike the parallel Array guard. |
| src/Query/Schema/Column.php | Adds scale constructor param, defaultRaw property/method with semicolon and empty-string validation, and forwarding helpers for tinyInteger/smallInteger/decimal/uuid. |
| src/Query/Schema/ColumnType.php | Adds TinyInteger, SmallInteger, Decimal, Uuid, Array, Tuple enum cases; clean and exhaustive. |
| src/Query/Schema/Table.php | Adds tinyInteger, smallInteger, decimal (with precision/scale validation), and uuid factory methods to the base Table; scale parameter threaded through newColumn signature across all subclasses. |
| src/Query/Schema/Table/ClickHouse.php | Adds array/tuple builder methods and orderByRaw with validation; orderByRaw takes precedence over orderBy at compile time. |
| src/Query/Schema/Column/ClickHouse.php | Adds arrayElementType and tupleElementTypes properties plus asArray/asTuple/isArray/isTuple helpers; asTuple validates non-empty element list. |
| tests/Query/Schema/ClickHouseTest.php | Adds 14 new test methods covering uuid/defaultRaw/tinyInteger/smallInteger/decimal/array/tuple/orderByRaw/rawColumn; Array nullable rejection is tested but the parallel Tuple nullable rejection has no corresponding test. |
| src/Query/Schema/MySQL.php | Adds TinyInteger to TINYINT, SmallInteger/SmallSerial to SMALLINT, Decimal, Uuid to CHAR(36), and Array/Tuple unsupported exception; SmallSerial mapping de-duplicated. |
| src/Query/Schema/PostgreSQL.php | Adds TinyInteger/SmallInteger to SMALLINT, Decimal, Uuid to UUID, Array/Tuple unsupported exception, and defaultRaw handling in compileColumn. |
| src/Query/Schema/SQLite.php | Adds TinyInteger/SmallInteger to INTEGER, Decimal to NUMERIC, Uuid to TEXT, Array/Tuple unsupported exception, and defaultRaw handling. |
| src/Query/Schema/MongoDB.php | Adds TinyInteger/SmallInteger to int, Decimal to decimal, Uuid to string, Array/Tuple to array BSON type mappings; exhaustive match maintained. |
| src/Query/Schema/Schema.php | Adds defaultRaw precedence logic to the base compileColumn method, covering MySQL and any other dialect that inherits without overriding. |
| src/Query/Schema/Forwarder/ClickHouse.php | Adds array/tuple/orderByRaw forwarding methods to the ClickHouse column forwarder trait. |

Reviews (4): Last reviewed commit: "fix(tests): expect UnsupportedException ..."

Comment thread src/Query/Schema/ClickHouse.php
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@abnegate
Member

@copilot Fix the unit test failure introduced by the last commit

Comment thread tests/Query/Schema/ClickHouseTest.php Outdated
Comment thread src/Query/Schema/ClickHouse.php
abnegate and others added 2 commits May 19, 2026 22:30
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
ClickHouse does not support Nullable(Array(...)) — the schema compiler
already throws an UnsupportedException with a message directing callers
to use an empty array as the missing-value sentinel. The previous test
asserted compiled SQL that the compiler refuses to emit. Mirror the
sibling testArrayRejectsLowCardinalityWrap pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lohanidamodar lohanidamodar merged commit 3dff874 into main May 20, 2026
7 checks passed

2 participants