feat: add findGrouped() to Adapter + ClickHouse implementation#119
feat: add findGrouped() to Adapter + ClickHouse implementation#119lohanidamodar wants to merge 1 commit into
Conversation
Add a new method `findGrouped(array $queries, string $groupBy, string
$interval): array` to the audit Adapter abstraction and implement it in
the ClickHouse adapter. It produces time-bucketed grouped counts for
charts of the form "events per <interval>, grouped by <attribute>".
The method validates `$groupBy` against {event, userType, resourceType}
and `$interval` against {hour, day, week, month}, then reuses the
existing `parseQueries()` filter/param pipeline to apply both the query
filters and `getTenantFilter()` for tenant-safe scoping. `Query::limit()`
/ `Query::offset()` are applied to the group set (default 25, clamped
to 100), with top-N selection by `SUM(count)` desc and ties broken by
groupValue asc.
A single SQL round-trip drives the aggregation via CTEs: top-N groups,
time bounds, a zero-filled bucket axis built with `arrayJoin(range(...))`
+ `date_add(<unit>, i, bucket_from)`, and the per-bucket aggregate. A
final `CROSS JOIN` + `LEFT JOIN` produces the dense (groupValue, bucket,
count) result set, with buckets aligned via
`toStartOfInterval(time, INTERVAL 1 <unit>, 'UTC')` and formatted as
ISO-8601 UTC strings to match how `find()` returns `time`.
The Database adapter throws explicitly — grouped time-bucketed
aggregations are an analytical-store feature.
Schema: a `_key_resource_type` bloom-filter index has been added to the
ClickHouse `getIndexes()` definition so new installs get it. Existing
installs need a one-off migration to add it; left out of this PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile SummaryThis PR adds
Confidence Score: 4/5The change is safe to merge; the only findings are minor query inefficiencies with no impact on correctness or security. The SQL logic is correct, all user-supplied inputs are whitelist-validated before interpolation, and tenant isolation matches the existing methods. The two flagged issues are query optimisation opportunities that do not affect result correctness. The four-CTE query in src/Audit/Adapter/ClickHouse.php is worth a second pass on the agg CTE filtering; all other files are straightforward. Important Files Changed
Reviews (1): Last reviewed commit: "feat: add findGrouped() to Adapter + Cli..." | Re-trigger Greptile |
| FROM top_groups g | ||
| CROSS JOIN buckets b | ||
| LEFT JOIN agg a ON a.value = g.value AND a.bucket = b.bucket | ||
| WHERE g.value IN (SELECT value FROM top_groups) | ||
| ORDER BY value ASC, bucket ASC |
There was a problem hiding this comment.
The outer
WHERE g.value IN (SELECT value FROM top_groups) predicate is redundant — g is already derived directly from top_groups, so every row of g is guaranteed to satisfy this condition. The clause adds an extra correlated subquery that ClickHouse may or may not optimise away, and it obscures the intent of the JOIN.
| FROM top_groups g | |
| CROSS JOIN buckets b | |
| LEFT JOIN agg a ON a.value = g.value AND a.bucket = b.bucket | |
| WHERE g.value IN (SELECT value FROM top_groups) | |
| ORDER BY value ASC, bucket ASC | |
| FROM top_groups g | |
| CROSS JOIN buckets b | |
| LEFT JOIN agg a ON a.value = g.value AND a.bucket = b.bucket | |
| ORDER BY value ASC, bucket ASC |
| agg AS ( | ||
| SELECT | ||
| {$escapedGroup} AS value, | ||
| toStartOfInterval({$escapedTime}, INTERVAL 1 {$unit}, 'UTC') AS bucket, | ||
| COUNT(*) AS count | ||
| FROM {$escapedTable}{$whereClause} | ||
| GROUP BY value, bucket | ||
| ) |
There was a problem hiding this comment.
The
agg CTE aggregates every distinct group value that matches $whereClause, not just the top_groups. For high-cardinality columns like resourceType this can mean scanning and grouping many more rows than needed. Adding an IN-subquery filter restricts the aggregation to only the top-N groups already selected, at the cost of one extra read of the small top_groups result.
| agg AS ( | |
| SELECT | |
| {$escapedGroup} AS value, | |
| toStartOfInterval({$escapedTime}, INTERVAL 1 {$unit}, 'UTC') AS bucket, | |
| COUNT(*) AS count | |
| FROM {$escapedTable}{$whereClause} | |
| GROUP BY value, bucket | |
| ) | |
| agg AS ( | |
| SELECT | |
| {$escapedGroup} AS value, | |
| toStartOfInterval({$escapedTime}, INTERVAL 1 {$unit}, 'UTC') AS bucket, | |
| COUNT(*) AS count | |
| FROM {$escapedTable}{$whereClause} | |
| WHERE {$escapedGroup} IN (SELECT value FROM top_groups) | |
| GROUP BY value, bucket | |
| ) |
Summary
Adds a new
findGrouped(array $queries, string $groupBy, string $interval): arraymethod to the auditAdapterabstraction and implements it in the ClickHouse adapter. It produces time-bucketed grouped counts for charts of the form events per<interval>, grouped by<attribute>.Public API
The
Auditfacade gains a matching passthrough.Behavior
$groupByand$intervalare validated; invalid values throw.parseQueries()pipeline so all existingQuery::*filters work unchanged.getTenantFilter()is honored exactly likefind()/count()so shared-table installs stay tenant-safe.Query::limit()/Query::offset()are applied to the groups list. Default limit25, hard max100(over-limit values are clamped). Top-N is selected bySUM(count)desc, ties by groupValue asc.toStartOfInterval(time, INTERVAL 1 <unit>, 'UTC')and returned as ISO-8601 UTC strings (matching howfind()formatstime).value ASC, bucket ASCfor predictable consumption.SQL approach
Single round-trip using four CTEs:
top_groups— top-N groups bySUM(count)desc.bounds—MIN(time)/MAX(time)aligned to bucket starts.buckets—arrayJoin(arrayMap(i -> date_add(<unit>, i, bucket_from), range(0, ...)))produces the dense bucket axis.agg—(value, bucket) -> COUNT(*)aggregate.Then
CROSS JOIN buckets+LEFT JOIN aggyields the dense(value, bucket, count)grid. All identifiers go throughescapeIdentifier; all values go through ClickHouse bound parameters.Database adapter
The
Databaseadapter throws explicitly — grouped time-bucketed aggregations are designed for an analytical backend (ClickHouse).Schema note
Added a
_key_resource_typebloom-filter index to the ClickHouse adapter'sgetIndexes()so new installs cover theresourceTypecolumn natively. Existing installs need a one-off migration to add this index; that migration is intentionally not in this PR (separate concern).Test plan
composer lint(pint --test) — passcomposer check(phpstan analyse --level max) — passgroupBy, invalidinterval, grouped counts byevent, ordering invariant (value asc, bucket asc), zero-fill, limit clamping, offset paging, alternategroupBy=userType, empty time range, Database adapter "not supported"