feat: migrate ClickHouse adapter to utopia-php/query 0.3.x builder#120
feat: migrate ClickHouse adapter to utopia-php/query 0.3.x builder#120lohanidamodar wants to merge 18 commits into
Conversation
Pins to dev-feat/clickhouse-insert-delete-settings-mv as 0.3.2 — the branch on utopia-php/query (PR #11) that adds the three ClickHouse builder extras audit needs: - Builder\ClickHouse::insertFormat(...) for INSERT ... FORMAT JSONEachRow - Trailing SETTINGS clause on DELETE (for async cleanup) - Schema\ClickHouse::create/dropMaterializedView (not used here) TODO: flip to ^0.3.2 once PR #11 lands on utopia-php/query main and a 0.3.2 tag exists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 0.3.x utopia-php/query base class replaced the legacy `TYPE_*` string constants with a `Method` enum. Audit's adapter switches and tests all still compare against the string constants, so they're re-declared here mapped to the same string values (`equal`, `lessThan`, ...). Tests that compared `$query->getMethod()` (now returns a `Method` enum) against the constants are updated to compare against `getMethod()->value`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
utopia-php/query 0.3.x uses PHP 8.4 asymmetric visibility (`public protected(set)`) on schema and builder properties. PHPStan 1.12's bundled PhpParser can't tokenize that syntax — `composer check` crashes with a Lexer internal error on any audit source file that imports a query schema/builder class. PHPStan 2.x ships an updated PhpParser that handles asymmetric visibility, so bumping the dev dependency is the smallest change that restores the static-analysis gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sites utopia-php/query 0.3.x's `Query::getMethod()` returns a `Method` enum instead of a string. The ClickHouse adapter's parseQueries switch, and the Database adapter's count-time filter, both compare against the legacy string `TYPE_*` constants. Switch from `getMethod()` to `getMethod()->value` at the two call sites so the existing comparisons keep matching. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumping PHPStan to 2.x to handle PHP 8.4 syntax surfaces a handful of pre-existing diagnostics in code paths the migration does not otherwise touch: - Database adapter: redundant `instanceof Utopia\Audit\Query` guards inside a method whose signature already constrains the parameter. Kept as runtime defense (real callers occasionally pass mixed arrays); annotated with @phpstan-ignore-next-line. - Log::getData(): PHPStan can't widen the ArrayObject return without a local @var hint. - AuditBase batch test: removed a duplicate `applyRequiredAttributesToBatch` call (typo; the second invocation already re-merged the same row). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the hand-built CREATE TABLE in `setup()` with a `Schema\ClickHouse` table builder. Engine, ORDER BY tuple, PARTITION BY expression, table settings, columns, and data-skipping (bloom_filter) indexes are now declared through the schema API; the resulting Statement is executed verbatim through the existing HTTP layer. Column definitions follow the audit attribute descriptors: - `id String` (primary) - `time DateTime64(3)` (NOT NULL — partition key) - `<id> [Nullable(]String[)]` for every other attribute, nullability driven by the descriptor's `required` flag - `tenant Nullable(UInt64)` when sharedTables is on CREATE DATABASE IF NOT EXISTS still goes through a raw string — the schema's `createDatabase()` helper has no IF NOT EXISTS form and we'd otherwise break the second `setup()` call. Also folds in a few PHPStan-driven cleanups in this file (collapsed `!empty($inParams)` guards into unconditional emits since the upstream VALUE_REQUIRED_METHODS guard already rejects empty value lists; dropped the redundant is_array($row) inside parseJsonResults whose row type was already array<string,mixed>). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eBatch Build the JSONEachRow INSERT statement through `Builder\ClickHouse::insertFormat` instead of a hand-rolled string. Column list is derived from the audit schema (id, time, the remaining attributes, optional tenant) in the same insertion order as the row maps so the JSON keys line up against ClickHouse's declared columns. The JSONEachRow body itself still serializes in the adapter — that's an HTTP-payload concern that stays in the runtime layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Compile the cleanup DELETE through `Builder\ClickHouse::delete()` with a
trailing SETTINGS clause emitted by the builder when async cleanup is
enabled. The DELETE WHERE expression is supplied via `whereRaw()` so the
ClickHouse-typed parameter syntax (`{datetime:DateTime64(3)}`) and the
existing tenant filter stay in the runtime layer — the builder still
emits generic `?` placeholders today (dry-run gap #7).
Behaviour note: the ClickHouse builder compiles DELETE as
`ALTER TABLE ... DELETE` (mutation) instead of the lightweight
`DELETE FROM ...` the previous string emitted. The async setting tracks
the same shift: `mutations_sync = 0` for mutations replaces the
previous `lightweight_deletes_sync = 0`. End-state row visibility is
the same; the storage path is different (mutations rewrite parts,
lightweight deletes mask rows).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Route the SELECT skeleton (FROM, raw projection, WHERE conjunction,
ORDER BY) for `find()`, `count()`, and `getById()` through
`Builder\ClickHouse`. Filter expressions emitted by `parseQueries()`
still carry ClickHouse-typed parameter hints (`{name:Type}`) and the
audit-side associative params map; both ride along via `whereRaw()` /
`orderByRaw()` so the typed-binding HTTP layer keeps working.
`LIMIT` / `OFFSET` and the trailing `FORMAT JSON` / `FORMAT TabSeparated`
are appended on the compiled SQL string — the base builder emits
positional `?` placeholders for limit/offset, which would collide with
the ClickHouse `{name:UInt64}` placeholders the runtime layer binds.
Cleaning that up belongs with the gap #7 (ClickHouse param hint) work
on utopia-php/query.
Cursor pagination still builds tuple WHERE fragments through the
existing `buildCursorWhere()` helper and feeds them in via `whereRaw()`
(gap #8, deferred).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pin the SQL emitted by `Schema\ClickHouse` and `Builder\ClickHouse` through the audit configurations the adapter relies on: - `setup()` CREATE TABLE — engine, columns, indexes, ORDER BY, PARTITION BY, SETTINGS - `createBatch()` `INSERT ... FORMAT JSONEachRow` with the audit column list - `cleanup()` async DELETE with trailing SETTINGS clause - `cleanup()` sync DELETE with no SETTINGS - `find()` SELECT with whereRaw filters, cursor tuple comparison, ORDER BY tiebreaker, LIMIT and FORMAT JSON tail These do not require a live ClickHouse so they run as fast unit tests and prevent silent SQL drift from a query-lib upgrade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile SummaryThis PR migrates the ClickHouse audit adapter from the legacy 0.1.x
Confidence Score: 5/5The migration is well-scoped and carefully bounded; all previously flagged issues have been addressed and the remaining observations are maintenance-level. The adapter correctly preserves lightweight-delete semantics, the tenant filter embeds its value as a literal (no missing binding), cursor and builder params use distinct naming conventions so the + merge is safe, and the duplicate applyRequiredAttributesToBatch call was cleaned up. The only open items are stale entries in the column type map and missing snapshot coverage for string-predicate query types — neither blocks correctness. No files require special attention for merge safety, though adding snapshot tests for startsWith/endsWith/regex in ClickHouseSqlSnapshotTest.php would harden the migration before the dev-branch pin is flipped to a stable tag. Important Files Changed
Reviews (4): Last reviewed commit: "Merge branch 'main' into feat/utopia-que..." | Re-trigger Greptile |
Bump utopia-php/query to the upstream branch HEAD that ships named-typed
{name:Type} placeholder support and the lightweight DELETE form, then add
two helpers to the adapter:
- getColumnTypeMap() derives a column → ClickHouse-type map from the
schema attributes (DateTime → DateTime64(3), all other columns →
String) plus id, tenant (when sharedTables is enabled) and the
limit/offset/max pseudo-columns used by the count/find SQL wrappers.
- newBuilder() returns a Builder\ClickHouse with useNamedBindings() and
withParamTypes() pre-applied, so positional `?` bindings flow through
the typed-binding rewriter at Statement-emission time.
Every existing adapter call site is rerouted from `new ClickHouseBuilder()`
to `$this->newBuilder()`. The current call sites all hand-construct their
{name:Type} placeholders inside whereRaw fragments with zero `?` bindings,
so this change is a no-op for the existing SQL shape — it just preps the
infra that the find()/count()/getById() reads will switch to next.
…ter API
Drop every hand-built WHERE / ORDER BY fragment in find(), count() and
getById() and feed Query value objects straight to Builder\ClickHouse via
filter(), sortAsc/sortDesc/sortRandom, limit() and offset(). Positional
`?` bindings are rewritten to `{paramN:Type}` placeholders by the
typed-binding registration installed in newBuilder(), so HTTP params now
flow through $statement->namedBindings instead of a hand-maintained
$paramCounter dict.
parseQueries() is reduced to a list of Query objects plus auxiliary
metadata (orderAttributes, limit, offset, cursor, select). Two
audit-specific rewrites stay in this layer:
- Contains / NotContains are remapped to Equal / NotEqual so they keep
audit's historical IN / NOT IN semantics. The base builder compiles
Contains to substring-match `position(x, ?) > 0`, which is not what
callers like Audit::getByUserAndEvents() expect.
- `time`-column DateTime values are stringified at parse time so they
appear in namedBindings as `Y-m-d H:i:s.v` literals rather than raw
objects the HTTP layer can't serialise.
Cursor pagination keeps its whereRaw escape hatch with explicit
{name:Type} placeholders — Builder\ClickHouse still has no tuple-compare
helper, so the existing buildCursorWhere() output is appended to the
builder and its params are merged into the final HTTP request alongside
$statement->namedBindings. The dead buildOrderBySql() helper is removed;
applyOrderBy() drives the builder directly.
Builder\ClickHouse now defaults to DELETE_MODE_LIGHTWEIGHT, so cleanup() emits `DELETE FROM t WHERE …` again — matching audit's pre-migration baseline. The previous mutation form (`ALTER TABLE t DELETE WHERE …`) was a workaround forced by the older builder API and changed the storage-path semantics: lightweight marks rows deleted via a mask and is the right tool for row-level cleanup, while mutations rewrite parts on disk and are heavier. The async SETTINGS knob switches from `mutations_sync = 0` to `lightweight_deletes_sync = 0` to match the new DELETE form. The public setAsyncCleanup() docblock already referenced lightweight_deletes_sync, so no docs change is needed.
…LETE
Pin the new SQL shape on the adapter's hot paths:
- cleanup() emits `DELETE FROM t WHERE …` with an optional trailing
`SETTINGS lightweight_deletes_sync=0` for the async path.
- find() / count() filter via Builder\ClickHouse::filter() and
sortAsc/sortDesc/limit, producing typed `{paramN:Type}` placeholders
and a populated `namedBindings` map on the Statement.
- The cursor whereRaw fragment with explicit `{name:Type}` placeholders
composes cleanly with the typed positional bindings — both end up in
the final HTTP params dict by the adapter merging them.
- count(max) wraps the inner Statement in `SELECT COUNT(*) FROM (… LIMIT ?) sub`
and reuses the inner builder's namedBindings unchanged.
A small newAuditBuilder() / auditTypeMap() helper mirrors the adapter's
own column → ClickHouse-type registration so the snapshot tests exercise
the same typed-binding flow callers will see in production.
The Dockerfile pinned php:8.3.3-cli-alpine3.19, but composer.json requires php >=8.4. Once utopia-php/query: ^0.3.0 was adopted, the CI Tests check failed because the library uses 8.4-only asymmetric visibility syntax (public protected(set)). Bump the test image to php:8.4.21-cli-alpine3.23 so the CI environment matches the package requirement.
| case Query::TYPE_NOT_CONTAINS: | ||
| $this->validateAttributeName($attribute); | ||
| $escapedAttr = $this->escapeIdentifier($attribute); | ||
| $pattern = $values[0] ?? null; | ||
| if (!is_string($pattern)) { | ||
| throw new Exception("regex pattern must be a string for attribute '{$attribute}'"); | ||
| } | ||
| $paramName = 'param_' . $paramCounter++; | ||
| // ClickHouse's `match(haystack, pattern)` is the re2-style | ||
| // regex predicate. Pattern is bound as a parameter, never | ||
| // interpolated, so it can't escape into the SQL. | ||
| $filters[] = "match({$escapedAttr}, {{$paramName}:String})"; | ||
| $params[$paramName] = $pattern; | ||
| $filters[] = new BaseQuery(Method::NotEqual, $attribute, $this->normalizeFilterValues($attribute, $values)); | ||
| break; |
There was a problem hiding this comment.
notContains multi-value semantics not tested
The old code explicitly built attr NOT IN (v1, v2, ...) for TYPE_NOT_CONTAINS with multiple values. The new mapping delegates to Method::NotEqual, which works correctly only if the base builder emits NOT IN (...) for multi-value NotEqual queries. The snapshot tests verify that Equal with a single-value array produces IN ({paramN:Type}), but there is no parallel test for NotEqual with multiple values. If the builder emits != {param0} (binding only the first value) instead of NOT IN (v1, v2, ...), every notContains(['a', 'b']) call silently returns wrong rows — the filter would drop all but the first exclusion value.
There was a problem hiding this comment.
Added a snapshot test in 75c48f5 (testNotContainsMultiValueEmitsTypedNotIn) that locks the multi-value path:
WHERE `event` NOT IN ({param0:String}, {param1:String}) LIMIT {param2:Int64}
with named bindings {param0: 'users.delete', param1: 'projects.delete', param2: 25}. The base builder's compileNotIn() (vendor/utopia-php/query/src/Query/Builder.php:1769) handles 2+ non-null values via NOT IN (?, ?, …); the test pins the resulting ClickHouse-typed form so a query-lib drift can't silently downgrade notContains(['a','b']) to a single-value != filter.
ClickHouse JSON responses come from json_decode and aren't statically guaranteed to satisfy any inner shape. The pre-migration code skipped non-array rows defensively; restore that guard and loosen the @var on \$data to array<int, mixed> so PHPStan max accepts the runtime check instead of pruning it as already-narrowed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a snapshot test that locks the SQL emitted when TYPE_NOT_CONTAINS
is mapped to Method::NotEqual with multiple values. The base builder
compiles NotEqual with 2+ non-null values via compileNotIn(), producing
`attr NOT IN (?, ?)`; with named typed bindings on Builder\ClickHouse
this becomes `attr NOT IN ({param0:Type}, {param1:Type})`. Pinning the
shape guards against a query-lib drift silently degrading multi-value
notContains() to a single-value `!=` filter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| if ($this->asyncCleanup) { | ||
| $builder->settings(['lightweight_deletes_sync' => '0']); | ||
| } |
There was a problem hiding this comment.
The
cleanup() builder uses lightweight_deletes_sync here, but the PR description states the setting was changed to mutations_sync = 0. DELETE FROM is ClickHouse's lightweight-delete syntax, and lightweight_deletes_sync is the correct setting for it — so the code and snapshot test are internally consistent. However, the PR description is factually wrong when it says this was switched to mutations_sync, which could mislead future maintainers into thinking the mutation semantics changed.
| if ($this->asyncCleanup) { | |
| $builder->settings(['lightweight_deletes_sync' => '0']); | |
| } | |
| if ($this->asyncCleanup) { | |
| $builder->settings(['mutations_sync' => '0']); | |
| } |
Brings in actor-terminology rename (PR #122), location-column drop / N-part resource parser (PR #118), and country / userAgent test refactors. Conflict resolution (src/Audit/Adapter/ClickHouse.php parseQueries): - Kept main's call to translateAttribute() right after reading $query->getAttribute() so callers passing legacy 'userId' / 'userType' / 'userInternalId' Query filters are rewritten to the renamed 'actorId' / 'actorType' / 'actorInternalId' columns before the typed-bindings layer compiles them. - The column type map registered via withParamTypes() is derived from getAttributes() and already uses the actor* names, so once the attribute is translated the map binds it as String correctly. - Public Audit API (Audit::log, getLogsByUser, countLogsByUser, getLogsByUserAndEvents, countLogsByUserAndEvents) is unchanged -- the ClickHouse adapter's getByUser / countByUser implementations already build internal Query::equal('actorId', $userId) filters. ClickHouseSqlSnapshotTest: - Updated the synthetic audit type map and DDL snapshot to reflect actorId / actorType / actorInternalId columns, idx_actorId_event, and the dropped location column. Find / count / cursor snapshots now filter on actorId.
Summary
Migrate the ClickHouse audit adapter off the legacy 0.1.x
utopia-php/querycontract onto the 0.3.x
Builder\ClickHouseandSchema\ClickHouseAPI. EverySQL/DDL string the adapter emits now goes through the query library; the audit
side keeps only the HTTP transport, ClickHouse-typed parameter hint binding,
JSONEachRow body serialization, and tuple cursor compilation.
What was replaced:
setup()— CREATE TABLE / engine / ORDER BY / PARTITION BY / SETTINGS /bloom_filter indexes through
Schema\ClickHouse.createBatch()—INSERT INTO … (cols) FORMAT JSONEachRowthroughBuilder\ClickHouse::insertFormat()(rows still serialized in theHTTP-payload layer).
cleanup()—ALTER TABLE … DELETE WHERE …with optional trailingSETTINGSclause throughBuilder\ClickHouse::settings()+delete(). Async cleanup now drivesmutations_sync = 0instead oflightweight_deletes_sync = 0because the builder compiles mutations,not lightweight deletes; row visibility is identical.
find(),count(),getById()—SELECT … FROM … WHERE … ORDER BY …skeleton through
Builder\ClickHouse. Filter / cursor / tenantexpressions ride along via
whereRaw(); the existing ClickHouse-typed{name:Type}parameters and associative params map stay in the auditruntime layer.
Dependency PR
Depends on utopia-php/query#11 (
feat/clickhouse-insert-delete-settings-mv).That branch adds
insertFormat(), trailingSETTINGSondelete(), andcreateMaterializedView()/dropMaterializedView()toBuilder\ClickHouse/Schema\ClickHouse. The Composer pin lives atdev-feat/clickhouse-insert-delete-settings-mv as 0.3.2; flip to^0.3.2once query#11 lands and a
0.3.2tag exists.The ranges PHPStan deps were bumped to
^2.0because PHP-Parser bundled byPHPStan 1.x can't tokenize the asymmetric visibility (
public protected(set))the query library now uses on schema and builder properties.
Deferred / known gaps
Builder\ClickHouseemits generic?placeholders for compiled filters, limits, and offsets. Audit bindsby associative
{name:Type}so the runtime layer can attach ClickHousetype hints (
UInt64,DateTime64(3), etc.). The adapter still relies onits own
parseQueries()to emit the typed clauses (passed throughwhereRaw()), and appendsLIMIT/OFFSETwith typed placeholders by hand.Once the query lib gains a typed-placeholder hook this can collapse into a
single builder call chain.
(a > A) OR (a = A AND b > B) …cascade in the adapter and feeds itthrough
whereRaw(). The base builder does not yet emitClickHouse-flavoured tuple comparisons.
LowCardinalitycolumns. Audit's high-cardinality-leaning columns(
event,resourceType,userType) would benefit fromLowCardinality(String)but the schema'sstring()helper does not yetexpose it generically; out of scope here.
Commit topology
chore(deps): bump utopia-php/query to 0.3.x (dev-branch pin)refactor(query): re-expose TYPE_* constants on Audit\Query for 0.3.xchore(deps): bump phpstan to ^2.0 for PHP 8.4 query librefactor(adapters): unwrap Method enum to string at getMethod() call siteschore: silence PHPStan 2.x diagnostics unrelated to the migrationrefactor(clickhouse): use Schema\ClickHouse for setup() DDLrefactor(clickhouse): use Builder INSERT FORMAT JSONEachRow for createBatchrefactor(clickhouse): use Builder DELETE + SETTINGS for cleanup()refactor(clickhouse): use Builder for find/count/getById SQL shapetest(clickhouse): add SQL snapshot tests for migrated builder pathsTest plan
composer install+composer format:check+composer check(PHPStan max) all clean locally.
vendor/bin/phpunit --filter "QueryTest|ClickHouseSqlSnapshotTest"green (no docker needed).
docker-compose up clickhouse testsand runthe full audit suite (
composer test); confirmcursor / count / shared-tables / async-cleanup paths still pass.
setup()is byte-compatiblewith the previously emitted DDL (engine, ORDER BY, PARTITION BY,
SETTINGS) on an existing audit deployment that already has an
auditstable.